CNTK/Manual/Manual_How_to_debug.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Debug CNTK programs\n",
"\n",
"> \"Help! I just got this recipe from the web, I don't understand what it does, why it fails, and how to modify it for my purposes\". --- Anonymous\n",
"\n",
"The purpose of this tutorial is to help you understand some of the facilities CNTK provides to make the development of deep learning models easier. Some of the advice here is good programming practice in general, but we still cover it in the context of building models."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"import cntk as C\n",
"import numpy as np\n",
"import scipy.sparse as sparse\n",
"import sys\n",
"import cntk.tests.test_utils\n",
"cntk.tests.test_utils.set_device_from_pytest_env() # (only needed for our build system)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why isn't CNTK using my GPU?\n",
"First check the following:\n",
"- You have an NVIDIA GPU\n",
"- It is listed when running nvidia-smi\n",
"\n",
"Then make sure CNTK sees your GPU: `all_devices()` returns all the available devices. If your GPU is not listed there, your installation is broken. If CNTK lists a GPU, make sure no other CNTK process is using it (check nvidia-smi, found under ``C:\\Program Files\\NVIDIA Corporation\\NVSMI\\nvidia-smi.exe`` on Windows and ``/usr/bin/nvidia-smi`` on Linux). If a zombie process is using it, you can try the following:\n",
"\n",
"- on Linux\n",
"  ```bash\n",
"  $ fuser -k /var/lock/CNTK_exclusive_lock_for_GPU_0\n",
"  ```\n",
"  This kills the process that created `/var/lock/CNTK_exclusive_lock_for_GPU_0`.\n",
"- on Windows\n",
"  * Make sure you have [Process Explorer](https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx)\n",
"  * Open Process Explorer and under View -> Select Columns... click on the GPU tab and check all the checkboxes\n",
"  * Now you should be able to sort all processes based on things like \"GPU System Bytes\" or other attributes. You can kill Python processes that are hogging your GPU(s) and this will automatically release the lock on this device.\n",
"\n",
"Even if some other process is using the GPU, you can still use it with `try_set_default_device(C.gpu(0))`; the locks exist only so that automatic device selection does not accidentally allocate one GPU to two processes that are both going to use it heavily. If you know that's not the case, it's better to specify the GPU explicitly with `try_set_default_device`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(GPU[0] GeForce GTX TITAN X, CPU)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"C.all_devices()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"success=C.try_set_default_device(C.gpu(0))\n",
"print(success)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GPU[0] GeForce GTX TITAN X\n"
]
}
],
"source": [
"dev=C.use_default_device()\n",
"print(dev)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Does this network do what I think it does?\n",
"\n",
"First, if you are coding something from scratch, start small and try to verify every step. Don't write a full network and hope everything will work when you use it. CNTK does some type checking as you construct the graph, but this checking is limited, especially when you use placeholders (it's hard to prove that no input shape can match the requirements of the network). In particular, the CNTK layers library makes extensive use of placeholders, so error messages at the point of first use are quite common.\n",
"\n",
"There are multiple levels of verification you can engage in. The simplest one is to just print the functions you are building.\n",
"Consider the following (broken) code:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Composite(Dense): Placeholder('x', [???], [???]) -> Output('Block2019_Output_0', [???], [???])\n"
]
}
],
"source": [
"def create_gru_stack(input_layer):\n",
"    e = C.layers.Embedding(300)(input_layer)\n",
"    return C.layers.Fold(C.layers.GRU(64))(e)\n",
"\n",
"def create_model(question_input, answer_input):\n",
"    with C.default_options(init=C.glorot_uniform()):\n",
"        question_stack = create_gru_stack(question_input)\n",
"        answer_stack = create_gru_stack(answer_input)\n",
"        combined = C.splice(question_stack, answer_stack)\n",
"        combined = C.layers.Dropout(0.5)(combined)\n",
"        combined = C.layers.LayerNormalization()(combined)\n",
"        combined = C.layers.Dense(64, activation=C.sigmoid)(combined)\n",
"        combined = C.layers.LayerNormalization()\n",
"        combined = C.layers.Dense(1, activation=C.softmax)(combined)\n",
"        return combined\n",
"\n",
"question_input = C.sequence.input_variable(shape=10, is_sparse=True, name='q_input')\n",
"answer_input = C.sequence.input_variable(shape=10, is_sparse=True, name='a_input')\n",
"\n",
"model = create_model(question_input, answer_input)\n",
"print(repr(model))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Digging deeper\n",
"This doesn't look right. We have clearly given the function two sequences of vectors of dimensionality 10 each, yet the model has been created with a single Placeholder input with unknown dynamic axes, as indicated by the first `[???]`, and unknown shape, as indicated by the second `[???]`. Because of that, the Output also has unknown shape and dynamic axes.\n",
"\n",
"How do we find and eliminate the cause of this issue? One possibility is to do a sort of binary search. Clearly the model starts with well-defined inputs, but ends up ignoring them. At which point did this happen? We can try \"prefixes\" of the above model (i.e. including only the first few layers) in a binary search fashion. We soon arrive at the following two functions, the longest prefix that works and the shortest one that is broken:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Composite(Dense): Input('q_input', [#, *], [10]), Input('a_input', [#, *], [10]) -> Output('Block3826_Output_0', [#], [64])\n",
"Composite(LayerNormalization): Placeholder('x', [???], [???]) -> Output('Block5955_Output_0', [???], [???])\n"
]
}
],
"source": [
"def create_model_working(question_input, answer_input):\n",
"    with C.default_options(init=C.glorot_uniform()):\n",
"        question_stack = create_gru_stack(question_input)\n",
"        answer_stack = create_gru_stack(answer_input)\n",
"        combined = C.splice(question_stack, answer_stack)\n",
"        combined = C.layers.Dropout(0.5)(combined)\n",
"        combined = C.layers.LayerNormalization()(combined)\n",
"        combined = C.layers.Dense(64, activation=C.sigmoid)(combined)\n",
"        return combined\n",
"\n",
"def create_model_broken(question_input, answer_input):\n",
"    with C.default_options(init=C.glorot_uniform()):\n",
"        question_stack = create_gru_stack(question_input)\n",
"        answer_stack = create_gru_stack(answer_input)\n",
"        combined = C.splice(question_stack, answer_stack)\n",
"        combined = C.layers.Dropout(0.5)(combined)\n",
"        combined = C.layers.LayerNormalization()(combined)\n",
"        combined = C.layers.Dense(64, activation=C.sigmoid)(combined)\n",
"        combined = C.layers.LayerNormalization()\n",
"        return combined\n",
"\n",
"model_working = create_model_working(question_input, answer_input)\n",
"print(repr(model_working))\n",
"\n",
"model_broken = create_model_broken(question_input, answer_input)\n",
"print(repr(model_broken))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Aha!\n",
"The problem is of course that we did not call \n",
"```python\n",
"combined = C.layers.LayerNormalization()(combined)\n",
"```\n",
"but \n",
"```python\n",
"combined = C.layers.LayerNormalization()\n",
"``` \n",
"which creates a layer normalization layer with a placeholder as its input.\n",
"\n",
"This mistake is easy to make because it is tedious to write `result = layer(layer_attributes)(result)` all the time. The layers library that comes with CNTK can eliminate these kinds of bugs: `C.layers.Sequential` takes a list of layers and applies them in order."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Composite(Dense): Input('q_input', [#, *], [10]), Input('a_input', [#, *], [10]) -> Output('Block7931_Output_0', [#], [1])\n"
]
}
],
"source": [
"def create_model_layers(question_input, answer_input):\n",
"    with C.default_options(init=C.glorot_uniform()):\n",
"        question_stack = create_gru_stack(question_input)\n",
"        answer_stack = create_gru_stack(answer_input)\n",
"        combined = C.splice(question_stack, answer_stack)\n",
"        return C.layers.Sequential([C.layers.Dropout(0.5),\n",
"                                    C.layers.LayerNormalization(),\n",
"                                    C.layers.Dense(64, activation=C.sigmoid),\n",
"                                    C.layers.LayerNormalization(),\n",
"                                    C.layers.Dense(1, activation=C.softmax)])(combined)\n",
"\n",
"model_layers = create_model_layers(question_input, answer_input)\n",
"print(repr(model_layers))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Guideline 1\n",
"\n",
"> Use the layers library as much as possible\n",
"\n",
"This sort of advice applies in every programming language. The library that comes with CNTK is better tested than your code, and subsequent improvements in the library can automatically benefit your program.\n",
"\n",
"### Runtime errors\n",
"\n",
"The network above has more problems. In particular, it will complain when we feed data to it. The reason has to do with the meaning of `[#, *]`, which gets printed as part of the signature of `model_layers` above. CNTK uses `#` to mean the batch axis (the mnemonic is that the [number sign](https://en.wikipedia.org/wiki/Number_sign) `#` designates the number of samples in the minibatch). Traditionally, CNTK has used `*` to mean the default sequence axis. When two variables have the same dynamic axes, they must have exactly the same lengths along those axes. So when we see that both inputs in the above example have dynamic axes `[#, *]`, it means that they must have the same sequence length. This is clearly not reasonable in this example, where the length of the question and the length of the answer don't need to be the same. To fix this we need to explicitly say that `question` and `answer` can have different lengths."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Composite(Dense): Input('q_input', [#, q], [10]), Input('a_input', [#, a], [10]) -> Output('Block10087_Output_0', [#], [1])\n"
]
}
],
"source": [
"q_axis = C.Axis.new_unique_dynamic_axis('q')\n",
"a_axis = C.Axis.new_unique_dynamic_axis('a')\n",
"q_input = C.sequence.input_variable(shape=10, is_sparse=True, sequence_axis=q_axis, name='q_input')\n",
"a_input = C.sequence.input_variable(shape=10, is_sparse=True, sequence_axis=a_axis, name='a_input')\n",
"\n",
"model_layers_distinct_axes = create_model_layers(q_input, a_input)\n",
"print(repr(model_layers_distinct_axes))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Guideline 2\n",
"\n",
"> Understand CNTK's types and assumptions.\n",
"\n",
"The Python API documentation tries to include usage examples for every basic operation, so it is easy to see what each operation expects and what it produces.\n",
"\n",
"### Guideline 3\n",
"\n",
"> When debugging, print each function to verify the types of its inputs and outputs.\n",
"\n",
"We were able to catch two bugs so far by simply inspecting the output of print. For big models that you did not write yourself, you might have to do this on each layer, or in a binary search fashion as we did to find the first bug.\n",
"\n",
"### Model bugs\n",
"\n",
"We are not done with the network above. So far we have only used printing of types to guide us, but this is not always enough to debug all issues. We can get more information from a function by plotting the underlying graph. That can be done with `logging.graph.plot`, which requires [graphviz](http://www.graphviz.org) to be installed and its binaries to be in your PATH environment variable. Inside a notebook we can display the network inline (use the scrollbar on the bottom and/or the right to see the whole network). Notice that none of the parameters are shared between the question and the answer. A typical solution might want to share the embedding, or both the embedding and the GRU if data is limited.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg height=\"1340pt\" viewBox=\"0.00 0.00 989.03 1340.00\" width=\"989pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g class=\"graph\" id=\"graph0\" transform=\"scale(1 1) rotate(0) translate(4 1336)\">\n",
"<title>network_graph</title>\n",
"<polygon fill=\"white\" points=\"-4,4 -4,-1336 985.035,-1336 985.035,4 -4,4\" stroke=\"none\"/>\n",
"<!-- Block10087 -->\n",
"<g class=\"node\" id=\"node1\"><title>Block10087</title>\n",
"<polygon fill=\"lightgray\" points=\"251,-191 157,-191 157,-119 251,-119 251,-191\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"204\" y=\"-151.9\">Dense</text>\n",
"</g>\n",
"<!-- Block10087_Output_0 -->\n",
"<g class=\"node\" id=\"node5\"><title>Block10087_Output_0</title>\n",
"<polygon fill=\"lightgray\" points=\"208.633,-0.0986735 211.701,-0.29575 214.737,-0.590689 217.729,-0.982683 220.664,-1.47066 223.532,-2.05327 226.32,-2.72894 229.018,-3.49579 231.615,-4.35174 234.103,-5.29443 236.471,-6.32129 238.713,-7.42949 240.819,-8.616 242.784,-9.87757 244.602,-11.2107 246.268,-12.6119 247.777,-14.0771 249.126,-15.6024 250.314,-17.1836 251.338,-18.8164 252.198,-20.4963 252.895,-22.2187 253.428,-23.9788 253.799,-25.7719 254.012,-27.5931 254.07,-29.4373 253.975,-31.2994 253.733,-33.1745 253.348,-35.0573 252.826,-36.9427 252.172,-38.8255 251.392,-40.7006 250.493,-42.5627 249.481,-44.4069 248.364,-46.2281 247.147,-48.0212 245.837,-49.7813 244.442,-51.5037 242.968,-53.1836 241.421,-54.8164 239.809,-56.3976 238.136,-57.9229 236.41,-59.3881 234.635,-60.7893 232.817,-62.1224 230.961,-63.384 229.072,-64.5705 227.154,-65.6787 225.21,-66.7056 223.245,-67.6483 221.261,-68.5042 219.262,-69.2711 217.251,-69.9467 215.228,-70.5293 213.197,-71.0173 211.16,-71.4093 209.118,-71.7042 207.072,-71.9013 205.024,-72 202.976,-72 200.928,-71.9013 198.882,-71.7042 196.84,-71.4093 194.803,-71.0173 192.772,-70.5293 190.749,-69.9467 188.738,-69.2711 186.739,-68.5042 184.755,-67.6483 182.79,-66.7056 180.846,-65.6787 178.928,-64.5705 177.039,-63.384 175.183,-62.1224 173.365,-60.7893 171.59,-59.3881 169.864,-57.9229 168.191,-56.3976 166.579,-54.8164 165.032,-53.1836 163.558,-51.5037 162.163,-49.7813 160.853,-48.0212 159.636,-46.2281 158.519,-44.4069 157.507,-42.5627 156.608,-40.7006 155.828,-38.8255 155.174,-36.9427 154.652,-35.0573 154.267,-33.1745 154.025,-31.2994 153.93,-29.4373 153.988,-27.5931 154.201,-25.7719 154.572,-23.9788 155.105,-22.2187 155.802,-20.4963 156.662,-18.8164 157.686,-17.1836 158.874,-15.6024 160.223,-14.0771 161.732,-12.6119 163.398,-11.2107 165.216,-9.87757 167.181,-8.616 169.287,-7.42949 171.529,-6.32129 173.897,-5.29443 176.385,-4.35174 178.982,-3.49579 181.68,-2.72894 184.468,-2.05327 187.336,-1.47066 190.271,-0.982683 193.263,-0.590689 
196.299,-0.29575 199.367,-0.0986735 202.453,-3.55271e-013 205.547,-3.69482e-013 208.633,-0.0986735\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"204\" y=\"-25.9\">[#](1,)</text>\n",
"</g>\n",
"<!-- Block10087&#45;&gt;Block10087_Output_0 -->\n",
"<g class=\"edge\" id=\"edge4\"><title>Block10087-&gt;Block10087_Output_0</title>\n",
"<path d=\"M204,-118.896C204,-107.327 204,-94.2965 204,-82.0893\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"207.5,-82.079 204,-72.0791 200.5,-82.0791 207.5,-82.079\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"216.5\" y=\"-93\">[#](1,)</text>\n",
"</g>\n",
"<!-- Parameter9598 -->\n",
"<g class=\"node\" id=\"node2\"><title>Parameter9598</title>\n",
"<polygon fill=\"lightgray\" points=\"150,-306.5 78,-306.5 78,-263.5 150,-263.5 150,-306.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"114\" y=\"-288.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"114\" y=\"-275.4\">(64, 1)</text>\n",
"</g>\n",
"<!-- Parameter9598&#45;&gt;Block10087 -->\n",
"<g class=\"edge\" id=\"edge1\"><title>Parameter9598-&gt;Block10087</title>\n",
"<path d=\"M128.46,-263.435C140.567,-246.216 158.259,-221.054 173.477,-199.41\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"176.5,-201.196 179.389,-191.003 170.774,-197.17 176.5,-201.196\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"176.5\" y=\"-223\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"176.5\" y=\"-212\">(64, 1)</text>\n",
"</g>\n",
"<!-- Parameter9599 -->\n",
"<g class=\"node\" id=\"node3\"><title>Parameter9599</title>\n",
"<polygon fill=\"lightgray\" points=\"240,-306.5 168,-306.5 168,-263.5 240,-263.5 240,-306.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"204\" y=\"-288.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"204\" y=\"-275.4\">(1,)</text>\n",
"</g>\n",
"<!-- Parameter9599&#45;&gt;Block10087 -->\n",
"<g class=\"edge\" id=\"edge2\"><title>Parameter9599-&gt;Block10087</title>\n",
"<path d=\"M204,-263.435C204,-246.673 204,-222.385 204,-201.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"207.5,-201.003 204,-191.003 200.5,-201.003 207.5,-201.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"210.5\" y=\"-223\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"210.5\" y=\"-212\">(1,)</text>\n",
"</g>\n",
"<!-- Block10062 -->\n",
"<g class=\"node\" id=\"node4\"><title>Block10062</title>\n",
"<polygon fill=\"lightgray\" points=\"352,-321 258,-321 258,-249 352,-249 352,-321\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"305\" y=\"-281.9\">LayerNormalization</text>\n",
"</g>\n",
"<!-- Block10062&#45;&gt;Block10087 -->\n",
"<g class=\"edge\" id=\"edge3\"><title>Block10062-&gt;Block10087</title>\n",
"<path d=\"M277.134,-248.685C265.101,-233.435 250.907,-215.446 238.24,-199.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"240.717,-196.882 231.775,-191.2 235.222,-201.219 240.717,-196.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"304.5\" y=\"-223\">Block10062_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"304.5\" y=\"-212\">[#](64,)</text>\n",
"</g>\n",
"<!-- Parameter9549 -->\n",
"<g class=\"node\" id=\"node6\"><title>Parameter9549</title>\n",
"<polygon fill=\"lightgray\" points=\"251,-436.5 179,-436.5 179,-393.5 251,-393.5 251,-436.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"215\" y=\"-418.4\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"215\" y=\"-405.4\">()</text>\n",
"</g>\n",
"<!-- Parameter9549&#45;&gt;Block10062 -->\n",
"<g class=\"edge\" id=\"edge5\"><title>Parameter9549-&gt;Block10062</title>\n",
"<path d=\"M229.46,-393.435C241.567,-376.216 259.259,-351.054 274.477,-329.41\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"277.5,-331.196 280.389,-321.003 271.774,-327.17 277.5,-331.196\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"275.5\" y=\"-353\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"275.5\" y=\"-342\">()</text>\n",
"</g>\n",
"<!-- Parameter9550 -->\n",
"<g class=\"node\" id=\"node7\"><title>Parameter9550</title>\n",
"<polygon fill=\"lightgray\" points=\"341,-436.5 269,-436.5 269,-393.5 341,-393.5 341,-436.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"305\" y=\"-418.4\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"305\" y=\"-405.4\">()</text>\n",
"</g>\n",
"<!-- Parameter9550&#45;&gt;Block10062 -->\n",
"<g class=\"edge\" id=\"edge6\"><title>Parameter9550-&gt;Block10062</title>\n",
"<path d=\"M305,-393.435C305,-376.673 305,-352.385 305,-331.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"308.5,-331.003 305,-321.003 301.5,-331.003 308.5,-331.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"313.5\" y=\"-353\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"313.5\" y=\"-342\">()</text>\n",
"</g>\n",
"<!-- Block10022 -->\n",
"<g class=\"node\" id=\"node8\"><title>Block10022</title>\n",
"<polygon fill=\"lightgray\" points=\"453,-451 359,-451 359,-379 453,-379 453,-451\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"406\" y=\"-411.9\">Dense</text>\n",
"</g>\n",
"<!-- Block10022&#45;&gt;Block10062 -->\n",
"<g class=\"edge\" id=\"edge7\"><title>Block10022-&gt;Block10062</title>\n",
"<path d=\"M378.134,-378.685C366.101,-363.435 351.907,-345.446 339.24,-329.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"341.717,-326.882 332.775,-321.2 336.222,-331.219 341.717,-326.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"405.5\" y=\"-353\">Block10022_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"405.5\" y=\"-342\">[#](64,)</text>\n",
"</g>\n",
"<!-- Parameter9529 -->\n",
"<g class=\"node\" id=\"node9\"><title>Parameter9529</title>\n",
"<polygon fill=\"lightgray\" points=\"352,-566.5 280,-566.5 280,-523.5 352,-523.5 352,-566.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"316\" y=\"-548.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"316\" y=\"-535.4\">(128, 64)</text>\n",
"</g>\n",
"<!-- Parameter9529&#45;&gt;Block10022 -->\n",
"<g class=\"edge\" id=\"edge8\"><title>Parameter9529-&gt;Block10022</title>\n",
"<path d=\"M330.259,-523.247C340.659,-508.156 355.126,-487.258 368,-469 370.248,-465.812 372.579,-462.522 374.93,-459.216\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"377.825,-461.185 380.78,-451.01 372.125,-457.121 377.825,-461.185\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"385\" y=\"-483\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"385\" y=\"-472\">(128, 64)</text>\n",
"</g>\n",
"<!-- Parameter9530 -->\n",
"<g class=\"node\" id=\"node10\"><title>Parameter9530</title>\n",
"<polygon fill=\"lightgray\" points=\"442,-566.5 370,-566.5 370,-523.5 442,-523.5 442,-566.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"406\" y=\"-548.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"406\" y=\"-535.4\">(64,)</text>\n",
"</g>\n",
"<!-- Parameter9530&#45;&gt;Block10022 -->\n",
"<g class=\"edge\" id=\"edge9\"><title>Parameter9530-&gt;Block10022</title>\n",
"<path d=\"M406,-523.435C406,-506.673 406,-482.385 406,-461.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"409.5,-461.003 406,-451.003 402.5,-461.003 409.5,-461.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"415\" y=\"-483\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"415\" y=\"-472\">(64,)</text>\n",
"</g>\n",
"<!-- Block9997 -->\n",
"<g class=\"node\" id=\"node11\"><title>Block9997</title>\n",
"<polygon fill=\"lightgray\" points=\"554,-581 460,-581 460,-509 554,-509 554,-581\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"507\" y=\"-541.9\">LayerNormalization</text>\n",
"</g>\n",
"<!-- Block9997&#45;&gt;Block10022 -->\n",
"<g class=\"edge\" id=\"edge10\"><title>Block9997-&gt;Block10022</title>\n",
"<path d=\"M479.134,-508.685C467.101,-493.435 452.907,-475.446 440.24,-459.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"442.717,-456.882 433.775,-451.2 437.222,-461.219 442.717,-456.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"504.5\" y=\"-483\">Block9997_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"504.5\" y=\"-472\">[#](128,)</text>\n",
"</g>\n",
"<!-- Parameter9480 -->\n",
"<g class=\"node\" id=\"node12\"><title>Parameter9480</title>\n",
"<polygon fill=\"lightgray\" points=\"453,-696.5 381,-696.5 381,-653.5 453,-653.5 453,-696.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"417\" y=\"-678.4\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"417\" y=\"-665.4\">()</text>\n",
"</g>\n",
"<!-- Parameter9480&#45;&gt;Block9997 -->\n",
"<g class=\"edge\" id=\"edge11\"><title>Parameter9480-&gt;Block9997</title>\n",
"<path d=\"M431.46,-653.435C443.567,-636.216 461.259,-611.054 476.477,-589.41\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"479.5,-591.196 482.389,-581.003 473.774,-587.17 479.5,-591.196\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"477.5\" y=\"-613\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"477.5\" y=\"-602\">()</text>\n",
"</g>\n",
"<!-- Parameter9481 -->\n",
"<g class=\"node\" id=\"node13\"><title>Parameter9481</title>\n",
"<polygon fill=\"lightgray\" points=\"543,-696.5 471,-696.5 471,-653.5 543,-653.5 543,-696.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"507\" y=\"-678.4\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"507\" y=\"-665.4\">()</text>\n",
"</g>\n",
"<!-- Parameter9481&#45;&gt;Block9997 -->\n",
"<g class=\"edge\" id=\"edge12\"><title>Parameter9481-&gt;Block9997</title>\n",
"<path d=\"M507,-653.435C507,-636.673 507,-612.385 507,-591.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"510.5,-591.003 507,-581.003 503.5,-591.003 510.5,-591.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"515.5\" y=\"-613\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"515.5\" y=\"-602\">()</text>\n",
"</g>\n",
"<!-- Block9959 -->\n",
"<g class=\"node\" id=\"node14\"><title>Block9959</title>\n",
"<polygon fill=\"lightgray\" points=\"655,-711 561,-711 561,-639 655,-639 655,-711\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"608\" y=\"-671.9\">Dropout</text>\n",
"</g>\n",
"<!-- Block9959&#45;&gt;Block9997 -->\n",
"<g class=\"edge\" id=\"edge13\"><title>Block9959-&gt;Block9997</title>\n",
"<path d=\"M580.134,-638.685C568.101,-623.435 553.907,-605.446 541.24,-589.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"543.717,-586.882 534.775,-581.2 538.222,-591.219 543.717,-586.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"605.5\" y=\"-613\">Block9959_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"605.5\" y=\"-602\">[#](128,)</text>\n",
"</g>\n",
"<!-- Splice9467 -->\n",
"<g class=\"node\" id=\"node15\"><title>Splice9467</title>\n",
"<polygon fill=\"lightgray\" points=\"630,-812 586,-812 586,-769 630,-769 630,-812\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"608\" y=\"-787.4\">Splice</text>\n",
"</g>\n",
"<!-- Splice9467&#45;&gt;Block9959 -->\n",
"<g class=\"edge\" id=\"edge14\"><title>Splice9467-&gt;Block9959</title>\n",
"<path d=\"M608,-768.946C608,-755.615 608,-737.71 608,-721.174\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"611.5,-721.108 608,-711.108 604.5,-721.108 611.5,-721.108\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"651\" y=\"-743\">Splice9467_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"651\" y=\"-732\">[#](128,)</text>\n",
"</g>\n",
"<!-- Block8726 -->\n",
"<g class=\"node\" id=\"node16\"><title>Block8726</title>\n",
"<polygon fill=\"lightgray\" points=\"449,-942 355,-942 355,-870 449,-870 449,-942\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"402\" y=\"-902.9\">Sequence::Slice</text>\n",
"</g>\n",
"<!-- Block8726&#45;&gt;Splice9467 -->\n",
"<g class=\"edge\" id=\"edge15\"><title>Block8726-&gt;Splice9467</title>\n",
"<path d=\"M449.002,-879.103C488.186,-857.514 542.905,-827.366 576.859,-808.658\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"578.785,-811.593 585.855,-803.701 575.407,-805.462 578.785,-811.593\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"577.5\" y=\"-844\">Block8726_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"577.5\" y=\"-833\">[#](64,)</text>\n",
"</g>\n",
"<!-- Block9397 -->\n",
"<g class=\"node\" id=\"node17\"><title>Block9397</title>\n",
"<polygon fill=\"lightgray\" points=\"771,-942 677,-942 677,-870 771,-870 771,-942\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"724\" y=\"-902.9\">Sequence::Slice</text>\n",
"</g>\n",
"<!-- Block9397&#45;&gt;Splice9467 -->\n",
"<g class=\"edge\" id=\"edge16\"><title>Block9397-&gt;Splice9467</title>\n",
"<path d=\"M687.933,-869.71C671.358,-853.492 651.98,-834.532 636.455,-819.342\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"638.768,-816.709 629.173,-812.217 633.873,-821.712 638.768,-816.709\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"711.5\" y=\"-844\">Block9397_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"711.5\" y=\"-833\">[#](64,)</text>\n",
"</g>\n",
"<!-- Block8682 -->\n",
"<g class=\"node\" id=\"node18\"><title>Block8682</title>\n",
"<polygon fill=\"lightgray\" points=\"289,-1072 195,-1072 195,-1000 289,-1000 289,-1072\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"242\" y=\"-1032.9\">GRU</text>\n",
"</g>\n",
"<!-- Block8682&#45;&gt;Block8726 -->\n",
"<g class=\"edge\" id=\"edge17\"><title>Block8682-&gt;Block8726</title>\n",
"<path d=\"M289.067,-1013.35C305.374,-1004.76 323.249,-994.036 338,-982 349.312,-972.769 360.191,-961.364 369.67,-950.257\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"372.616,-952.188 376.311,-942.258 367.23,-947.717 372.616,-952.188\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"401.5\" y=\"-974\">Block8682_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"401.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- PastValue8612 -->\n",
"<g class=\"node\" id=\"node23\"><title>PastValue8612</title>\n",
"<polygon fill=\"lightgray\" points=\"167,-927.5 105,-927.5 105,-884.5 167,-884.5 167,-927.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"136\" y=\"-902.9\">PastValue</text>\n",
"</g>\n",
"<!-- Block8682&#45;&gt;PastValue8612 -->\n",
"<g class=\"edge\" id=\"edge24\"><title>Block8682-&gt;PastValue8612</title>\n",
"<path d=\"M194.863,-1024.53C172.01,-1016.82 146.617,-1003.76 133,-982 124.879,-969.022 124.844,-952.16 127.183,-937.712\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"130.654,-938.206 129.225,-927.708 123.796,-936.807 130.654,-938.206\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"175.5\" y=\"-974\">Block8682_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"175.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Parameter8152 -->\n",
"<g class=\"node\" id=\"node19\"><title>Parameter8152</title>\n",
"<polygon fill=\"lightgray\" points=\"72,-1187.5 0,-1187.5 0,-1144.5 72,-1144.5 72,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"36\" y=\"-1169.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"36\" y=\"-1156.4\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter8152&#45;&gt;Block8682 -->\n",
"<g class=\"edge\" id=\"edge18\"><title>Parameter8152-&gt;Block8682</title>\n",
"<path d=\"M61.9552,-1144.35C68.1378,-1139.56 74.7549,-1134.54 81,-1130 106.628,-1111.38 112.787,-1106.22 140,-1090 154.551,-1081.33 170.666,-1072.58 185.709,-1064.76\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"187.625,-1067.71 194.913,-1060.02 184.42,-1061.49 187.625,-1067.71\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"151\" y=\"-1104\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"151\" y=\"-1093\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter8153 -->\n",
"<g class=\"node\" id=\"node20\"><title>Parameter8153</title>\n",
"<polygon fill=\"lightgray\" points=\"162,-1187.5 90,-1187.5 90,-1144.5 162,-1144.5 162,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"126\" y=\"-1169.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"126\" y=\"-1156.4\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter8153&#45;&gt;Block8682 -->\n",
"<g class=\"edge\" id=\"edge19\"><title>Parameter8153-&gt;Block8682</title>\n",
"<path d=\"M140.959,-1144.25C152.424,-1128.79 168.953,-1107.43 185,-1090 188.261,-1086.46 191.718,-1082.88 195.253,-1079.33\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"197.896,-1081.64 202.583,-1072.14 192.994,-1076.65 197.896,-1081.64\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"204\" y=\"-1104\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"204\" y=\"-1093\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter8154 -->\n",
"<g class=\"node\" id=\"node21\"><title>Parameter8154</title>\n",
"<polygon fill=\"lightgray\" points=\"252,-1187.5 180,-1187.5 180,-1144.5 252,-1144.5 252,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"216\" y=\"-1169.4\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"216\" y=\"-1156.4\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter8154&#45;&gt;Block8682 -->\n",
"<g class=\"edge\" id=\"edge20\"><title>Parameter8154-&gt;Block8682</title>\n",
"<path d=\"M220.177,-1144.43C223.598,-1127.6 228.56,-1103.16 232.89,-1081.85\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"236.329,-1082.5 234.89,-1072 229.469,-1081.11 236.329,-1082.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"247\" y=\"-1104\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"247\" y=\"-1093\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter8155 -->\n",
"<g class=\"node\" id=\"node22\"><title>Parameter8155</title>\n",
"<polygon fill=\"lightgray\" points=\"342,-1187.5 270,-1187.5 270,-1144.5 342,-1144.5 342,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"306\" y=\"-1169.4\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"306\" y=\"-1156.4\">(64, 64)</text>\n",
"</g>\n",
"<!-- Parameter8155&#45;&gt;Block8682 -->\n",
"<g class=\"edge\" id=\"edge21\"><title>Parameter8155-&gt;Block8682</title>\n",
"<path d=\"M295.717,-1144.43C287.184,-1127.37 274.75,-1102.5 263.993,-1080.99\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"267.104,-1079.38 259.502,-1072 260.843,-1082.51 267.104,-1079.38\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"291.5\" y=\"-1104\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"291.5\" y=\"-1093\">(64, 64)</text>\n",
"</g>\n",
"<!-- PastValue8612&#45;&gt;Block8682 -->\n",
"<g class=\"edge\" id=\"edge22\"><title>PastValue8612-&gt;Block8682</title>\n",
"<path d=\"M167.26,-919.184C185.75,-927.914 208.183,-941.425 222,-960 228.461,-968.686 232.836,-979.305 235.799,-989.789\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"232.454,-990.849 238.228,-999.732 239.254,-989.187 232.454,-990.849\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"283\" y=\"-974\">PastValue8612_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"283\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Block8143 -->\n",
"<g class=\"node\" id=\"node24\"><title>Block8143</title>\n",
"<polygon fill=\"lightgray\" points=\"454,-1202 360,-1202 360,-1130 454,-1130 454,-1202\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"407\" y=\"-1162.9\">Embedding</text>\n",
"</g>\n",
"<!-- Block8143&#45;&gt;Block8682 -->\n",
"<g class=\"edge\" id=\"edge23\"><title>Block8143-&gt;Block8682</title>\n",
"<path d=\"M361.476,-1129.68C341.007,-1113.81 316.709,-1094.96 295.387,-1078.42\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"297.422,-1075.56 287.375,-1072.2 293.131,-1081.1 297.422,-1075.56\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"376.5\" y=\"-1104\">Block8143_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"376.5\" y=\"-1093\">[#,*](300,)</text>\n",
"</g>\n",
"<!-- Constant8248 -->\n",
"<g class=\"node\" id=\"node25\"><title>Constant8248</title>\n",
"<polygon fill=\"white\" points=\"76,-1047 40,-1047 40,-1025 76,-1025 76,-1047\" stroke=\"white\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"58\" y=\"-1032.9\">[ 0.]</text>\n",
"</g>\n",
"<!-- Constant8248&#45;&gt;PastValue8612 -->\n",
"<g class=\"edge\" id=\"edge25\"><title>Constant8248-&gt;PastValue8612</title>\n",
"<path d=\"M55.3475,-1024.81C51.9927,-1009.67 47.894,-980.789 59,-960 67.1057,-944.827 81.7003,-933.132 96.0206,-924.637\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"97.7844,-927.661 104.841,-919.758 94.3963,-921.536 97.7844,-927.661\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"86.5\" y=\"-974\">Constant8248</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"86.5\" y=\"-963\">(1,)</text>\n",
"</g>\n",
"<!-- Parameter8125 -->\n",
"<g class=\"node\" id=\"node26\"><title>Parameter8125</title>\n",
"<polygon fill=\"lightgray\" points=\"391,-1317.5 319,-1317.5 319,-1274.5 391,-1274.5 391,-1317.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"355\" y=\"-1299.4\">E</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"355\" y=\"-1286.4\">(10, 300)</text>\n",
"</g>\n",
"<!-- Parameter8125&#45;&gt;Block8143 -->\n",
"<g class=\"edge\" id=\"edge26\"><title>Parameter8125-&gt;Block8143</title>\n",
"<path d=\"M363.355,-1274.43C370.257,-1257.44 380.301,-1232.72 389.014,-1211.27\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"392.259,-1212.59 392.78,-1202 385.774,-1209.95 392.259,-1212.59\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"401\" y=\"-1234\">E</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"401\" y=\"-1223\">(10, 300)</text>\n",
"</g>\n",
"<!-- Input8123 -->\n",
"<g class=\"node\" id=\"node27\"><title>Input8123</title>\n",
"<polygon fill=\"lightgray\" points=\"463.633,-1260.1 466.701,-1260.3 469.737,-1260.59 472.729,-1260.98 475.664,-1261.47 478.532,-1262.05 481.32,-1262.73 484.018,-1263.5 486.615,-1264.35 489.103,-1265.29 491.471,-1266.32 493.713,-1267.43 495.819,-1268.62 497.784,-1269.88 499.602,-1271.21 501.268,-1272.61 502.777,-1274.08 504.126,-1275.6 505.314,-1277.18 506.338,-1278.82 507.198,-1280.5 507.895,-1282.22 508.428,-1283.98 508.799,-1285.77 509.012,-1287.59 509.07,-1289.44 508.975,-1291.3 508.733,-1293.17 508.348,-1295.06 507.826,-1296.94 507.172,-1298.83 506.392,-1300.7 505.493,-1302.56 504.481,-1304.41 503.364,-1306.23 502.147,-1308.02 500.837,-1309.78 499.442,-1311.5 497.968,-1313.18 496.421,-1314.82 494.809,-1316.4 493.136,-1317.92 491.41,-1319.39 489.635,-1320.79 487.817,-1322.12 485.961,-1323.38 484.072,-1324.57 482.154,-1325.68 480.21,-1326.71 478.245,-1327.65 476.261,-1328.5 474.262,-1329.27 472.251,-1329.95 470.228,-1330.53 468.197,-1331.02 466.16,-1331.41 464.118,-1331.7 462.072,-1331.9 460.024,-1332 457.976,-1332 455.928,-1331.9 453.882,-1331.7 451.84,-1331.41 449.803,-1331.02 447.772,-1330.53 445.749,-1329.95 443.738,-1329.27 441.739,-1328.5 439.755,-1327.65 437.79,-1326.71 435.846,-1325.68 433.928,-1324.57 432.039,-1323.38 430.183,-1322.12 428.365,-1320.79 426.59,-1319.39 424.864,-1317.92 423.191,-1316.4 421.579,-1314.82 420.032,-1313.18 418.558,-1311.5 417.163,-1309.78 415.853,-1308.02 414.636,-1306.23 413.519,-1304.41 412.507,-1302.56 411.608,-1300.7 410.828,-1298.83 410.174,-1296.94 409.652,-1295.06 409.267,-1293.17 409.025,-1291.3 408.93,-1289.44 408.988,-1287.59 409.201,-1285.77 409.572,-1283.98 410.105,-1282.22 410.802,-1280.5 411.662,-1278.82 412.686,-1277.18 413.874,-1275.6 415.223,-1274.08 416.732,-1272.61 418.398,-1271.21 420.216,-1269.88 422.181,-1268.62 424.287,-1267.43 426.529,-1266.32 428.897,-1265.29 431.385,-1264.35 433.982,-1263.5 436.68,-1262.73 439.468,-1262.05 442.336,-1261.47 445.271,-1260.98 448.263,-1260.59 451.299,-1260.3 
454.367,-1260.1 457.453,-1260 460.547,-1260 463.633,-1260.1\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"459\" y=\"-1305.9\">Input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"459\" y=\"-1292.9\">q_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"459\" y=\"-1279.9\">[#,*](10,)</text>\n",
"</g>\n",
"<!-- Input8123&#45;&gt;Block8143 -->\n",
"<g class=\"edge\" id=\"edge27\"><title>Input8123-&gt;Block8143</title>\n",
"<path d=\"M445.065,-1260.7C438.986,-1245.74 431.777,-1227.99 425.276,-1211.99\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"428.343,-1210.24 421.336,-1202.29 421.857,-1212.87 428.343,-1210.24\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"454.5\" y=\"-1234\">q_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"454.5\" y=\"-1223\">[#,*](10,)</text>\n",
"</g>\n",
"<!-- Block9353 -->\n",
"<g class=\"node\" id=\"node28\"><title>Block9353</title>\n",
"<polygon fill=\"lightgray\" points=\"722,-1072 628,-1072 628,-1000 722,-1000 722,-1072\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"675\" y=\"-1032.9\">GRU</text>\n",
"</g>\n",
"<!-- Block9353&#45;&gt;Block9397 -->\n",
"<g class=\"edge\" id=\"edge28\"><title>Block9353-&gt;Block9397</title>\n",
"<path d=\"M722.169,-1020.77C740.784,-1012.58 760.259,-1000.2 771,-982 777.041,-971.765 775.141,-960.868 769.548,-950.644\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"772.466,-948.711 764.074,-942.244 766.601,-952.533 772.466,-948.711\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"816.5\" y=\"-974\">Block9353_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"816.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- PastValue9283 -->\n",
"<g class=\"node\" id=\"node33\"><title>PastValue9283</title>\n",
"<polygon fill=\"lightgray\" points=\"591,-927.5 529,-927.5 529,-884.5 591,-884.5 591,-927.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"560\" y=\"-902.9\">PastValue</text>\n",
"</g>\n",
"<!-- Block9353&#45;&gt;PastValue9283 -->\n",
"<g class=\"edge\" id=\"edge35\"><title>Block9353-&gt;PastValue9283</title>\n",
"<path d=\"M627.712,-1026.6C602.117,-1019.38 572.617,-1006.13 557,-982 548.683,-969.147 548.631,-952.295 551.011,-937.822\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"554.486,-938.297 553.089,-927.795 547.632,-936.876 554.486,-938.297\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"599.5\" y=\"-974\">Block9353_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"599.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Parameter8823 -->\n",
"<g class=\"node\" id=\"node29\"><title>Parameter8823</title>\n",
"<polygon fill=\"lightgray\" points=\"544,-1187.5 472,-1187.5 472,-1144.5 544,-1144.5 544,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"508\" y=\"-1169.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"508\" y=\"-1156.4\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter8823&#45;&gt;Block9353 -->\n",
"<g class=\"edge\" id=\"edge29\"><title>Parameter8823-&gt;Block9353</title>\n",
"<path d=\"M532.234,-1144.48C550.577,-1129.14 576.533,-1107.82 600,-1090 606.249,-1085.25 612.868,-1080.37 619.483,-1075.59\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"621.752,-1078.27 627.832,-1069.59 617.668,-1072.58 621.752,-1078.27\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"611\" y=\"-1104\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"611\" y=\"-1093\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter8824 -->\n",
"<g class=\"node\" id=\"node30\"><title>Parameter8824</title>\n",
"<polygon fill=\"lightgray\" points=\"634,-1187.5 562,-1187.5 562,-1144.5 634,-1144.5 634,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"598\" y=\"-1169.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"598\" y=\"-1156.4\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter8824&#45;&gt;Block9353 -->\n",
"<g class=\"edge\" id=\"edge30\"><title>Parameter8824-&gt;Block9353</title>\n",
"<path d=\"M608.219,-1144.26C615.943,-1128.99 627.052,-1107.87 638,-1090 639.9,-1086.9 641.912,-1083.73 643.977,-1080.57\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"646.891,-1082.51 649.526,-1072.25 641.067,-1078.63 646.891,-1082.51\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"657\" y=\"-1104\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"657\" y=\"-1093\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter8825 -->\n",
"<g class=\"node\" id=\"node31\"><title>Parameter8825</title>\n",
"<polygon fill=\"lightgray\" points=\"724,-1187.5 652,-1187.5 652,-1144.5 724,-1144.5 724,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"688\" y=\"-1169.4\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"688\" y=\"-1156.4\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter8825&#45;&gt;Block9353 -->\n",
"<g class=\"edge\" id=\"edge31\"><title>Parameter8825-&gt;Block9353</title>\n",
"<path d=\"M685.911,-1144.43C684.209,-1127.67 681.742,-1103.38 679.585,-1082.14\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"683.048,-1081.6 678.555,-1072 676.083,-1082.31 683.048,-1081.6\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"699\" y=\"-1104\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"699\" y=\"-1093\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter8826 -->\n",
"<g class=\"node\" id=\"node32\"><title>Parameter8826</title>\n",
"<polygon fill=\"lightgray\" points=\"814,-1187.5 742,-1187.5 742,-1144.5 814,-1144.5 814,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"778\" y=\"-1169.4\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"778\" y=\"-1156.4\">(64, 64)</text>\n",
"</g>\n",
"<!-- Parameter8826&#45;&gt;Block9353 -->\n",
"<g class=\"edge\" id=\"edge32\"><title>Parameter8826-&gt;Block9353</title>\n",
"<path d=\"M762.494,-1144.43C750.953,-1129.25 734.717,-1108.16 720,-1090 717.349,-1086.73 714.586,-1083.37 711.789,-1080\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"714.34,-1077.59 705.242,-1072.16 708.969,-1082.08 714.34,-1077.59\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"750.5\" y=\"-1104\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"750.5\" y=\"-1093\">(64, 64)</text>\n",
"</g>\n",
"<!-- PastValue9283&#45;&gt;Block9353 -->\n",
"<g class=\"edge\" id=\"edge33\"><title>PastValue9283-&gt;Block9353</title>\n",
"<path d=\"M591.398,-920.139C609.472,-929.077 631.422,-942.47 646,-960 653.244,-968.711 658.81,-979.436 663.033,-990.024\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"659.828,-991.451 666.544,-999.646 666.404,-989.052 659.828,-991.451\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"709\" y=\"-974\">PastValue9283_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"709\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Block8814 -->\n",
"<g class=\"node\" id=\"node34\"><title>Block8814</title>\n",
"<polygon fill=\"lightgray\" points=\"926,-1202 832,-1202 832,-1130 926,-1130 926,-1202\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"879\" y=\"-1162.9\">Embedding</text>\n",
"</g>\n",
"<!-- Block8814&#45;&gt;Block9353 -->\n",
"<g class=\"edge\" id=\"edge34\"><title>Block8814-&gt;Block9353</title>\n",
"<path d=\"M831.936,-1131.62C812.671,-1118.29 790.008,-1103.05 769,-1090 756.963,-1082.52 743.832,-1074.84 731.306,-1067.72\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"732.745,-1064.51 722.315,-1062.65 729.305,-1070.61 732.745,-1064.51\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"844.5\" y=\"-1104\">Block8814_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"844.5\" y=\"-1093\">[#,*](300,)</text>\n",
"</g>\n",
"<!-- Constant8919 -->\n",
"<g class=\"node\" id=\"node35\"><title>Constant8919</title>\n",
"<polygon fill=\"white\" points=\"500,-1047 464,-1047 464,-1025 500,-1025 500,-1047\" stroke=\"white\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"482\" y=\"-1032.9\">[ 0.]</text>\n",
"</g>\n",
"<!-- Constant8919&#45;&gt;PastValue9283 -->\n",
"<g class=\"edge\" id=\"edge36\"><title>Constant8919-&gt;PastValue9283</title>\n",
"<path d=\"M479.347,-1024.81C475.993,-1009.67 471.894,-980.789 483,-960 491.106,-944.827 505.7,-933.132 520.021,-924.637\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"521.784,-927.661 528.841,-919.758 518.396,-921.536 521.784,-927.661\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"510.5\" y=\"-974\">Constant8919</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"510.5\" y=\"-963\">(1,)</text>\n",
"</g>\n",
"<!-- Parameter8796 -->\n",
"<g class=\"node\" id=\"node36\"><title>Parameter8796</title>\n",
"<polygon fill=\"lightgray\" points=\"863,-1317.5 791,-1317.5 791,-1274.5 863,-1274.5 863,-1317.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"827\" y=\"-1299.4\">E</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"827\" y=\"-1286.4\">(10, 300)</text>\n",
"</g>\n",
"<!-- Parameter8796&#45;&gt;Block8814 -->\n",
"<g class=\"edge\" id=\"edge37\"><title>Parameter8796-&gt;Block8814</title>\n",
"<path d=\"M835.355,-1274.43C842.257,-1257.44 852.301,-1232.72 861.014,-1211.27\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"864.259,-1212.59 864.78,-1202 857.774,-1209.95 864.259,-1212.59\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"873\" y=\"-1234\">E</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"873\" y=\"-1223\">(10, 300)</text>\n",
"</g>\n",
"<!-- Input8124 -->\n",
"<g class=\"node\" id=\"node37\"><title>Input8124</title>\n",
"<polygon fill=\"lightgray\" points=\"935.633,-1260.1 938.701,-1260.3 941.737,-1260.59 944.729,-1260.98 947.664,-1261.47 950.532,-1262.05 953.32,-1262.73 956.018,-1263.5 958.615,-1264.35 961.103,-1265.29 963.471,-1266.32 965.713,-1267.43 967.819,-1268.62 969.784,-1269.88 971.602,-1271.21 973.268,-1272.61 974.777,-1274.08 976.126,-1275.6 977.314,-1277.18 978.338,-1278.82 979.198,-1280.5 979.895,-1282.22 980.428,-1283.98 980.799,-1285.77 981.012,-1287.59 981.07,-1289.44 980.975,-1291.3 980.733,-1293.17 980.348,-1295.06 979.826,-1296.94 979.172,-1298.83 978.392,-1300.7 977.493,-1302.56 976.481,-1304.41 975.364,-1306.23 974.147,-1308.02 972.837,-1309.78 971.442,-1311.5 969.968,-1313.18 968.421,-1314.82 966.809,-1316.4 965.136,-1317.92 963.41,-1319.39 961.635,-1320.79 959.817,-1322.12 957.961,-1323.38 956.072,-1324.57 954.154,-1325.68 952.21,-1326.71 950.245,-1327.65 948.261,-1328.5 946.262,-1329.27 944.251,-1329.95 942.228,-1330.53 940.197,-1331.02 938.16,-1331.41 936.118,-1331.7 934.072,-1331.9 932.024,-1332 929.976,-1332 927.928,-1331.9 925.882,-1331.7 923.84,-1331.41 921.803,-1331.02 919.772,-1330.53 917.749,-1329.95 915.738,-1329.27 913.739,-1328.5 911.755,-1327.65 909.79,-1326.71 907.846,-1325.68 905.928,-1324.57 904.039,-1323.38 902.183,-1322.12 900.365,-1320.79 898.59,-1319.39 896.864,-1317.92 895.191,-1316.4 893.579,-1314.82 892.032,-1313.18 890.558,-1311.5 889.163,-1309.78 887.853,-1308.02 886.636,-1306.23 885.519,-1304.41 884.507,-1302.56 883.608,-1300.7 882.828,-1298.83 882.174,-1296.94 881.652,-1295.06 881.267,-1293.17 881.025,-1291.3 880.93,-1289.44 880.988,-1287.59 881.201,-1285.77 881.572,-1283.98 882.105,-1282.22 882.802,-1280.5 883.662,-1278.82 884.686,-1277.18 885.874,-1275.6 887.223,-1274.08 888.732,-1272.61 890.398,-1271.21 892.216,-1269.88 894.181,-1268.62 896.287,-1267.43 898.529,-1266.32 900.897,-1265.29 903.385,-1264.35 905.982,-1263.5 908.68,-1262.73 911.468,-1262.05 914.336,-1261.47 917.271,-1260.98 920.263,-1260.59 923.299,-1260.3 
926.367,-1260.1 929.453,-1260 932.547,-1260 935.633,-1260.1\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"931\" y=\"-1305.9\">Input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"931\" y=\"-1292.9\">a_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"931\" y=\"-1279.9\">[#,*](10,)</text>\n",
"</g>\n",
"<!-- Input8124&#45;&gt;Block8814 -->\n",
"<g class=\"edge\" id=\"edge38\"><title>Input8124-&gt;Block8814</title>\n",
"<path d=\"M917.065,-1260.7C910.986,-1245.74 903.777,-1227.99 897.276,-1211.99\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"900.343,-1210.24 893.336,-1202.29 893.857,-1212.87 900.343,-1210.24\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"926.5\" y=\"-1234\">a_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"926.5\" y=\"-1223\">[#,*](10,)</text>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from IPython.display import SVG, display\n",
"\n",
"def display_model(model):\n",
"    # plot() renders the graph to tmp.svg; its return value is not needed here\n",
"    C.logging.graph.plot(model, \"tmp.svg\")\n",
"    display(SVG(filename=\"tmp.svg\"))\n",
"\n",
"display_model(model_layers_distinct_axes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's fix this by sharing the embedding. Sharing the GRU parameters can be done in an even simpler way, as shown in the unused function `create_model_shared_all`. In the layers library, passing an input to a layer means sharing that layer's parameters with every other input passed to the same layer object. If you need a separate copy of the parameters, you must make one explicitly, either via `clone()` or by creating a new layer object."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg height=\"1340pt\" viewBox=\"0.00 0.00 971.03 1340.00\" width=\"971pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g class=\"graph\" id=\"graph0\" transform=\"scale(1 1) rotate(0) translate(4 1336)\">\n",
"<title>network_graph</title>\n",
"<polygon fill=\"white\" points=\"-4,4 -4,-1336 967.035,-1336 967.035,4 -4,4\" stroke=\"none\"/>\n",
"<!-- Block12641 -->\n",
"<g class=\"node\" id=\"node1\"><title>Block12641</title>\n",
"<polygon fill=\"lightgray\" points=\"181,-191 87,-191 87,-119 181,-119 181,-191\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"134\" y=\"-151.9\">Dense</text>\n",
"</g>\n",
"<!-- Block12641_Output_0 -->\n",
"<g class=\"node\" id=\"node5\"><title>Block12641_Output_0</title>\n",
"<polygon fill=\"lightgray\" points=\"138.633,-0.0986735 141.701,-0.29575 144.737,-0.590689 147.729,-0.982683 150.664,-1.47066 153.532,-2.05327 156.32,-2.72894 159.018,-3.49579 161.615,-4.35174 164.103,-5.29443 166.471,-6.32129 168.713,-7.42949 170.819,-8.616 172.784,-9.87757 174.602,-11.2107 176.268,-12.6119 177.777,-14.0771 179.126,-15.6024 180.314,-17.1836 181.338,-18.8164 182.198,-20.4963 182.895,-22.2187 183.428,-23.9788 183.799,-25.7719 184.012,-27.5931 184.07,-29.4373 183.975,-31.2994 183.733,-33.1745 183.348,-35.0573 182.826,-36.9427 182.172,-38.8255 181.392,-40.7006 180.493,-42.5627 179.481,-44.4069 178.364,-46.2281 177.147,-48.0212 175.837,-49.7813 174.442,-51.5037 172.968,-53.1836 171.421,-54.8164 169.809,-56.3976 168.136,-57.9229 166.41,-59.3881 164.635,-60.7893 162.817,-62.1224 160.961,-63.384 159.072,-64.5705 157.154,-65.6787 155.21,-66.7056 153.245,-67.6483 151.261,-68.5042 149.262,-69.2711 147.251,-69.9467 145.228,-70.5293 143.197,-71.0173 141.16,-71.4093 139.118,-71.7042 137.072,-71.9013 135.024,-72 132.976,-72 130.928,-71.9013 128.882,-71.7042 126.84,-71.4093 124.803,-71.0173 122.772,-70.5293 120.749,-69.9467 118.738,-69.2711 116.739,-68.5042 114.755,-67.6483 112.79,-66.7056 110.846,-65.6787 108.928,-64.5705 107.039,-63.384 105.183,-62.1224 103.365,-60.7893 101.59,-59.3881 99.8637,-57.9229 98.1912,-56.3976 96.5786,-54.8164 95.0321,-53.1836 93.5579,-51.5037 92.1628,-49.7813 90.8533,-48.0212 89.6363,-46.2281 88.5186,-44.4069 87.5069,-42.5627 86.608,-40.7006 85.8284,-38.8255 85.1743,-36.9427 84.652,-35.0573 84.267,-33.1745 84.0248,-31.2994 83.9302,-29.4373 83.9875,-27.5931 84.2005,-25.7719 84.5723,-23.9788 85.1055,-22.2187 85.8016,-20.4963 86.6616,-18.8164 87.6859,-17.1836 88.8735,-15.6024 90.2232,-14.0771 91.7324,-12.6119 93.398,-11.2107 95.2158,-9.87757 97.1809,-8.616 99.2873,-7.42949 101.529,-6.32129 103.897,-5.29443 106.385,-4.35174 108.982,-3.49579 111.68,-2.72894 114.468,-2.05327 117.336,-1.47066 120.271,-0.982683 123.263,-0.590689 
126.299,-0.29575 129.367,-0.0986735 132.453,-3.55271e-013 135.547,-3.69482e-013 138.633,-0.0986735\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"134\" y=\"-25.9\">[#](1,)</text>\n",
"</g>\n",
"<!-- Block12641&#45;&gt;Block12641_Output_0 -->\n",
"<g class=\"edge\" id=\"edge4\"><title>Block12641-&gt;Block12641_Output_0</title>\n",
"<path d=\"M134,-118.896C134,-107.327 134,-94.2965 134,-82.0893\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"137.5,-82.079 134,-72.0791 130.5,-82.0791 137.5,-82.079\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"146.5\" y=\"-93\">[#](1,)</text>\n",
"</g>\n",
"<!-- Parameter12152 -->\n",
"<g class=\"node\" id=\"node2\"><title>Parameter12152</title>\n",
"<polygon fill=\"lightgray\" points=\"80,-306.5 8,-306.5 8,-263.5 80,-263.5 80,-306.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"44\" y=\"-288.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"44\" y=\"-275.4\">(64, 1)</text>\n",
"</g>\n",
"<!-- Parameter12152&#45;&gt;Block12641 -->\n",
"<g class=\"edge\" id=\"edge1\"><title>Parameter12152-&gt;Block12641</title>\n",
"<path d=\"M58.4601,-263.435C70.5672,-246.216 88.2586,-221.054 103.477,-199.41\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"106.5,-201.196 109.389,-191.003 100.774,-197.17 106.5,-201.196\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"106.5\" y=\"-223\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"106.5\" y=\"-212\">(64, 1)</text>\n",
"</g>\n",
"<!-- Parameter12153 -->\n",
"<g class=\"node\" id=\"node3\"><title>Parameter12153</title>\n",
"<polygon fill=\"lightgray\" points=\"170,-306.5 98,-306.5 98,-263.5 170,-263.5 170,-306.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"134\" y=\"-288.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"134\" y=\"-275.4\">(1,)</text>\n",
"</g>\n",
"<!-- Parameter12153&#45;&gt;Block12641 -->\n",
"<g class=\"edge\" id=\"edge2\"><title>Parameter12153-&gt;Block12641</title>\n",
"<path d=\"M134,-263.435C134,-246.673 134,-222.385 134,-201.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"137.5,-201.003 134,-191.003 130.5,-201.003 137.5,-201.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"140.5\" y=\"-223\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"140.5\" y=\"-212\">(1,)</text>\n",
"</g>\n",
"<!-- Block12616 -->\n",
"<g class=\"node\" id=\"node4\"><title>Block12616</title>\n",
"<polygon fill=\"lightgray\" points=\"282,-321 188,-321 188,-249 282,-249 282,-321\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"235\" y=\"-281.9\">LayerNormalization</text>\n",
"</g>\n",
"<!-- Block12616&#45;&gt;Block12641 -->\n",
"<g class=\"edge\" id=\"edge3\"><title>Block12616-&gt;Block12641</title>\n",
"<path d=\"M207.134,-248.685C195.101,-233.435 180.907,-215.446 168.24,-199.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"170.717,-196.882 161.775,-191.2 165.222,-201.219 170.717,-196.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"234.5\" y=\"-223\">Block12616_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"234.5\" y=\"-212\">[#](64,)</text>\n",
"</g>\n",
"<!-- Parameter12103 -->\n",
"<g class=\"node\" id=\"node6\"><title>Parameter12103</title>\n",
"<polygon fill=\"lightgray\" points=\"181,-436.5 109,-436.5 109,-393.5 181,-393.5 181,-436.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"145\" y=\"-418.4\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"145\" y=\"-405.4\">()</text>\n",
"</g>\n",
"<!-- Parameter12103&#45;&gt;Block12616 -->\n",
"<g class=\"edge\" id=\"edge5\"><title>Parameter12103-&gt;Block12616</title>\n",
"<path d=\"M159.46,-393.435C171.567,-376.216 189.259,-351.054 204.477,-329.41\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"207.5,-331.196 210.389,-321.003 201.774,-327.17 207.5,-331.196\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"205.5\" y=\"-353\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"205.5\" y=\"-342\">()</text>\n",
"</g>\n",
"<!-- Parameter12104 -->\n",
"<g class=\"node\" id=\"node7\"><title>Parameter12104</title>\n",
"<polygon fill=\"lightgray\" points=\"271,-436.5 199,-436.5 199,-393.5 271,-393.5 271,-436.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"235\" y=\"-418.4\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"235\" y=\"-405.4\">()</text>\n",
"</g>\n",
"<!-- Parameter12104&#45;&gt;Block12616 -->\n",
"<g class=\"edge\" id=\"edge6\"><title>Parameter12104-&gt;Block12616</title>\n",
"<path d=\"M235,-393.435C235,-376.673 235,-352.385 235,-331.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"238.5,-331.003 235,-321.003 231.5,-331.003 238.5,-331.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"243.5\" y=\"-353\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"243.5\" y=\"-342\">()</text>\n",
"</g>\n",
"<!-- Block12576 -->\n",
"<g class=\"node\" id=\"node8\"><title>Block12576</title>\n",
"<polygon fill=\"lightgray\" points=\"383,-451 289,-451 289,-379 383,-379 383,-451\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"336\" y=\"-411.9\">Dense</text>\n",
"</g>\n",
"<!-- Block12576&#45;&gt;Block12616 -->\n",
"<g class=\"edge\" id=\"edge7\"><title>Block12576-&gt;Block12616</title>\n",
"<path d=\"M308.134,-378.685C296.101,-363.435 281.907,-345.446 269.24,-329.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"271.717,-326.882 262.775,-321.2 266.222,-331.219 271.717,-326.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"335.5\" y=\"-353\">Block12576_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"335.5\" y=\"-342\">[#](64,)</text>\n",
"</g>\n",
"<!-- Parameter12083 -->\n",
"<g class=\"node\" id=\"node9\"><title>Parameter12083</title>\n",
"<polygon fill=\"lightgray\" points=\"282,-566.5 210,-566.5 210,-523.5 282,-523.5 282,-566.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"246\" y=\"-548.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"246\" y=\"-535.4\">(128, 64)</text>\n",
"</g>\n",
"<!-- Parameter12083&#45;&gt;Block12576 -->\n",
"<g class=\"edge\" id=\"edge8\"><title>Parameter12083-&gt;Block12576</title>\n",
"<path d=\"M260.259,-523.247C270.659,-508.156 285.126,-487.258 298,-469 300.248,-465.812 302.579,-462.522 304.93,-459.216\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"307.825,-461.185 310.78,-451.01 302.125,-457.121 307.825,-461.185\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"315\" y=\"-483\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"315\" y=\"-472\">(128, 64)</text>\n",
"</g>\n",
"<!-- Parameter12084 -->\n",
"<g class=\"node\" id=\"node10\"><title>Parameter12084</title>\n",
"<polygon fill=\"lightgray\" points=\"372,-566.5 300,-566.5 300,-523.5 372,-523.5 372,-566.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"336\" y=\"-548.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"336\" y=\"-535.4\">(64,)</text>\n",
"</g>\n",
"<!-- Parameter12084&#45;&gt;Block12576 -->\n",
"<g class=\"edge\" id=\"edge9\"><title>Parameter12084-&gt;Block12576</title>\n",
"<path d=\"M336,-523.435C336,-506.673 336,-482.385 336,-461.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"339.5,-461.003 336,-451.003 332.5,-461.003 339.5,-461.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"345\" y=\"-483\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"345\" y=\"-472\">(64,)</text>\n",
"</g>\n",
"<!-- Block12551 -->\n",
"<g class=\"node\" id=\"node11\"><title>Block12551</title>\n",
"<polygon fill=\"lightgray\" points=\"484,-581 390,-581 390,-509 484,-509 484,-581\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"437\" y=\"-541.9\">LayerNormalization</text>\n",
"</g>\n",
"<!-- Block12551&#45;&gt;Block12576 -->\n",
"<g class=\"edge\" id=\"edge10\"><title>Block12551-&gt;Block12576</title>\n",
"<path d=\"M409.134,-508.685C397.101,-493.435 382.907,-475.446 370.24,-459.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"372.717,-456.882 363.775,-451.2 367.222,-461.219 372.717,-456.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"436.5\" y=\"-483\">Block12551_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"436.5\" y=\"-472\">[#](128,)</text>\n",
"</g>\n",
"<!-- Parameter12034 -->\n",
"<g class=\"node\" id=\"node12\"><title>Parameter12034</title>\n",
"<polygon fill=\"lightgray\" points=\"383,-696.5 311,-696.5 311,-653.5 383,-653.5 383,-696.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"347\" y=\"-678.4\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"347\" y=\"-665.4\">()</text>\n",
"</g>\n",
"<!-- Parameter12034&#45;&gt;Block12551 -->\n",
"<g class=\"edge\" id=\"edge11\"><title>Parameter12034-&gt;Block12551</title>\n",
"<path d=\"M361.46,-653.435C373.567,-636.216 391.259,-611.054 406.477,-589.41\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"409.5,-591.196 412.389,-581.003 403.774,-587.17 409.5,-591.196\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"407.5\" y=\"-613\">scale</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"407.5\" y=\"-602\">()</text>\n",
"</g>\n",
"<!-- Parameter12035 -->\n",
"<g class=\"node\" id=\"node13\"><title>Parameter12035</title>\n",
"<polygon fill=\"lightgray\" points=\"473,-696.5 401,-696.5 401,-653.5 473,-653.5 473,-696.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"437\" y=\"-678.4\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"437\" y=\"-665.4\">()</text>\n",
"</g>\n",
"<!-- Parameter12035&#45;&gt;Block12551 -->\n",
"<g class=\"edge\" id=\"edge12\"><title>Parameter12035-&gt;Block12551</title>\n",
"<path d=\"M437,-653.435C437,-636.673 437,-612.385 437,-591.141\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"440.5,-591.003 437,-581.003 433.5,-591.003 440.5,-591.003\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"445.5\" y=\"-613\">bias</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"445.5\" y=\"-602\">()</text>\n",
"</g>\n",
"<!-- Block12513 -->\n",
"<g class=\"node\" id=\"node14\"><title>Block12513</title>\n",
"<polygon fill=\"lightgray\" points=\"585,-711 491,-711 491,-639 585,-639 585,-711\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"538\" y=\"-671.9\">Dropout</text>\n",
"</g>\n",
"<!-- Block12513&#45;&gt;Block12551 -->\n",
"<g class=\"edge\" id=\"edge13\"><title>Block12513-&gt;Block12551</title>\n",
"<path d=\"M510.134,-638.685C498.101,-623.435 483.907,-605.446 471.24,-589.393\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"473.717,-586.882 464.775,-581.2 468.222,-591.219 473.717,-586.882\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"537.5\" y=\"-613\">Block12513_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"537.5\" y=\"-602\">[#](128,)</text>\n",
"</g>\n",
"<!-- Splice12021 -->\n",
"<g class=\"node\" id=\"node15\"><title>Splice12021</title>\n",
"<polygon fill=\"lightgray\" points=\"560,-812 516,-812 516,-769 560,-769 560,-812\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"538\" y=\"-787.4\">Splice</text>\n",
"</g>\n",
"<!-- Splice12021&#45;&gt;Block12513 -->\n",
"<g class=\"edge\" id=\"edge14\"><title>Splice12021-&gt;Block12513</title>\n",
"<path d=\"M538,-768.946C538,-755.615 538,-737.71 538,-721.174\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"541.5,-721.108 538,-711.108 534.5,-721.108 541.5,-721.108\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"583.5\" y=\"-743\">Splice12021_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"583.5\" y=\"-732\">[#](128,)</text>\n",
"</g>\n",
"<!-- Block11096 -->\n",
"<g class=\"node\" id=\"node16\"><title>Block11096</title>\n",
"<polygon fill=\"lightgray\" points=\"216,-942 122,-942 122,-870 216,-870 216,-942\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"169\" y=\"-902.9\">Sequence::Slice</text>\n",
"</g>\n",
"<!-- Block11096&#45;&gt;Splice12021 -->\n",
"<g class=\"edge\" id=\"edge15\"><title>Block11096-&gt;Splice12021</title>\n",
"<path d=\"M216.109,-890.51C292.461,-867.025 441.236,-821.264 506.28,-801.257\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"507.447,-804.56 515.976,-798.274 505.389,-797.869 507.447,-804.56\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"452.5\" y=\"-844\">Block11096_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"452.5\" y=\"-833\">[#](64,)</text>\n",
"</g>\n",
"<!-- Block11951 -->\n",
"<g class=\"node\" id=\"node17\"><title>Block11951</title>\n",
"<polygon fill=\"lightgray\" points=\"830,-942 736,-942 736,-870 830,-870 830,-942\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"783\" y=\"-902.9\">Sequence::Slice</text>\n",
"</g>\n",
"<!-- Block11951&#45;&gt;Splice12021 -->\n",
"<g class=\"edge\" id=\"edge16\"><title>Block11951-&gt;Splice12021</title>\n",
"<path d=\"M735.762,-883.116C686.741,-860.406 611.467,-825.535 569.334,-806.016\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"570.665,-802.775 560.12,-801.747 567.722,-809.127 570.665,-802.775\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"711.5\" y=\"-844\">Block11951_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"711.5\" y=\"-833\">[#](64,)</text>\n",
"</g>\n",
"<!-- Block11052 -->\n",
"<g class=\"node\" id=\"node18\"><title>Block11052</title>\n",
"<polygon fill=\"lightgray\" points=\"259,-1072 165,-1072 165,-1000 259,-1000 259,-1072\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"212\" y=\"-1032.9\">GRU</text>\n",
"</g>\n",
"<!-- Block11052&#45;&gt;Block11096 -->\n",
"<g class=\"edge\" id=\"edge17\"><title>Block11052-&gt;Block11096</title>\n",
"<path d=\"M164.707,-1025.93C140.027,-1018.53 111.935,-1005.33 97,-982 86.6292,-965.801 97.3679,-950.093 113.439,-937.273\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"115.694,-939.96 121.688,-931.223 111.554,-934.315 115.694,-939.96\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"141.5\" y=\"-974\">Block11052_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"141.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- PastValue10972 -->\n",
"<g class=\"node\" id=\"node23\"><title>PastValue10972</title>\n",
"<polygon fill=\"lightgray\" points=\"368,-927.5 306,-927.5 306,-884.5 368,-884.5 368,-927.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"337\" y=\"-902.9\">PastValue</text>\n",
"</g>\n",
"<!-- Block11052&#45;&gt;PastValue10972 -->\n",
"<g class=\"edge\" id=\"edge24\"><title>Block11052-&gt;PastValue10972</title>\n",
"<path d=\"M259.226,-1020.15C278.955,-1011.79 300.591,-999.442 315,-982 325.292,-969.541 330.77,-952.443 333.685,-937.718\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"337.193,-937.955 335.386,-927.516 330.289,-936.803 337.193,-937.955\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"370.5\" y=\"-974\">Block11052_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"370.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Parameter10311 -->\n",
"<g class=\"node\" id=\"node19\"><title>Parameter10311</title>\n",
"<polygon fill=\"lightgray\" points=\"72,-1187.5 0,-1187.5 0,-1144.5 72,-1144.5 72,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"36\" y=\"-1169.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"36\" y=\"-1156.4\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter10311&#45;&gt;Block11052 -->\n",
"<g class=\"edge\" id=\"edge18\"><title>Parameter10311-&gt;Block11052</title>\n",
"<path d=\"M60.9328,-1144.32C79.8265,-1128.9 106.611,-1107.55 131,-1090 139.082,-1084.18 147.761,-1078.22 156.329,-1072.49\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"158.433,-1075.29 164.834,-1066.85 154.565,-1069.46 158.433,-1075.29\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"142\" y=\"-1104\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"142\" y=\"-1093\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter10312 -->\n",
"<g class=\"node\" id=\"node20\"><title>Parameter10312</title>\n",
"<polygon fill=\"lightgray\" points=\"162,-1187.5 90,-1187.5 90,-1144.5 162,-1144.5 162,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"126\" y=\"-1169.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"126\" y=\"-1156.4\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter10312&#45;&gt;Block11052 -->\n",
"<g class=\"edge\" id=\"edge19\"><title>Parameter10312-&gt;Block11052</title>\n",
"<path d=\"M137.586,-1144.22C146.32,-1128.93 158.835,-1107.8 171,-1090 173.189,-1086.8 175.504,-1083.53 177.875,-1080.26\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"180.703,-1082.33 183.843,-1072.21 175.078,-1078.16 180.703,-1082.33\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"190\" y=\"-1104\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"190\" y=\"-1093\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter10313 -->\n",
"<g class=\"node\" id=\"node21\"><title>Parameter10313</title>\n",
"<polygon fill=\"lightgray\" points=\"252,-1187.5 180,-1187.5 180,-1144.5 252,-1144.5 252,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"216\" y=\"-1169.4\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"216\" y=\"-1156.4\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter10313&#45;&gt;Block11052 -->\n",
"<g class=\"edge\" id=\"edge20\"><title>Parameter10313-&gt;Block11052</title>\n",
"<path d=\"M215.357,-1144.43C214.834,-1127.67 214.075,-1103.38 213.411,-1082.14\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"216.905,-1081.89 213.094,-1072 209.908,-1082.11 216.905,-1081.89\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"231\" y=\"-1104\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"231\" y=\"-1093\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter10314 -->\n",
"<g class=\"node\" id=\"node22\"><title>Parameter10314</title>\n",
"<polygon fill=\"lightgray\" points=\"342,-1187.5 270,-1187.5 270,-1144.5 342,-1144.5 342,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"306\" y=\"-1169.4\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"306\" y=\"-1156.4\">(64, 64)</text>\n",
"</g>\n",
"<!-- Parameter10314&#45;&gt;Block11052 -->\n",
"<g class=\"edge\" id=\"edge21\"><title>Parameter10314-&gt;Block11052</title>\n",
"<path d=\"M291.226,-1144.22C280.442,-1129.11 265.424,-1108.21 252,-1090 249.681,-1086.85 247.274,-1083.61 244.844,-1080.35\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"247.586,-1078.18 238.791,-1072.27 241.982,-1082.37 247.586,-1078.18\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"280.5\" y=\"-1104\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"280.5\" y=\"-1093\">(64, 64)</text>\n",
"</g>\n",
"<!-- PastValue10972&#45;&gt;Block11052 -->\n",
"<g class=\"edge\" id=\"edge22\"><title>PastValue10972-&gt;Block11052</title>\n",
"<path d=\"M305.806,-910.224C275.316,-915.221 230.449,-927.904 209,-960 203.244,-968.614 201.352,-979.11 201.48,-989.49\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"198.009,-990.031 202.212,-999.756 204.992,-989.532 198.009,-990.031\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"262\" y=\"-974\">PastValue10972_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"262\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Block10981 -->\n",
"<g class=\"node\" id=\"node24\"><title>Block10981</title>\n",
"<polygon fill=\"lightgray\" points=\"483,-1202 389,-1202 389,-1130 483,-1130 483,-1202\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"436\" y=\"-1162.9\">Embedding</text>\n",
"</g>\n",
"<!-- Block10981&#45;&gt;Block11052 -->\n",
"<g class=\"edge\" id=\"edge23\"><title>Block10981-&gt;Block11052</title>\n",
"<path d=\"M388.769,-1138.01C353.476,-1117.84 305.11,-1090.21 267.89,-1068.94\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"269.585,-1065.87 259.166,-1063.95 266.112,-1071.95 269.585,-1065.87\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"381.5\" y=\"-1104\">Block10981_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"381.5\" y=\"-1093\">[#,*](300,)</text>\n",
"</g>\n",
"<!-- Constant10407 -->\n",
"<g class=\"node\" id=\"node25\"><title>Constant10407</title>\n",
"<polygon fill=\"white\" points=\"467,-1047 431,-1047 431,-1025 467,-1025 467,-1047\" stroke=\"white\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"449\" y=\"-1032.9\">[ 0.]</text>\n",
"</g>\n",
"<!-- Constant10407&#45;&gt;PastValue10972 -->\n",
"<g class=\"edge\" id=\"edge25\"><title>Constant10407-&gt;PastValue10972</title>\n",
"<path d=\"M448.377,-1024.96C446.947,-1009.5 442.263,-979.587 427,-960 414.304,-943.708 395.089,-931.421 377.668,-922.801\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"378.769,-919.452 368.228,-918.387 375.804,-925.793 378.769,-919.452\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"468\" y=\"-974\">Constant10407</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"468\" y=\"-963\">(1,)</text>\n",
"</g>\n",
"<!-- Parameter10300 -->\n",
"<g class=\"node\" id=\"node26\"><title>Parameter10300</title>\n",
"<polygon fill=\"lightgray\" points=\"713,-1317.5 641,-1317.5 641,-1274.5 713,-1274.5 713,-1317.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"677\" y=\"-1299.4\">E</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"677\" y=\"-1286.4\">(10, 300)</text>\n",
"</g>\n",
"<!-- Parameter10300&#45;&gt;Block10981 -->\n",
"<g class=\"edge\" id=\"edge26\"><title>Parameter10300-&gt;Block10981</title>\n",
"<path d=\"M640.896,-1276.93C604.377,-1258.52 546.419,-1228.93 497,-1202 495.452,-1201.16 493.885,-1200.3 492.305,-1199.43\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"493.612,-1196.15 483.173,-1194.34 490.206,-1202.26 493.612,-1196.15\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"587\" y=\"-1234\">E</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"587\" y=\"-1223\">(10, 300)</text>\n",
"</g>\n",
"<!-- Block11836 -->\n",
"<g class=\"node\" id=\"node34\"><title>Block11836</title>\n",
"<polygon fill=\"lightgray\" points=\"960,-1202 866,-1202 866,-1130 960,-1130 960,-1202\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"913\" y=\"-1162.9\">Embedding</text>\n",
"</g>\n",
"<!-- Parameter10300&#45;&gt;Block11836 -->\n",
"<g class=\"edge\" id=\"edge37\"><title>Parameter10300-&gt;Block11836</title>\n",
"<path d=\"M713.322,-1277.33C750.01,-1259.22 808.097,-1229.85 857,-1202 857.09,-1201.95 857.181,-1201.9 857.271,-1201.85\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"858.929,-1204.93 865.808,-1196.87 855.405,-1198.88 858.929,-1204.93\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"840\" y=\"-1234\">E</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"840\" y=\"-1223\">(10, 300)</text>\n",
"</g>\n",
"<!-- Input8123 -->\n",
"<g class=\"node\" id=\"node27\"><title>Input8123</title>\n",
"<polygon fill=\"lightgray\" points=\"440.633,-1260.1 443.701,-1260.3 446.737,-1260.59 449.729,-1260.98 452.664,-1261.47 455.532,-1262.05 458.32,-1262.73 461.018,-1263.5 463.615,-1264.35 466.103,-1265.29 468.471,-1266.32 470.713,-1267.43 472.819,-1268.62 474.784,-1269.88 476.602,-1271.21 478.268,-1272.61 479.777,-1274.08 481.126,-1275.6 482.314,-1277.18 483.338,-1278.82 484.198,-1280.5 484.895,-1282.22 485.428,-1283.98 485.799,-1285.77 486.012,-1287.59 486.07,-1289.44 485.975,-1291.3 485.733,-1293.17 485.348,-1295.06 484.826,-1296.94 484.172,-1298.83 483.392,-1300.7 482.493,-1302.56 481.481,-1304.41 480.364,-1306.23 479.147,-1308.02 477.837,-1309.78 476.442,-1311.5 474.968,-1313.18 473.421,-1314.82 471.809,-1316.4 470.136,-1317.92 468.41,-1319.39 466.635,-1320.79 464.817,-1322.12 462.961,-1323.38 461.072,-1324.57 459.154,-1325.68 457.21,-1326.71 455.245,-1327.65 453.261,-1328.5 451.262,-1329.27 449.251,-1329.95 447.228,-1330.53 445.197,-1331.02 443.16,-1331.41 441.118,-1331.7 439.072,-1331.9 437.024,-1332 434.976,-1332 432.928,-1331.9 430.882,-1331.7 428.84,-1331.41 426.803,-1331.02 424.772,-1330.53 422.749,-1329.95 420.738,-1329.27 418.739,-1328.5 416.755,-1327.65 414.79,-1326.71 412.846,-1325.68 410.928,-1324.57 409.039,-1323.38 407.183,-1322.12 405.365,-1320.79 403.59,-1319.39 401.864,-1317.92 400.191,-1316.4 398.579,-1314.82 397.032,-1313.18 395.558,-1311.5 394.163,-1309.78 392.853,-1308.02 391.636,-1306.23 390.519,-1304.41 389.507,-1302.56 388.608,-1300.7 387.828,-1298.83 387.174,-1296.94 386.652,-1295.06 386.267,-1293.17 386.025,-1291.3 385.93,-1289.44 385.988,-1287.59 386.201,-1285.77 386.572,-1283.98 387.105,-1282.22 387.802,-1280.5 388.662,-1278.82 389.686,-1277.18 390.874,-1275.6 392.223,-1274.08 393.732,-1272.61 395.398,-1271.21 397.216,-1269.88 399.181,-1268.62 401.287,-1267.43 403.529,-1266.32 405.897,-1265.29 408.385,-1264.35 410.982,-1263.5 413.68,-1262.73 416.468,-1262.05 419.336,-1261.47 422.271,-1260.98 425.263,-1260.59 428.299,-1260.3 431.367,-1260.1 434.453,-1260 437.547,-1260 440.633,-1260.1\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"436\" y=\"-1305.9\">Input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"436\" y=\"-1292.9\">q_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"436\" y=\"-1279.9\">[#,*](10,)</text>\n",
"</g>\n",
"<!-- Input8123&#45;&gt;Block10981 -->\n",
"<g class=\"edge\" id=\"edge27\"><title>Input8123-&gt;Block10981</title>\n",
"<path d=\"M436,-1259.68C436,-1245.13 436,-1228.09 436,-1212.61\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"439.5,-1212.2 436,-1202.2 432.5,-1212.2 439.5,-1212.2\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"454.5\" y=\"-1234\">q_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"454.5\" y=\"-1223\">[#,*](10,)</text>\n",
"</g>\n",
"<!-- Block11907 -->\n",
"<g class=\"node\" id=\"node28\"><title>Block11907</title>\n",
"<polygon fill=\"lightgray\" points=\"784,-1072 690,-1072 690,-1000 784,-1000 784,-1072\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"737\" y=\"-1032.9\">GRU</text>\n",
"</g>\n",
"<!-- Block11907&#45;&gt;Block11951 -->\n",
"<g class=\"edge\" id=\"edge28\"><title>Block11907-&gt;Block11951</title>\n",
"<path d=\"M784.138,-1021.1C803.098,-1012.94 823.035,-1000.49 834,-982 840.15,-971.631 838.06,-960.756 832.108,-950.605\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"834.889,-948.474 826.297,-942.275 829.148,-952.479 834.889,-948.474\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"881.5\" y=\"-974\">Block11907_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"881.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- PastValue11827 -->\n",
"<g class=\"node\" id=\"node33\"><title>PastValue11827</title>\n",
"<polygon fill=\"lightgray\" points=\"646,-927.5 584,-927.5 584,-884.5 646,-884.5 646,-927.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"615\" y=\"-902.9\">PastValue</text>\n",
"</g>\n",
"<!-- Block11907&#45;&gt;PastValue11827 -->\n",
"<g class=\"edge\" id=\"edge35\"><title>Block11907-&gt;PastValue11827</title>\n",
"<path d=\"M736.457,-999.813C734.684,-986.227 730.645,-971.275 722,-960 705.738,-938.791 678.718,-925.502 655.9,-917.535\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"656.687,-914.111 646.095,-914.34 654.518,-920.766 656.687,-914.111\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"776.5\" y=\"-974\">Block11907_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"776.5\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Parameter11166 -->\n",
"<g class=\"node\" id=\"node29\"><title>Parameter11166</title>\n",
"<polygon fill=\"lightgray\" points=\"578,-1187.5 506,-1187.5 506,-1144.5 578,-1144.5 578,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"542\" y=\"-1169.4\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"542\" y=\"-1156.4\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter11166&#45;&gt;Block11907 -->\n",
"<g class=\"edge\" id=\"edge29\"><title>Parameter11166-&gt;Block11907</title>\n",
"<path d=\"M568.065,-1144.5C574.243,-1139.7 580.831,-1134.64 587,-1130 611.155,-1111.83 616.534,-1106.28 642,-1090 654.411,-1082.06 668.079,-1074.05 681.085,-1066.73\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"682.926,-1069.71 689.96,-1061.79 679.52,-1063.59 682.926,-1069.71\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"653\" y=\"-1104\">b</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"653\" y=\"-1093\">(192,)</text>\n",
"</g>\n",
"<!-- Parameter11167 -->\n",
"<g class=\"node\" id=\"node30\"><title>Parameter11167</title>\n",
"<polygon fill=\"lightgray\" points=\"668,-1187.5 596,-1187.5 596,-1144.5 668,-1144.5 668,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"632\" y=\"-1169.4\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"632\" y=\"-1156.4\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter11167&#45;&gt;Block11907 -->\n",
"<g class=\"edge\" id=\"edge30\"><title>Parameter11167-&gt;Block11907</title>\n",
"<path d=\"M645.686,-1144.36C656.178,-1128.96 671.305,-1107.62 686,-1090 688.837,-1086.6 691.841,-1083.15 694.916,-1079.72\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"697.551,-1082.02 701.721,-1072.28 692.387,-1077.3 697.551,-1082.02\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"705\" y=\"-1104\">W</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"705\" y=\"-1093\">(300, 192)</text>\n",
"</g>\n",
"<!-- Parameter11168 -->\n",
"<g class=\"node\" id=\"node31\"><title>Parameter11168</title>\n",
"<polygon fill=\"lightgray\" points=\"758,-1187.5 686,-1187.5 686,-1144.5 758,-1144.5 758,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"722\" y=\"-1169.4\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"722\" y=\"-1156.4\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter11168&#45;&gt;Block11907 -->\n",
"<g class=\"edge\" id=\"edge31\"><title>Parameter11168-&gt;Block11907</title>\n",
"<path d=\"M724.41,-1144.43C726.374,-1127.67 729.221,-1103.38 731.71,-1082.14\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"735.21,-1082.34 732.898,-1072 728.258,-1081.53 735.21,-1082.34\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"747\" y=\"-1104\">H</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"747\" y=\"-1093\">(64, 128)</text>\n",
"</g>\n",
"<!-- Parameter11169 -->\n",
"<g class=\"node\" id=\"node32\"><title>Parameter11169</title>\n",
"<polygon fill=\"lightgray\" points=\"848,-1187.5 776,-1187.5 776,-1144.5 848,-1144.5 848,-1187.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"812\" y=\"-1169.4\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"812\" y=\"-1156.4\">(64, 64)</text>\n",
"</g>\n",
"<!-- Parameter11169&#45;&gt;Block11907 -->\n",
"<g class=\"edge\" id=\"edge32\"><title>Parameter11169-&gt;Block11907</title>\n",
"<path d=\"M799.95,-1144.43C789.905,-1127.29 775.248,-1102.28 762.604,-1080.7\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"765.585,-1078.86 757.51,-1072 759.545,-1082.4 765.585,-1078.86\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"792.5\" y=\"-1104\">H1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"792.5\" y=\"-1093\">(64, 64)</text>\n",
"</g>\n",
"<!-- PastValue11827&#45;&gt;Block11907 -->\n",
"<g class=\"edge\" id=\"edge33\"><title>PastValue11827-&gt;Block11907</title>\n",
"<path d=\"M609.635,-927.536C606.66,-943.686 605.419,-966.048 616,-982 630.484,-1003.84 656.142,-1016.74 680.115,-1024.33\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"679.314,-1027.74 689.894,-1027.18 681.27,-1021.02 679.314,-1027.74\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"669\" y=\"-974\">PastValue11827_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"669\" y=\"-963\">[#,*](64,)</text>\n",
"</g>\n",
"<!-- Block11836&#45;&gt;Block11907 -->\n",
"<g class=\"edge\" id=\"edge34\"><title>Block11836-&gt;Block11907</title>\n",
"<path d=\"M865.863,-1130.15C848.634,-1117.46 828.967,-1103.04 811,-1090 805.021,-1085.66 798.758,-1081.14 792.514,-1076.65\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"794.203,-1073.55 784.039,-1070.56 790.118,-1079.23 794.203,-1073.55\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"883.5\" y=\"-1104\">Block11836_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"883.5\" y=\"-1093\">[#,*](300,)</text>\n",
"</g>\n",
"<!-- Constant11262 -->\n",
"<g class=\"node\" id=\"node35\"><title>Constant11262</title>\n",
"<polygon fill=\"white\" points=\"550,-1047 514,-1047 514,-1025 550,-1025 550,-1047\" stroke=\"white\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"532\" y=\"-1032.9\">[ 0.]</text>\n",
"</g>\n",
"<!-- Constant11262&#45;&gt;PastValue11827 -->\n",
"<g class=\"edge\" id=\"edge36\"><title>Constant11262-&gt;PastValue11827</title>\n",
"<path d=\"M529.27,-1024.77C525.817,-1009.58 521.593,-980.626 533,-960 542.121,-943.508 558.785,-931.344 574.721,-922.86\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"576.55,-925.86 583.957,-918.285 573.443,-919.588 576.55,-925.86\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"563\" y=\"-974\">Constant11262</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"563\" y=\"-963\">(1,)</text>\n",
"</g>\n",
"<!-- Input8124 -->\n",
"<g class=\"node\" id=\"node36\"><title>Input8124</title>\n",
"<polygon fill=\"lightgray\" points=\"917.633,-1260.1 920.701,-1260.3 923.737,-1260.59 926.729,-1260.98 929.664,-1261.47 932.532,-1262.05 935.32,-1262.73 938.018,-1263.5 940.615,-1264.35 943.103,-1265.29 945.471,-1266.32 947.713,-1267.43 949.819,-1268.62 951.784,-1269.88 953.602,-1271.21 955.268,-1272.61 956.777,-1274.08 958.126,-1275.6 959.314,-1277.18 960.338,-1278.82 961.198,-1280.5 961.895,-1282.22 962.428,-1283.98 962.799,-1285.77 963.012,-1287.59 963.07,-1289.44 962.975,-1291.3 962.733,-1293.17 962.348,-1295.06 961.826,-1296.94 961.172,-1298.83 960.392,-1300.7 959.493,-1302.56 958.481,-1304.41 957.364,-1306.23 956.147,-1308.02 954.837,-1309.78 953.442,-1311.5 951.968,-1313.18 950.421,-1314.82 948.809,-1316.4 947.136,-1317.92 945.41,-1319.39 943.635,-1320.79 941.817,-1322.12 939.961,-1323.38 938.072,-1324.57 936.154,-1325.68 934.21,-1326.71 932.245,-1327.65 930.261,-1328.5 928.262,-1329.27 926.251,-1329.95 924.228,-1330.53 922.197,-1331.02 920.16,-1331.41 918.118,-1331.7 916.072,-1331.9 914.024,-1332 911.976,-1332 909.928,-1331.9 907.882,-1331.7 905.84,-1331.41 903.803,-1331.02 901.772,-1330.53 899.749,-1329.95 897.738,-1329.27 895.739,-1328.5 893.755,-1327.65 891.79,-1326.71 889.846,-1325.68 887.928,-1324.57 886.039,-1323.38 884.183,-1322.12 882.365,-1320.79 880.59,-1319.39 878.864,-1317.92 877.191,-1316.4 875.579,-1314.82 874.032,-1313.18 872.558,-1311.5 871.163,-1309.78 869.853,-1308.02 868.636,-1306.23 867.519,-1304.41 866.507,-1302.56 865.608,-1300.7 864.828,-1298.83 864.174,-1296.94 863.652,-1295.06 863.267,-1293.17 863.025,-1291.3 862.93,-1289.44 862.988,-1287.59 863.201,-1285.77 863.572,-1283.98 864.105,-1282.22 864.802,-1280.5 865.662,-1278.82 866.686,-1277.18 867.874,-1275.6 869.223,-1274.08 870.732,-1272.61 872.398,-1271.21 874.216,-1269.88 876.181,-1268.62 878.287,-1267.43 880.529,-1266.32 882.897,-1265.29 885.385,-1264.35 887.982,-1263.5 890.68,-1262.73 893.468,-1262.05 896.336,-1261.47 899.271,-1260.98 902.263,-1260.59 905.299,-1260.3 
908.367,-1260.1 911.453,-1260 914.547,-1260 917.633,-1260.1\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"913\" y=\"-1305.9\">Input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"913\" y=\"-1292.9\">a_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"913\" y=\"-1279.9\">[#,*](10,)</text>\n",
"</g>\n",
"<!-- Input8124&#45;&gt;Block11836 -->\n",
"<g class=\"edge\" id=\"edge38\"><title>Input8124-&gt;Block11836</title>\n",
"<path d=\"M913,-1259.68C913,-1245.13 913,-1228.09 913,-1212.61\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"916.5,-1212.2 913,-1202.2 909.5,-1212.2 916.5,-1212.2\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"931.5\" y=\"-1234\">a_input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"931.5\" y=\"-1223\">[#,*](10,)</text>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def create_model_shared_embedding(question_input, answer_input):\n",
" with C.default_options(init=C.glorot_uniform()):\n",
" e = C.layers.Embedding(300)\n",
" question_stack = C.layers.Sequential([e, C.layers.Fold(C.layers.GRU(64))])(question_input)\n",
" answer_stack = C.layers.Sequential([e, C.layers.Fold(C.layers.GRU(64))])(answer_input)\n",
" combined = C.splice(question_stack, answer_stack)\n",
" model = C.layers.Sequential([C.layers.Dropout(0.5),\n",
" C.layers.LayerNormalization(),\n",
" C.layers.Dense(64, activation=C.sigmoid),\n",
" C.layers.LayerNormalization(),\n",
" C.layers.Dense(1, activation=C.softmax)])\n",
" return model(combined)\n",
"\n",
"def create_model_shared_all(question_input, answer_input):\n",
" with C.default_options(init=C.glorot_uniform()):\n",
" stack = C.layers.Sequential([C.layers.Embedding(300), C.layers.Fold(C.layers.GRU(64))])\n",
" question_stack = stack(question_input)\n",
" answer_stack = stack(answer_input)\n",
" combined = C.splice(question_stack, answer_stack)\n",
" model = C.layers.Sequential([cl.Dropout(0.5),\n",
" C.layers.LayerNormalization(),\n",
" C.layers.Dense(64, activation=C.sigmoid),\n",
" C.layers.LayerNormalization(),\n",
" C.layers.Dense(1, activation=C.softmax)])\n",
" return model(combined)\n",
"\n",
"model_shared_embedding = create_model_shared_embedding(q_input, a_input)\n",
"\n",
"display_model(model_shared_embedding)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Guideline 4\n",
"\n",
"> Verify weight sharing and other structural issues in your network by plotting the underlying graph\n",
"\n",
"We are much better at processing visual information than by following the equations of a big model. With CNTK only the necessary dimensions need to be specified and everything else can be inferred. However when we plot a graph we can see the shapes of all inputs, outputs, and parameters at the same time, without having to do the shape inference in our heads. "
]
},
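{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked example of reading shapes off the plot, consider the parameters attached to the `GRU` block above. Assuming the usual GRU layout with an update gate, a reset gate, and a candidate state (an assumption about the layer internals, not something the plot states), the displayed shapes are consistent with an input dimension of $d = 300$ (the embedding size) and a hidden dimension of $h = 64$:\n",
"$$\n",
"\\begin{array}{lll} b &: (192,) &= (3h,) \\\\ W &: (300, 192) &= (d, 3h) \\\\ H &: (64, 128) &= (h, 2h) \\\\ \\textrm{H1} &: (64, 64) &= (h, h) \\end{array}\n",
"$$\n",
"\n",
"If one of these numbers does not match what you intended (for example $W$ having a first dimension other than the embedding size), that points to a structural bug before any data has been fed."
]
},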
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### More model bugs\n",
"\n",
"Once all the structural bugs have been eliminated we can proceed in finding bugs related to running data through the network. We can start with feeding random data but a better choice is to feed the first few minibatches of real data through the network. This can reveal scale issues i.e. that the output or some other intermediate layer can take on too large values. \n",
"\n",
"A common cause of this is that the **learning rate is too high**. This will be observed from the second minibatch onwards and it can cause the learning to diverge. If you see large values in parameters or other outputs, just reduce the learning rate by a factor of 2 and retry until things look stable. \n",
"\n",
"Another possibility can be that the **data contains large values** which can cause intermediate outputs to become large and even overflow if the network is doing a lot of processing (such as an RNN on a long sequence or a very deep network). The training procedures currently used actually work better when the input values do not contain outliers and are centered or close to 0 (this is the reason why in many examples with image data you can see that the first thing that happens is the subtraction of the average pixel value). If you have large values in the input you can try dividing the data by the maximum value. If you have non-negative values and you want to mostly preserve the order of magnitude but don't care so much about the exact value you can transform your inputs with a `log` i.e. `transformed_features = C.log(1+features)`.\n",
"\n",
"In our sample code we have a problem that could be detected simply by feeding random data so we will do just that:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1.],\n",
" [ 1.],\n",
" [ 1.],\n",
" [ 1.],\n",
" [ 1.]], dtype=float32)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random_questions = [sparse.rand(i*i+1, 10, density=0.5, format='csr', dtype=np.float32) for i in range(5)]\n",
"random_answers = [sparse.rand(i+1, 10, density=0.5, format='csr', dtype=np.float32) for i in range(5)] \n",
"\n",
"model_shared_embedding.eval({q_input:random_questions, a_input:random_answers})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is very suspicious. We gave 5 **random** \"questions\" (of lengths 1, 2, 5, 10, and 17), and 5 **random** \"answers\" (of lengths 1, 2, 3, 4, and 5) and we got the **same** response. Again we can perform a binary search through the network to see where the responses become so uniform. We find that the following network behaves as expected"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1.32114685e+00 2.15835363e-01 1.21132720e+00]\n",
" [ 1.14431441e+00 -4.78121191e-01 -7.26600170e-01]\n",
" [ 9.21114624e-01 -1.12726915e+00 7.72424161e-01]\n",
" [ 1.31514442e+00 -4.65529203e-01 4.94043827e-01]\n",
" [ 1.53898752e+00 -1.37763005e-03 1.37341189e+00]]\n"
]
}
],
"source": [
"def create_model_shared_embedding_working(question_input, answer_input):\n",
" with C.default_options(init=C.glorot_uniform()):\n",
" e = C.layers.Embedding(300)\n",
" question_stack = C.layers.Sequential([e, C.layers.Fold(C.layers.GRU(64))])(question_input)\n",
" answer_stack = C.layers.Sequential([e, C.layers.Fold(C.layers.GRU(64))])(answer_input)\n",
" combined = C.splice(question_stack, answer_stack)\n",
" model = C.layers.Sequential([C.layers.Dropout(0.5),\n",
" C.layers.LayerNormalization(),\n",
" C.layers.Dense(64, activation=C.sigmoid),\n",
" C.layers.LayerNormalization()])\n",
" return model(combined)\n",
"\n",
"model_shared_embedding_working = create_model_shared_embedding_working(q_input, a_input)\n",
"\n",
"working_outputs = model_shared_embedding_working.eval({q_input:random_questions, a_input:random_answers})\n",
"print(working_outputs[:,:3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The only difference then is this line\n",
"```python\n",
"C.layers.Dense(1, activation=C.softmax)\n",
"```\n",
"We can play around a little more e.g. by modifying the activation or the number of outputs and we find that everything is working except for the combination of arguments given above. If we look at the definition of softmax we can see the problem:\n",
"$$\n",
"\\textrm{softmax}(z) = \\left(\\begin{array}{c} \\frac{\\exp(z_1)}{\\sum_j \\exp(z_j)}\\\\ \\frac{\\exp(z_2)}{\\sum_j \\exp(z_j)}\\\\ \\vdots \\\\ \\frac{\\exp(z_n)}{\\sum_j \\exp(z_j)} \\end{array}\\right)\n",
"$$\n",
"\n",
"and we only have one output! So the softmax will compute the exponential of that output and then **divide it by itself** giving us 1. One solution here is to have two outputs, one for each class. This is different from how binary classification is typically done where there's a single output representing the probability of the positive class. This latter approach can be implemented by using a sigmoid non-linearity. Therefore either of the following will work:\n",
"```python\n",
"cl.Dense(1, activation=C.sigmoid)\n",
"```\n",
"or\n",
"```python\n",
"cl.Dense(2, activation=C.softmax)\n",
"```\n",
"\n",
"### Guideline 5\n",
"\n",
"> Feed some data to your network and look for large values in the output or other suspicious behavior.\n",
"\n",
"It's also good if you can train for a few minibatches to see if different outputs in the network exhibit worrisome trends, which could mean that your learning rate is very large.\n",
"\n",
"\n",
"### Tricky errors\n",
"\n",
"Even after you have tried all of the above, you might still ran into problems. One example is a `NaN` (Not-a-Number) which you can get from operations whose meaning is not defined (for example $0 \\times \\infty$ or ${(-0.5)}^{0.5}$). Another case is if you are writing your own layer and it is not behaving as expected. CNTK offers some support to find your issue. Here's a contrived example that demonstrates how to catch where `NaN`s are generated."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ nan]], dtype=float32)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"C.debugging.set_checked_mode(False)\n",
"w = C.input_variable(1)\n",
"x = C.input_variable(1)\n",
"y = C.layers.Sequential([C.square, C.square, C.square])\n",
"z = C.exp(-y(x))*C.exp(y(w))+1\n",
"\n",
"w0 = np.array([3.0],dtype=np.float32)\n",
"x0 = np.array([3.0],dtype=np.float32)\n",
"z.eval({w:w0, x:x0})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code computes $3^8=6561$ and then takes the exponetial of it (which overflows to infinity) and the expoential of it's negative (which underflows to 0). The result above is because $0 \\times \\infty$ is `NaN` according to the floating point standard. If we understand the issue like in this contrived example we can rearrange our computations for example as "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 2.]], dtype=float32)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"z_stable = C.exp(-y(x)+y(w))+1\n",
"\n",
"w0 = np.array([3.0],dtype=np.float32)\n",
"x0 = np.array([3.0],dtype=np.float32)\n",
"z_stable.eval({w:w0, x:x0})"
]
},
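{
"cell_type": "markdown",
"metadata": {},
"source": [
"The rearrangement works because the two expressions are equal in exact arithmetic but differ in the intermediate values they produce. As a rough sketch, assuming IEEE 754 single precision (largest finite value about $3.4 \\times 10^{38}$, smallest positive value about $1.4 \\times 10^{-45}$), `exp` overflows for arguments above roughly $88.7$ and underflows to zero below roughly $-103.3$:\n",
"$$\n",
"\\exp(-6561) \\times \\exp(6561) \\;\\rightarrow\\; 0 \\times \\infty = \\textrm{NaN}, \\qquad \\exp(-6561 + 6561) = \\exp(0) = 1\n",
"$$\n",
"Adding 1 to the second result gives the value 2 computed above."
]
},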
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Typically we don't know what causes the `NaN` to get generated. CNTK provides a \"checked mode\" where `NaN`s can cause an exception. The request for checked_mode needs to be specified before the function is created."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ nan]], dtype=float32)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"C.debugging.set_checked_mode(True)\n",
"z.eval({w:w0, x:x0})"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Error: ElementTimes15622 ElementTimes operation unexpectedly produced NaN values.\n"
]
},
{
"data": {
"image/svg+xml": [
"<svg height=\"865pt\" viewBox=\"0.00 0.00 574.41 865.00\" width=\"574pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g class=\"graph\" id=\"graph0\" transform=\"scale(1 1) rotate(0) translate(4 861)\">\n",
"<title>network_graph</title>\n",
"<polygon fill=\"white\" points=\"-4,4 -4,-861 570.412,-861 570.412,4 -4,4\" stroke=\"none\"/>\n",
"<!-- Plus15626 -->\n",
"<g class=\"node\" id=\"node1\"><title>Plus15626</title>\n",
"<ellipse cx=\"273.412\" cy=\"-133.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"273.412\" y=\"-128.5\">+</text>\n",
"</g>\n",
"<!-- Plus15626_Output_0 -->\n",
"<g class=\"node\" id=\"node4\"><title>Plus15626_Output_0</title>\n",
"<polygon fill=\"lightgray\" points=\"278.045,-0.0986735 281.112,-0.29575 284.148,-0.590689 287.14,-0.982683 290.076,-1.47066 292.943,-2.05327 295.731,-2.72894 298.429,-3.49579 301.027,-4.35174 303.515,-5.29443 305.883,-6.32129 308.124,-7.42949 310.231,-8.616 312.196,-9.87757 314.014,-11.2107 315.679,-12.6119 317.188,-14.0771 318.538,-15.6024 319.726,-17.1836 320.75,-18.8164 321.61,-20.4963 322.306,-22.2187 322.839,-23.9788 323.211,-25.7719 323.424,-27.5931 323.481,-29.4373 323.387,-31.2994 323.145,-33.1745 322.76,-35.0573 322.237,-36.9427 321.583,-38.8255 320.804,-40.7006 319.905,-42.5627 318.893,-44.4069 317.775,-46.2281 316.558,-48.0212 315.249,-49.7813 313.854,-51.5037 312.379,-53.1836 310.833,-54.8164 309.22,-56.3976 307.548,-57.9229 305.821,-59.3881 304.047,-60.7893 302.229,-62.1224 300.373,-63.384 298.484,-64.5705 296.565,-65.6787 294.622,-66.7056 292.657,-67.6483 290.673,-68.5042 288.674,-69.2711 286.662,-69.9467 284.64,-70.5293 282.609,-71.0173 280.572,-71.4093 278.529,-71.7042 276.484,-71.9013 274.436,-72 272.387,-72 270.34,-71.9013 268.294,-71.7042 266.251,-71.4093 264.214,-71.0173 262.183,-70.5293 260.161,-69.9467 258.149,-69.2711 256.15,-68.5042 254.167,-67.6483 252.201,-66.7056 250.258,-65.6787 248.339,-64.5705 246.45,-63.384 244.594,-62.1224 242.776,-60.7893 241.002,-59.3881 239.275,-57.9229 237.603,-56.3976 235.99,-54.8164 234.444,-53.1836 232.969,-51.5037 231.574,-49.7813 230.265,-48.0212 229.048,-46.2281 227.93,-44.4069 226.918,-42.5627 226.02,-40.7006 225.24,-38.8255 224.586,-36.9427 224.064,-35.0573 223.679,-33.1745 223.436,-31.2994 223.342,-29.4373 223.399,-27.5931 223.612,-25.7719 223.984,-23.9788 224.517,-22.2187 225.213,-20.4963 226.073,-18.8164 227.097,-17.1836 228.285,-15.6024 229.635,-14.0771 231.144,-12.6119 232.81,-11.2107 234.627,-9.87757 236.592,-8.616 238.699,-7.42949 240.94,-6.32129 243.309,-5.29443 245.796,-4.35174 248.394,-3.49579 251.092,-2.72894 253.88,-2.05327 256.747,-1.47066 259.683,-0.982683 262.675,-0.590689 
265.711,-0.29575 268.778,-0.0986735 271.865,-3.55271e-013 274.958,-3.69482e-013 278.045,-0.0986735\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"273.412\" y=\"-25.9\">[#](1,)</text>\n",
"</g>\n",
"<!-- Plus15626&#45;&gt;Plus15626_Output_0 -->\n",
"<g class=\"edge\" id=\"edge3\"><title>Plus15626-&gt;Plus15626_Output_0</title>\n",
"<path d=\"M273.412,-118.817C273.412,-109.118 273.412,-95.5053 273.412,-82.1629\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"276.912,-82.0818 273.412,-72.0819 269.912,-82.0819 276.912,-82.0818\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"285.912\" y=\"-93\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15622 -->\n",
"<g class=\"node\" id=\"node2\"><title>ElementTimes15622</title>\n",
"<ellipse cx=\"201.412\" cy=\"-220.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"201.412\" y=\"-215.5\">*</text>\n",
"</g>\n",
"<!-- ElementTimes15622&#45;&gt;Plus15626 -->\n",
"<g class=\"edge\" id=\"edge1\"><title>ElementTimes15622-&gt;Plus15626</title>\n",
"<path d=\"M197.906,-206.01C195.715,-194.42 194.603,-177.765 202.412,-166 212.811,-150.332 232.949,-142.386 249.13,-138.399\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"249.986,-141.796 259.049,-136.31 248.542,-134.946 249.986,-141.796\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"263.412\" y=\"-180\">ElementTimes15622_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"263.412\" y=\"-169\">[#](1,)</text>\n",
"</g>\n",
"<!-- Constant15625 -->\n",
"<g class=\"node\" id=\"node3\"><title>Constant15625</title>\n",
"<polygon fill=\"white\" points=\"362.412,-231.5 330.412,-231.5 330.412,-209.5 362.412,-209.5 362.412,-231.5\" stroke=\"white\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"346.412\" y=\"-217.4\">1.0</text>\n",
"</g>\n",
"<!-- Constant15625&#45;&gt;Plus15626 -->\n",
"<g class=\"edge\" id=\"edge2\"><title>Constant15625-&gt;Plus15626</title>\n",
"<path d=\"M344.27,-209.441C341.393,-197.884 335.33,-178.863 324.412,-166 316.734,-156.955 305.781,-149.719 295.992,-144.475\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"297.27,-141.202 286.755,-139.901 294.164,-147.475 297.27,-141.202\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"367.412\" y=\"-180\">Constant15625</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"367.412\" y=\"-169\">()</text>\n",
"</g>\n",
"<!-- Exp15604 -->\n",
"<g class=\"node\" id=\"node5\"><title>Exp15604</title>\n",
"<ellipse cx=\"139.412\" cy=\"-307.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"139.412\" y=\"-304.4\">Exp</text>\n",
"</g>\n",
"<!-- Exp15604&#45;&gt;ElementTimes15622 -->\n",
"<g class=\"edge\" id=\"edge4\"><title>Exp15604-&gt;ElementTimes15622</title>\n",
"<path d=\"M145.481,-294.05C151.207,-282.838 160.39,-266.148 170.412,-253 174.345,-247.84 179.062,-242.652 183.616,-238.017\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"186.253,-240.334 190.959,-230.842 181.361,-235.327 186.253,-240.334\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"210.912\" y=\"-267\">Exp15604_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"210.912\" y=\"-256\">[#](1,)</text>\n",
"</g>\n",
"<!-- Exp15619 -->\n",
"<g class=\"node\" id=\"node6\"><title>Exp15619</title>\n",
"<ellipse cx=\"278.412\" cy=\"-307.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"278.412\" y=\"-304.4\">Exp</text>\n",
"</g>\n",
"<!-- Exp15619&#45;&gt;ElementTimes15622 -->\n",
"<g class=\"edge\" id=\"edge5\"><title>Exp15619-&gt;ElementTimes15622</title>\n",
"<path d=\"M274.102,-293.434C269.772,-281.806 262.191,-264.854 251.412,-253 243.646,-244.461 233.05,-237.344 223.61,-232.048\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"225.185,-228.922 214.704,-227.373 221.932,-235.12 225.185,-228.922\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"306.912\" y=\"-267\">Exp15619_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"306.912\" y=\"-256\">[#](1,)</text>\n",
"</g>\n",
"<!-- Negate15601 -->\n",
"<g class=\"node\" id=\"node7\"><title>Negate15601</title>\n",
"<polygon fill=\"lightgray\" points=\"160.412,-423 110.412,-423 110.412,-380 160.412,-380 160.412,-423\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"135.412\" y=\"-398.4\">Negate</text>\n",
"</g>\n",
"<!-- Negate15601&#45;&gt;Exp15604 -->\n",
"<g class=\"edge\" id=\"edge6\"><title>Negate15601-&gt;Exp15604</title>\n",
"<path d=\"M136.319,-379.624C136.929,-365.598 137.734,-347.077 138.37,-332.451\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"141.88,-332.307 138.817,-322.165 134.886,-332.003 141.88,-332.307\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"185.912\" y=\"-354\">Negate15601_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"185.912\" y=\"-343\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15595 -->\n",
"<g class=\"node\" id=\"node8\"><title>ElementTimes15595</title>\n",
"<ellipse cx=\"135.412\" cy=\"-495.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"135.412\" y=\"-490.5\">*</text>\n",
"</g>\n",
"<!-- ElementTimes15595&#45;&gt;Negate15601 -->\n",
"<g class=\"edge\" id=\"edge7\"><title>ElementTimes15595-&gt;Negate15601</title>\n",
"<path d=\"M135.412,-480.925C135.412,-468.461 135.412,-449.533 135.412,-433.396\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"138.912,-433.29 135.412,-423.29 131.912,-433.29 138.912,-433.29\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"196.412\" y=\"-455\">ElementTimes15595_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"196.412\" y=\"-444\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15593 -->\n",
"<g class=\"node\" id=\"node9\"><title>ElementTimes15593</title>\n",
"<ellipse cx=\"21.4116\" cy=\"-582.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"21.4116\" y=\"-577.5\">*</text>\n",
"</g>\n",
"<!-- ElementTimes15593&#45;&gt;ElementTimes15595 -->\n",
"<g class=\"edge\" id=\"edge8\"><title>ElementTimes15593-&gt;ElementTimes15595</title>\n",
"<path d=\"M12.57,-570.602C4.27789,-558.979 -5.48128,-540.554 4.41155,-528 17.4105,-511.505 76.6265,-502.652 110.828,-498.836\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"111.384,-502.297 120.963,-497.771 110.652,-495.335 111.384,-502.297\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"65.4116\" y=\"-542\">ElementTimes15593_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"65.4116\" y=\"-531\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15593&#45;&gt;ElementTimes15595 -->\n",
"<g class=\"edge\" id=\"edge9\"><title>ElementTimes15593-&gt;ElementTimes15595</title>\n",
"<path d=\"M36.1584,-581.658C59.1355,-581.043 103.242,-576.258 126.412,-550 133.534,-541.928 136.091,-530.399 136.739,-520.11\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"140.239,-520.065 136.834,-510.032 133.24,-519.998 140.239,-520.065\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"197.412\" y=\"-542\">ElementTimes15593_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"197.412\" y=\"-531\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15591 -->\n",
"<g class=\"node\" id=\"node10\"><title>ElementTimes15591</title>\n",
"<ellipse cx=\"83.4116\" cy=\"-691\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"83.4116\" y=\"-686\">*</text>\n",
"</g>\n",
"<!-- ElementTimes15591&#45;&gt;ElementTimes15593 -->\n",
"<g class=\"edge\" id=\"edge10\"><title>ElementTimes15591-&gt;ElementTimes15593</title>\n",
"<path d=\"M70.0295,-684.72C52.813,-677.107 23.6676,-661.24 11.4116,-637 6.42324,-627.134 8.08333,-615.111 11.3567,-604.964\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"14.6389,-606.18 14.9814,-595.591 8.11004,-603.655 14.6389,-606.18\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"72.4116\" y=\"-629\">ElementTimes15591_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"72.4116\" y=\"-618\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15591&#45;&gt;ElementTimes15593 -->\n",
"<g class=\"edge\" id=\"edge11\"><title>ElementTimes15591-&gt;ElementTimes15593</title>\n",
"<path d=\"M95.2481,-682.341C115.127,-668.534 151.366,-638.7 133.412,-615 113.348,-588.515 72.8234,-583.061 46.4181,-582.545\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"46.055,-579.046 36.0668,-582.58 46.0785,-586.046 46.055,-579.046\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"199.412\" y=\"-629\">ElementTimes15591_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"199.412\" y=\"-618\">[#](1,)</text>\n",
"</g>\n",
"<!-- Input15475 -->\n",
"<g class=\"node\" id=\"node11\"><title>Input15475</title>\n",
"<polygon fill=\"lightgray\" points=\"88.0448,-785.099 91.1125,-785.296 94.1483,-785.591 97.1401,-785.983 100.076,-786.471 102.943,-787.053 105.731,-787.729 108.429,-788.496 111.027,-789.352 113.515,-790.294 115.883,-791.321 118.124,-792.429 120.231,-793.616 122.196,-794.878 124.014,-796.211 125.679,-797.612 127.188,-799.077 128.538,-800.602 129.726,-802.184 130.75,-803.816 131.61,-805.496 132.306,-807.219 132.839,-808.979 133.211,-810.772 133.424,-812.593 133.481,-814.437 133.387,-816.299 133.145,-818.175 132.76,-820.057 132.237,-821.943 131.583,-823.825 130.804,-825.701 129.905,-827.563 128.893,-829.407 127.775,-831.228 126.558,-833.021 125.249,-834.781 123.854,-836.504 122.379,-838.184 120.833,-839.816 119.22,-841.398 117.548,-842.923 115.821,-844.388 114.047,-845.789 112.229,-847.122 110.373,-848.384 108.484,-849.571 106.565,-850.679 104.622,-851.706 102.657,-852.648 100.673,-853.504 98.674,-854.271 96.6622,-854.947 94.6398,-855.529 92.609,-856.017 90.5716,-856.409 88.5293,-856.704 86.4835,-856.901 84.4358,-857 82.3873,-857 80.3396,-856.901 78.2938,-856.704 76.2515,-856.409 74.2141,-856.017 72.1833,-855.529 70.1609,-854.947 68.1491,-854.271 66.1501,-853.504 64.1665,-852.648 62.2013,-851.706 60.2578,-850.679 58.3394,-849.571 56.4501,-848.384 54.5943,-847.122 52.7765,-845.789 51.0017,-844.388 49.2752,-842.923 47.6028,-841.398 45.9902,-839.816 44.4436,-838.184 42.9695,-836.504 41.5743,-834.781 40.2649,-833.021 39.0479,-831.228 37.9302,-829.407 36.9185,-827.563 36.0196,-825.701 35.2399,-823.825 34.5859,-821.943 34.0635,-820.057 33.6786,-818.175 33.4364,-816.299 33.3418,-814.437 33.3991,-812.593 33.6121,-810.772 33.9839,-808.979 34.517,-807.219 35.2131,-805.496 36.0732,-803.816 37.0974,-802.184 38.2851,-800.602 39.6347,-799.077 41.144,-797.612 42.8096,-796.211 44.6274,-794.878 46.5924,-793.616 48.6989,-792.429 50.9401,-791.321 53.3086,-790.294 55.7961,-789.352 58.3937,-788.496 61.0917,-787.729 63.8799,-787.053 66.7474,-786.471 69.683,-785.983 72.6748,-785.591 
75.7107,-785.296 78.7783,-785.099 81.865,-785 84.9581,-785 88.0448,-785.099\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"83.4116\" y=\"-824.4\">Input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"83.4116\" y=\"-811.4\">[#](1,)</text>\n",
"</g>\n",
"<!-- Input15475&#45;&gt;ElementTimes15591 -->\n",
"<g class=\"edge\" id=\"edge12\"><title>Input15475-&gt;ElementTimes15591</title>\n",
"<path d=\"M59.602,-787.986C55.8122,-781.356 52.4673,-774.191 50.4116,-767 44.6588,-746.877 56.4178,-725.112 67.4605,-710.239\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"70.2554,-712.347 73.7416,-702.343 64.7772,-707.99 70.2554,-712.347\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"72.9116\" y=\"-759\">Input15475</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"72.9116\" y=\"-748\">[#](1,)</text>\n",
"</g>\n",
"<!-- Input15475&#45;&gt;ElementTimes15591 -->\n",
"<g class=\"edge\" id=\"edge13\"><title>Input15475-&gt;ElementTimes15591</title>\n",
"<path d=\"M92.9091,-785.367C95.3355,-772.702 96.8912,-758.236 95.4116,-745 94.3116,-735.161 92.0971,-724.483 89.8666,-715.307\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"93.1897,-714.176 87.3066,-705.365 86.4109,-715.922 93.1897,-714.176\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"118.912\" y=\"-759\">Input15475</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"118.912\" y=\"-748\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15613 -->\n",
"<g class=\"node\" id=\"node12\"><title>ElementTimes15613</title>\n",
"<ellipse cx=\"280.412\" cy=\"-401.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"280.412\" y=\"-396.5\">*</text>\n",
"</g>\n",
"<!-- ElementTimes15613&#45;&gt;Exp15619 -->\n",
"<g class=\"edge\" id=\"edge14\"><title>ElementTimes15613-&gt;Exp15619</title>\n",
"<path d=\"M280.116,-386.925C279.806,-372.656 279.312,-349.919 278.935,-332.592\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"282.424,-332.052 278.708,-322.131 275.426,-332.204 282.424,-332.052\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"341.412\" y=\"-354\">ElementTimes15613_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"341.412\" y=\"-343\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15611 -->\n",
"<g class=\"node\" id=\"node13\"><title>ElementTimes15611</title>\n",
"<ellipse cx=\"353.412\" cy=\"-495.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"353.412\" y=\"-490.5\">*</text>\n",
"</g>\n",
"<!-- ElementTimes15611&#45;&gt;ElementTimes15613 -->\n",
"<g class=\"edge\" id=\"edge15\"><title>ElementTimes15611-&gt;ElementTimes15613</title>\n",
"<path d=\"M339.147,-492.811C322.124,-489.951 294.343,-482.42 281.412,-463 274.279,-452.289 273.744,-437.975 275.125,-425.96\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"278.636,-426.175 276.798,-415.741 271.728,-425.044 278.636,-426.175\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"342.412\" y=\"-455\">ElementTimes15611_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"342.412\" y=\"-444\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15611&#45;&gt;ElementTimes15613 -->\n",
"<g class=\"edge\" id=\"edge16\"><title>ElementTimes15611-&gt;ElementTimes15613</title>\n",
"<path d=\"M367.263,-490.071C379.234,-485.422 395.777,-476.833 403.412,-463 408.136,-454.439 409.201,-448.879 403.412,-441 380.971,-410.462 334.388,-403.48 305.428,-402.251\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"305.355,-398.749 295.279,-402.025 305.199,-405.747 305.355,-398.749\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"467.412\" y=\"-455\">ElementTimes15611_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"467.412\" y=\"-444\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15609 -->\n",
"<g class=\"node\" id=\"node14\"><title>ElementTimes15609</title>\n",
"<ellipse cx=\"326.412\" cy=\"-582.5\" fill=\"lightgray\" rx=\"14.5\" ry=\"14.5\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"20.00\" text-anchor=\"middle\" x=\"326.412\" y=\"-577.5\">*</text>\n",
"</g>\n",
"<!-- ElementTimes15609&#45;&gt;ElementTimes15611 -->\n",
"<g class=\"edge\" id=\"edge17\"><title>ElementTimes15609-&gt;ElementTimes15611</title>\n",
"<path d=\"M320.811,-569.081C316.618,-557.889 312.528,-541.212 318.412,-528 321.54,-520.976 327.095,-514.86 332.87,-509.935\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"335.374,-512.429 341.233,-503.602 331.148,-506.848 335.374,-512.429\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"379.412\" y=\"-542\">ElementTimes15609_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"379.412\" y=\"-531\">[#](1,)</text>\n",
"</g>\n",
"<!-- ElementTimes15609&#45;&gt;ElementTimes15611 -->\n",
"<g class=\"edge\" id=\"edge18\"><title>ElementTimes15609-&gt;ElementTimes15611</title>\n",
"<path d=\"M340.68,-579.7C368.488,-575.869 428.286,-565.93 440.412,-550 446.334,-542.22 446.027,-536.005 440.412,-528 426.493,-508.158 398.759,-500.763 378.331,-498.035\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"378.456,-494.529 368.147,-496.972 377.729,-501.491 378.456,-494.529\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"505.412\" y=\"-542\">ElementTimes15609_Output_0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"505.412\" y=\"-531\">[#](1,)</text>\n",
"</g>\n",
"<!-- Input15474 -->\n",
"<g class=\"node\" id=\"node15\"><title>Input15474</title>\n",
"<polygon fill=\"lightgray\" points=\"351.045,-655.099 354.112,-655.296 357.148,-655.591 360.14,-655.983 363.076,-656.471 365.943,-657.053 368.731,-657.729 371.429,-658.496 374.027,-659.352 376.515,-660.294 378.883,-661.321 381.124,-662.429 383.231,-663.616 385.196,-664.878 387.014,-666.211 388.679,-667.612 390.188,-669.077 391.538,-670.602 392.726,-672.184 393.75,-673.816 394.61,-675.496 395.306,-677.219 395.839,-678.979 396.211,-680.772 396.424,-682.593 396.481,-684.437 396.387,-686.299 396.145,-688.175 395.76,-690.057 395.237,-691.943 394.583,-693.825 393.804,-695.701 392.905,-697.563 391.893,-699.407 390.775,-701.228 389.558,-703.021 388.249,-704.781 386.854,-706.504 385.379,-708.184 383.833,-709.816 382.22,-711.398 380.548,-712.923 378.821,-714.388 377.047,-715.789 375.229,-717.122 373.373,-718.384 371.484,-719.571 369.565,-720.679 367.622,-721.706 365.657,-722.648 363.673,-723.504 361.674,-724.271 359.662,-724.947 357.64,-725.529 355.609,-726.017 353.572,-726.409 351.529,-726.704 349.484,-726.901 347.436,-727 345.387,-727 343.34,-726.901 341.294,-726.704 339.251,-726.409 337.214,-726.017 335.183,-725.529 333.161,-724.947 331.149,-724.271 329.15,-723.504 327.167,-722.648 325.201,-721.706 323.258,-720.679 321.339,-719.571 319.45,-718.384 317.594,-717.122 315.776,-715.789 314.002,-714.388 312.275,-712.923 310.603,-711.398 308.99,-709.816 307.444,-708.184 305.969,-706.504 304.574,-704.781 303.265,-703.021 302.048,-701.228 300.93,-699.407 299.918,-697.563 299.02,-695.701 298.24,-693.825 297.586,-691.943 297.064,-690.057 296.679,-688.175 296.436,-686.299 296.342,-684.437 296.399,-682.593 296.612,-680.772 296.984,-678.979 297.517,-677.219 298.213,-675.496 299.073,-673.816 300.097,-672.184 301.285,-670.602 302.635,-669.077 304.144,-667.612 305.81,-666.211 307.627,-664.878 309.592,-663.616 311.699,-662.429 313.94,-661.321 316.309,-660.294 318.796,-659.352 321.394,-658.496 324.092,-657.729 326.88,-657.053 329.747,-656.471 332.683,-655.983 335.675,-655.591 
338.711,-655.296 341.778,-655.099 344.865,-655 347.958,-655 351.045,-655.099\" stroke=\"black\" stroke-width=\"4\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"346.412\" y=\"-694.4\">Input</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"12.00\" text-anchor=\"middle\" x=\"346.412\" y=\"-681.4\">[#](1,)</text>\n",
"</g>\n",
"<!-- Input15474&#45;&gt;ElementTimes15609 -->\n",
"<g class=\"edge\" id=\"edge19\"><title>Input15474-&gt;ElementTimes15609</title>\n",
"<path d=\"M322.602,-657.986C318.812,-651.356 315.467,-644.191 313.412,-637 310.448,-626.632 312.62,-615.013 315.978,-605.258\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"319.258,-606.482 319.741,-595.898 312.764,-603.871 319.258,-606.482\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"335.912\" y=\"-629\">Input15474</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"335.912\" y=\"-618\">[#](1,)</text>\n",
"</g>\n",
"<!-- Input15474&#45;&gt;ElementTimes15609 -->\n",
"<g class=\"edge\" id=\"edge20\"><title>Input15474-&gt;ElementTimes15609</title>\n",
"<path d=\"M359.594,-655.831C362.703,-642.683 363.76,-627.771 358.412,-615 355.724,-608.583 351.021,-602.811 346.034,-598.015\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"348.089,-595.167 338.214,-591.329 343.54,-600.488 348.089,-595.167\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"384.912\" y=\"-629\">Input15474</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"10.00\" text-anchor=\"middle\" x=\"384.912\" y=\"-618\">[#](1,)</text>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"C.debugging.set_checked_mode(True)\n",
"z_checked = C.exp(-y(x))*C.exp(y(w))+1\n",
"try:\n",
" z_checked.eval({w:w0, x:x0})\n",
"except:\n",
" exc_type, exc_value, exc_traceback = sys.exc_info()\n",
" error_msg = str(exc_value).split('\\n')[0]\n",
" print(\"Error: %s\"%error_msg)\n",
"display_model(z_checked)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Searching for the name of the operation in the graph, we (unsurprisingly) find that it is the multiplication of the two exponentials that is causing the issue (The number after ElementTimes matches the output name of that node in the graph).\n",
"\n",
"### Guideline 6\n",
"\n",
"> Use `set_checked_mode(True)` to figure out which operation is producing NaNs\n",
"\n",
"This was a contrived example, and in some cases you can get NaNs after many hours of training. Checked mode is introducing a big performance hit so you will need to rerun from your last valid checkpoint. While that is happening on the background inspect your graph for operations that can cause problems such as exponentials that can produce very large numbers.\n",
"\n",
"Finally, the debugging module includes two other ways that can help you find problems with your code\n",
"\n",
"* The `set_computation_network_trace_level()` function takes an integer argument that determines the amount of information that CNTK will produce for each operation in the graph\n",
" - 1: outputs the dimensions and some other static information\n",
" - 1000: outputs for each minibatch the sum of absolute values of all the elements in the output (this can help catch sign and permutation errors) \n",
" - 1000000: outputs for each minibatch all the elements in the output. This is intended to be used as a last resort.\n",
"* The `debug_model()` function takes a CNTK network and returns a new network that has debug operations inserted everywhere. Debug operations can let you inspect the values that flow through the network in the forward and backward direction in an interactive way in the console. This is hard to appeciate through this tutorial, since if we were to run this here it would freeze this notebook (waiting for user input), but you are welcome to check out the [documentation](https://www.cntk.ai/pythondocs/cntk.debugging.debug.html) of how it works and try it out!\n",
"\n",
"### Guideline 7\n",
"\n",
"> Use `debug_model(function)` and `set_computation_network_trace_level(level)` to smoke out any remaining bugs. \n",
"\n",
"\n",
"### Very advanced bugs\n",
"\n",
"Beyond the debugging module, there are a few more internal APIs that can help with certain classes of bugs. All of these internal APIs are in the `cntk.cntk_py` module so when we refer to, say, `force_deterministic_algorithms()` that really means\n",
"`cntk_py.force_deterministic_algorithms()`. The following functions can be useful\n",
"- **`force_deterministic_algorithms()`**: Many of the libraries we use offer various algorithms for performing each operation. Typically the fastest algorithms are non-deterministic because the output is a summation (as in the case of matrix products or convolutions) and multiple threads are working on partial sums that have to be added together. Since addition of floating point numbers is not associative, you can get different results from different executions. force_deterministic_algorithms() will make all subsequent operations select a slower but deterministic algorithm if one is available. This is useful when bitwise reproducibility is important.\n",
"- **`set_gpumemory_allocation_trace_level(level)`**: Sets the trace level for gpu memory allocations. A value greater than 0 will cause the gpu memory allocator to print information about the allocation, the free and total memory, and a call stack of where this allocation was called from. This can be useful in debugging out of memory issues on the GPU.\n",
"- **`enable_synchronous_gpukernel_execution()`**: Makes all gpu kernel launches synchronous. This can help with profiling execution times because the profile of a program with asynchronous execution of gpu kernels can be hard to interpret.\n",
"- **`set_fixed_random_seed(value)`**: All CNTK code goes through a single GenerateRandomSeed API which by default assigns distinct random seeds to each operation that requires randomness (including, random initialization, dropout, and random number generation according to a distribution). With this call all these operations will have the same fixed random seed which can help debug reproducibility issues after you have refactored your program and some parts of the networks are now created in different order. There is still some legacy code that picks the random seed in other ways, so you can still get non-reproducible results with this option. Furthermore, this option reduces the statistical quality of dropout and other random operations in the network and should be used with care.\n",
"- **`disable_forward_values_sharing()`**: CNTK is very aggressive about reusing GPU memory. There are many opportunities both during the forward and the backward pass where a buffer of intermediate results can be reused. Unfortunately, if you write a new operation and do not properly mark which buffers should and should not be reused, you can have very subtle bugs. The backward value sharing is straightforward and you cannot do much to cause CNTK to get it wrong. If you are suspecting such a bug you can see whether disabling forward values (buffers) sharing leads to different results. If so, you need to investigate whether your operation is improperly marking some buffers as possible to share.\n",
"\n",
"\n",
"### Guideline 8\n",
"\n",
"> Use `cntk_py.set_gpumemory_allocation_trace_level(1)` to find out why you are running out of GPU memory.\n",
"\n",
"### Guideline 9\n",
"\n",
"> Use `cntk_py.enable_synchronous_gpukernel_execution()` to make the profiling results easier to understand.\n",
"\n",
"### Guideline 10\n",
"\n",
"> Use `cntk_py.force_deterministic_algorithms()` and `cntk_py.set_fixed_random_seed(seed)` to improve reproducibility.\n",
"\n",
"### Guideline 11\n",
"\n",
"> Use `cntk_py.disable_forward_values_sharing()` if you suspect a memory sharing issue with CNTK."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# For testing purposes, ensure that what the guide says can be executed without failures\n",
"C.debugging.set_computation_network_trace_level(1)\n",
"C.cntk_py.set_gpumemory_allocation_trace_level(1)\n",
"C.cntk_py.enable_synchronous_gpukernel_execution()\n",
"C.cntk_py.force_deterministic_algorithms() \n",
"C.cntk_py.set_fixed_random_seed(98052)\n",
"C.cntk_py.disable_forward_values_sharing()\n",
"dm = C.debugging.debug_model(model_shared_embedding_working) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}