Merge pull request #1039 from tuzzer/patch-2

Fixed formatting errors in CNTK_203_Reinforcement_Learning_Basics.ipynb
This commit is contained in:
Philipp Kranen 2016-11-15 12:20:13 +01:00 committed by GitHub
Parents 20da1c7157 99389b3032
Commit 2340b27379
1 changed file with 2 additions and 2 deletions


@@ -27,7 +27,7 @@
"Q(s,a) &= r_0 + \\gamma r_1 + \\gamma^2 r_2 + \\ldots \\newline\n",
"&= r_0 + \\gamma \\max_a Q^*(s',a)\n",
"\\end{align}\n",
"where $\\gamma \\in [0,1)$ is the discount factor that controls how much we should value reward that is further away. This is called the [$Bellmann$-equation](https://en.wikipedia.org/wiki/Bellman_equation). \n",
"where $\\gamma \\in [0,1)$ is the discount factor that controls how much we should value reward that is further away. This is called the [*Bellmann*-equation](https://en.wikipedia.org/wiki/Bellman_equation). \n",
"\n",
"In this tutorial we will show how to model the state space, how to use the received reward to figure out which action yields the highest future reward. \n",
"\n",
@@ -126,7 +126,7 @@
"source": [
"# Part 1: DQN\n",
"\n",
"After a transition $(s,a,r,s)$, we are trying to move our value function $Q(s,a)$ closer to our target $r+\\gamma \\max_{a}Q(s,a)$, where $\\gamma$ is a discount factor for future rewards and ranges in value between 0 and 1.\n",
"After a transition $(s,a,r,s')$, we are trying to move our value function $Q(s,a)$ closer to our target $r+\\gamma \\max_{a'}Q(s',a')$, where $\\gamma$ is a discount factor for future rewards and ranges in value between 0 and 1.\n",
"\n",
"DQNs\n",
" * learn the _Q-function_ that maps observation (state, action) to a `score`\n",