Update CNTK_203_Reinforcement_Learning_Basics.ipynb

Fixed the "unknown character" problems. 
Fixed the problem where "Bellmann" was not part of the hyperlink.
This commit is contained in:
Matthew Chan 2016-11-14 22:23:40 -08:00 committed by GitHub
Parent c7c9ee6368
Commit 99389b3032
1 changed file with 2 additions and 2 deletions


@@ -27,7 +27,7 @@
"Q(s,a) &= r_0 + \\gamma r_1 + \\gamma^2 r_2 + \\ldots \\newline\n",
"&= r_0 + \\gamma \\max_a Q^*(s',a)\n",
"\\end{align}\n",
"where $\\gamma \\in [0,1)$ is the discount factor that controls how much we should value reward that is further away. This is called the [$Bellmann$-equation](https://en.wikipedia.org/wiki/Bellman_equation). \n",
"where $\\gamma \\in [0,1)$ is the discount factor that controls how much we should value reward that is further away. This is called the [*Bellmann*-equation](https://en.wikipedia.org/wiki/Bellman_equation). \n",
"\n",
"In this tutorial we will show how to model the state space, how to use the received reward to figure out which action yields the highest future reward. \n",
"\n",
@@ -126,7 +126,7 @@
"source": [
"# Part 1: DQN\n",
"\n",
"After a transition $(s,a,r,s)$, we are trying to move our value function $Q(s,a)$ closer to our target $r+\\gamma \\max_{a}Q(s,a)$, where $\\gamma$ is a discount factor for future rewards and ranges in value between 0 and 1.\n",
"After a transition $(s,a,r,s')$, we are trying to move our value function $Q(s,a)$ closer to our target $r+\\gamma \\max_{a'}Q(s',a')$, where $\\gamma$ is a discount factor for future rewards and ranges in value between 0 and 1.\n",
"\n",
"DQNs\n",
" * learn the _Q-function_ that maps observation (state, action) to a `score`\n",
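
A minimal sketch of the target $r+\gamma \max_{a'}Q(s',a')$ described in the edited cell, for readers who want to see the update in code. It is not part of the notebook or of this commit; the function name `dqn_targets` and the batched `next_q_values` array are illustrative assumptions, not CNTK API:

```python
# Illustrative sketch only: compute the DQN target r + gamma * max_a' Q(s', a')
# for a batch of transitions, given precomputed Q-values for the next states.
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """rewards       : (batch,) immediate rewards r
       next_q_values : (batch, num_actions) Q(s', a') for every action a'
       dones         : (batch,) 1.0 where s' is terminal (no future reward)"""
    best_next = next_q_values.max(axis=1)             # max_a' Q(s', a')
    return rewards + gamma * (1.0 - dones) * best_next

# Example: two transitions with three possible actions each
rewards = np.array([1.0, 0.0])
next_q  = np.array([[0.2, 0.5, 0.1],
                    [0.7, 0.3, 0.9]])
dones   = np.array([0.0, 1.0])                        # second transition is terminal
print(dqn_targets(rewards, next_q, dones))            # [1.495 0.   ]
```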