Update CNTK_203_Reinforcement_Learning_Basics.ipynb
Fixed the "unknown character" problems. Fixed the problem where "Bellmann" was not part of the hyperlink.
This commit is contained in:
Parent
c7c9ee6368
Commit
99389b3032
@@ -27,7 +27,7 @@
 "Q(s,a) &= r_0 + \\gamma r_1 + \\gamma^2 r_2 + \\ldots \\newline\n",
 "&= r_0 + \\gamma \\max_a Q^*(s',a)\n",
 "\\end{align}\n",
-"where $\\gamma \\in [0,1)$ is the discount factor that controls how much we should value reward that is further away. This is called the [$Bellmann$-equation](https://en.wikipedia.org/wiki/Bellman_equation). \n",
+"where $\\gamma \\in [0,1)$ is the discount factor that controls how much we should value reward that is further away. This is called the [*Bellmann*-equation](https://en.wikipedia.org/wiki/Bellman_equation). \n",
 "\n",
 "In this tutorial we will show how to model the state space, how to use the received reward to figure out which action yields the highest future reward. \n",
 "\n",
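The changed cell describes the discounted return and its recursive Bellman form. As a minimal sketch (hypothetical reward values and variable names, not part of the notebook), the two expressions agree for a fixed reward sequence:

```python
gamma = 0.9                      # discount factor, gamma in [0, 1)
rewards = [1.0, 0.5, 0.25, 2.0]  # hypothetical rewards r_0, r_1, r_2, r_3

# Direct sum: r_0 + gamma*r_1 + gamma^2*r_2 + ...
direct = sum(gamma**t * r for t, r in enumerate(rewards))

# Recursive Bellman form: Q = r_0 + gamma * Q', evaluated back to front
recursive = 0.0
for r in reversed(rewards):
    recursive = r + gamma * recursive

assert abs(direct - recursive) < 1e-12
```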
@@ -126,7 +126,7 @@
 "source": [
 "# Part 1: DQN\n",
 "\n",
-"After a transition $(s,a,r,s′)$, we are trying to move our value function $Q(s,a)$ closer to our target $r+\\gamma \\max_{a′}Q(s′,a′)$, where $\\gamma$ is a discount factor for future rewards and ranges in value between 0 and 1.\n",
+"After a transition $(s,a,r,s')$, we are trying to move our value function $Q(s,a)$ closer to our target $r+\\gamma \\max_{a'}Q(s',a')$, where $\\gamma$ is a discount factor for future rewards and ranges in value between 0 and 1.\n",
 "\n",
 "DQNs\n",
 " * learn the _Q-function_ that maps observation (state, action) to a `score`\n",
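The updated cell states the target $r+\gamma \max_{a'}Q(s',a')$ that the Q-function is moved toward after a transition $(s,a,r,s')$. A minimal tabular stand-in for that update (a DQN would use a neural network instead of a table; names such as `q_values` and `alpha` are illustrative, not from the tutorial):

```python
import numpy as np

num_states, num_actions = 10, 2
gamma, alpha = 0.99, 0.1                   # discount factor, learning rate
q_values = np.zeros((num_states, num_actions))

def q_update(s, a, r, s_next):
    """Move Q(s, a) a step closer to the target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(q_values[s_next])
    q_values[s, a] += alpha * (target - q_values[s, a])

q_update(s=0, a=1, r=1.0, s_next=3)
```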