Video: DeepMind's Deep Q-Network solving the Atari game Breakout after 600 episodes of training (Mnih et al. (2013))
Figure: *Top: AlphaGo's famous Move 37, a counterintuitive move for a human to make with this board set-up, but nonetheless a game-winning one. Bottom: Lee Sedol, one of the world's top Go players, flummoxed by AlphaGo's moves. (Silver et al. (2017))*
Figure: The complex, interconnected electricity grid
Figure: The RL control loop. Agents take sequential actions that affect their environment; the environment changes in response, and the agent receives a reward for its action. Adapted from Episode 1 of David Silver's RL YouTube series
More formally, the RL control loop looks something like this:
Figure: The formalised RL control loop.
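The loop in the figure can be sketched in a few lines of code. This is a minimal toy sketch, not the notebook's own code: the environment, agent, and `step()` interface below are all invented for illustration (loosely following the common Gym-style convention).

```python
import random

class CoinFlipEnv:
    """Toy environment: correctly guessing a fair coin flip earns reward 1."""
    def step(self, action):
        outcome = random.choice(['heads', 'tails'])
        reward = 1.0 if action == outcome else 0.0
        next_state = outcome  # the agent observes the last outcome
        return next_state, reward

class RandomAgent:
    """Agent with no learning: it guesses at random regardless of state."""
    def act(self, state):
        return random.choice(['heads', 'tails'])

env, agent = CoinFlipEnv(), RandomAgent()
state, total_reward = None, 0.0
for t in range(1000):
    action = agent.act(state)         # agent acts on its observation
    state, reward = env.step(action)  # environment transitions
    total_reward += reward            # agent receives a reward
print(total_reward / 1000)  # ≈ 0.5 for a random guesser
```

A real RL agent would replace `RandomAgent` with something that uses the observed rewards to improve its action choices over time.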
Generally, when we model a function using a black-box approximator, we can select (informally) from two sets of models:
from ipywidgets import interact
import ipywidgets as widgets
import matplotlib.pyplot as plt

def y_linear_plot(w1, w0):
    # Linear model y = w1*x + w0, evaluated on the dense grid (x_true)
    # and at the observed inputs (x), both defined earlier.
    y_lin = w1 * x_true + w0
    y_err = w1 * x + w0
    with plt.xkcd():
        fig = plt.figure(figsize=(10, 5))
        plt.xlim([-6, 6])
        plt.ylim([-4, 4])
        plt.scatter(x, y, 100, 'k', 'o', zorder=100)  # observed data
        plt.plot(x_true, y_lin)                       # current line
        # Dotted green lines show the residuals between model and data.
        plt.vlines(x=x, ymin=y_err, ymax=y, colors='green', ls=':', lw=2)
        plt.savefig(path + '/images/2_data.png', dpi=300)
        plt.show()

interact(y_linear_plot,
         w1=widgets.FloatSlider(value=0.75, min=-5, max=5, step=0.25),
         w0=widgets.FloatSlider(value=-.25, min=-5, max=5, step=0.25))
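Rather than tuning `w1` and `w0` by hand with the sliders, we can minimise the summed squared residuals (the green dotted lines) directly with a closed-form least-squares fit. A minimal sketch, using synthetic stand-ins for the notebook's `x` and `y`:

```python
import numpy as np

# Hypothetical noisy data standing in for the notebook's x, y
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 15)
y = 0.75 * x - 0.25 + rng.normal(0, 0.5, size=x.shape)

# Closed-form least squares: finds the (w1, w0) that minimise
# the summed squared residuals visualised by the slider plot.
w1, w0 = np.polyfit(x, y, deg=1)
residuals = y - (w1 * x + w0)
print(w1, w0, residuals @ residuals)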
def y_poly_plot(w0, w1, w2, w3):
    # Cubic model y = w0 + w1*x + w2*x^2 + w3*x^3, evaluated on the
    # dense grid (x_true) and at the observed inputs (x).
    y_poly = w0 + w1*x_true + w2*(x_true**2) + w3*(x_true**3)
    y_polyerr = w0 + w1*x + w2*(x**2) + w3*(x**3)
    with plt.xkcd():
        fig = plt.figure(figsize=(10, 5))
        plt.xlim([-6, 6])
        plt.ylim([-4, 4])
        plt.scatter(x, y, 100, 'k', 'o', zorder=100)  # observed data
        plt.plot(x_true, y_poly)                      # current curve
        # Dotted green lines show the residuals between model and data.
        plt.vlines(x=x, ymin=y_polyerr, ymax=y, colors='green', ls=':', lw=2)
        plt.savefig(path + '/images/3_data.png', dpi=300)
        plt.show()

interact(y_poly_plot,
         w0=widgets.FloatSlider(value=-1, min=-7, max=7, step=0.5),
         w1=widgets.FloatSlider(value=-.5, min=-7, max=7, step=0.5),
         w2=widgets.FloatSlider(value=1.5, min=-7, max=7, step=0.5),
         w3=widgets.FloatSlider(value=1, min=-7, max=7, step=0.5),
         )
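Because the cubic family nests the linear one, a richer model can never fit the training data worse. A minimal sketch of this effect, again with synthetic stand-ins for `x` and `y`:

```python
import numpy as np

# Hypothetical noisy observations standing in for the notebook's x, y
rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 15)
y = -1 - 0.5 * x + 0.15 * x**2 + 0.05 * x**3 + rng.normal(0, 0.5, size=x.shape)

# Fit polynomials of increasing degree and record the training error.
errors = []
for deg in [1, 3, 9]:
    w = np.polyfit(x, y, deg=deg)
    resid = y - np.polyval(w, x)
    errors.append(np.sum(resid**2))

print(errors)  # training error shrinks as the model family grows
```

Note that shrinking training error says nothing about generalisation; the degree-9 fit is very likely tracking noise.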
Figure: *Top: A neural network with 4 nodes and 1 hidden layer, a graph of the model created by hand above. Bottom: A neural network with 10 nodes and 2 hidden layers, a more typical model arrangement.*
We use Bayes' rule to narrow the solution space, which tells us how to update our beliefs about the world based on the evidence we've seen:
$$ P(A | B) = \frac{P(B | A) P(A)}{P(B)} $$

and in the setting of functions and data it looks something like:

$$ P(\textbf{f} | \textbf{D}) = \frac{P(\textbf{D} | \textbf{f}) P(\textbf{f})}{P(\textbf{D})} $$

# plot rbf kernel
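To make the update concrete, here is a toy discrete version with made-up numbers: two candidate functions with equal prior probability, and a likelihood for the observed data $\textbf{D}$ under each.

```python
# Toy discrete Bayes update (invented numbers):
# two candidate functions f1, f2 and one observed dataset D.
prior = {'f1': 0.5, 'f2': 0.5}       # P(f)
likelihood = {'f1': 0.8, 'f2': 0.2}  # P(D | f): f1 explains the data better

evidence = sum(prior[f] * likelihood[f] for f in prior)            # P(D)
posterior = {f: prior[f] * likelihood[f] / evidence for f in prior}
print(posterior)  # {'f1': 0.8, 'f2': 0.2}
```

The data shift belief towards `f1`, the function that explains them better; over a continuous space of functions the sums become integrals, but the update is the same.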
with plt.xkcd():
    fig = plt.figure(figsize=(10, 5))
    plt.scatter(x, y, 100, 'k', 'o', zorder=100)  # observed data
    plt.plot(x_true, f_rbf.T)  # function samples drawn with the RBF kernel
    plt.xlim([-6, 6])
    plt.ylim([-4, 4])
    plt.savefig(path + '/images/8_data.png', dpi=300)
    plt.show()
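The `f_rbf` samples plotted above are drawn from a prior over functions defined by an RBF (squared-exponential) kernel. A minimal sketch of how such samples can be generated (the lengthscale, grid, and sample count here are arbitrary choices, not necessarily those used above):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    # k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2)):
    # nearby inputs get highly correlated function values.
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sqdist / lengthscale**2)

x_true = np.linspace(-6, 6, 100)
K = rbf_kernel(x_true, x_true)

# Draw a few function samples from the zero-mean Gaussian prior N(0, K);
# the small jitter keeps K numerically positive definite.
rng = np.random.default_rng(0)
f_rbf = rng.multivariate_normal(np.zeros(len(x_true)),
                                K + 1e-8 * np.eye(len(x_true)), size=3)
print(f_rbf.shape)  # (3, 100)
```

Each row of `f_rbf` is one smooth random function; plotting them against `x_true` reproduces the kind of prior samples shown in the figure.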
Further explanations on why these techniques work
How can you build these in practice?
Introductory Books