In [3]:
# %load /Users/facai/Study/book_notes/
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font='SimHei', font_scale=2.5)
plt.rcParams['axes.grid'] = False

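def show_image(filename, figsize=None, res_dir=True):
    """Display an image file inline, by default from the ./res/ folder."""
    if figsize:
        plt.figure(figsize=figsize)  # create a figure of the requested size

    if res_dir:
        filename = './res/{}'.format(filename)

    plt.imshow(plt.imread(filename))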


Chapter 1: Introduction

Two distinguishing characteristics of reinforcement learning:

  • trial-and-error search
  • delayed reward

Markov decision processes include three aspects in their simplest possible forms:

  1. sensation
  2. action
  3. goal
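The sensation-action-goal loop, together with trial-and-error search and delayed reward, can be sketched as a toy episode. `ChainEnv`, `run_episode`, and the reward scheme (reward 1 only on reaching the right end of a 5-state chain) are hypothetical illustrations, not code from the book.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 on a line.

    Reaching the right end gives reward 1 and ends the episode; every
    other step gives reward 0, so the reward is delayed.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # sensation: the agent observes its state

    def step(self, action):
        # action: -1 (left) or +1 (right); the walls clip the move
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # goal: reach the right end
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Sense, act, receive reward; repeat until the goal or the step limit."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for steps in range(1, max_steps + 1):
        action = policy(state)  # action chosen from the current sensation
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, steps


if __name__ == '__main__':
    print(run_episode(ChainEnv(), lambda s: +1))  # prints (1.0, 4)
    print(run_episode(ChainEnv(), lambda s: random.choice([-1, +1])))
```

The always-right policy reaches the goal in 4 steps; a random policy only gets the reward if it stumbles onto the goal within the step limit, and typically takes longer. Either way, no intermediate step signals progress.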


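A challenge specific to reinforcement learning:

  • trade-off between exploration and exploitation: the agent must exploit what it already knows to obtain reward, but also explore to make better action selections in the future.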
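One common way to handle this trade-off is epsilon-greedy action selection, sketched here on a multi-armed bandit with sample-average value estimates (the book develops this setting in Chapter 2). The function name, the unit-variance Gaussian rewards, and the arm means used below are assumptions for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Sample-average epsilon-greedy on a hypothetical Gaussian bandit.

    With probability epsilon the agent explores (picks a random arm);
    otherwise it exploits the arm with the highest estimated value so far.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # Q(a): estimated value of each arm
    counts = [0] * k       # N(a): how often each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # incremental sample average: Q <- Q + (R - Q) / N
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total / steps


if __name__ == '__main__':
    estimates, counts, avg = epsilon_greedy_bandit([0.0, 0.5, 1.5], steps=2000)
    print(counts, round(avg, 2))
```

With `epsilon=0` the agent can lock onto whichever arm happened to look best first; a small amount of exploration lets it discover the better arm and then exploit it.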

Main subelements of a reinforcement learning system:

  • policy: the learning agent's way of behaving at a given time.
  • reward signal: defines the goal of the problem.
  • value function: what is good in the long run (foresight). Values are much harder to determine than rewards.
  • model of the environment: (optional) mimics the environment, allowing inferences about how it will behave; used for planning.
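The four subelements can be seen working together in a small tabular sketch. Everything here is hypothetical: a 5-state chain (`N_STATES`), `env_step`, a perfect `model`, an epsilon-greedy `policy` that looks one step ahead, and a TD(0)-style value update without discounting. It illustrates the roles of the pieces, not the book's code.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (-1, +1)  # move left or move right


def env_step(state, action):
    """Environment: returns (next_state, reward); reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0  # reward signal defines the goal
    return next_state, reward


# Model of the environment: assume the agent has a perfect copy of the dynamics,
# so its predictions match env_step exactly.
model = env_step


def policy(state, V, epsilon, rng):
    """Policy: epsilon-greedy one-step lookahead through the model and V."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore
    scores = {}
    for a in ACTIONS:
        predicted_state, predicted_reward = model(state, a)
        scores[a] = predicted_reward + V[predicted_state]
    best = max(scores.values())
    return rng.choice([a for a in ACTIONS if scores[a] == best])  # random tie-break


def train(episodes=500, alpha=0.1, epsilon=0.1, seed=0):
    """Learn a tabular value function V(s) with an undiscounted TD(0)-style update."""
    rng = random.Random(seed)
    V = [0.0] * N_STATES  # the terminal state's value stays 0
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = policy(state, V, epsilon, rng)
            next_state, reward = env_step(state, action)
            # move V(s) toward the one-step target r + V(s')
            V[state] += alpha * (reward + V[next_state] - V[state])
            state = next_state
    return V


if __name__ == '__main__':
    print([round(v, 2) for v in train()])
```

After training, every non-goal state has a value well above zero even though only the last transition is rewarded: the reward signal is immediate, while the value function captures what is good in the long run.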

Reinforcement learning compared with other approaches:

  • vs. supervised learning: learns from interaction rather than from labeled examples.
  • vs. unsupervised learning: tries to maximize a reward signal instead of trying to find hidden structure.
  • vs. evolutionary methods: its search is guided by a value function, which is generally more efficient.
In [6]:
show_image('fig1_1.png', figsize=(12, 8))