Chapter 1: Introduction

Two distinguishing characteristics of reinforcement learning:

  • trial-and-error search
  • delayed reward

Markov decision process (three aspects of the learning problem it captures):

  1. sensation
  2. action
  3. goal
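
A minimal sketch of how the three aspects fit together in an interaction loop. The Corridor environment, the always_right policy, and the max_steps cap are hypothetical names invented for illustration; they are not from the chapter.

```python
class Corridor:
    """Hypothetical 1-D environment: states 0..4, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # goal: reach the right end
        return self.state, reward, done


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return +1


def run_episode(env, policy, max_steps=100):
    state = env.reset()                          # 1. sensation: observe the state
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)                   # 2. action: affect the state
        state, reward, done = env.step(action)
        total_reward += reward                   # 3. goal: expressed through reward
        steps += 1
        if done:
            break
    return total_reward, steps


total_reward, steps = run_episode(Corridor(), always_right)
```

Here the agent senses its position, acts by moving, and the goal is encoded as a reward given only on reaching the last state.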


  • key challenge: the trade-off between exploration (trying new actions) and exploitation (using actions already known to work)
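
One common way to handle this trade-off is an epsilon-greedy rule: usually pick the action with the best current estimate, but occasionally pick a random one. The sketch below applies it to a hypothetical multi-armed bandit; the function name, the Gaussian reward noise, and the default parameters are all assumptions made for illustration, not something the chapter prescribes.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run an epsilon-greedy agent on a bandit with Gaussian rewards (assumed)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # running average reward per arm
    counts = [0] * n       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore: random arm
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit: best estimate
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # incremental update of the sample average for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With epsilon = 0 the agent never explores and can lock onto a poor arm; with epsilon = 1 it never uses what it has learned. Intermediate values balance the two.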

Main subelements of a reinforcement learning system:

  • policy: the learning agent's way of behaving at a given time.
  • reward signal: defines the goal of the problem (what is good immediately).
  • value function: what is good in the long run (foresight). Hard to estimate.
  • model of the environment: (optional) inference about how the environment will behave.
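
The four subelements can be sketched as four small functions. This is an illustrative toy under stated assumptions: a hypothetical five-state corridor with the goal at state 4, a discount factor GAMMA of 0.9, and a model that happens to be exact. None of these specifics come from the chapter.

```python
GOAL = 4     # hypothetical goal state at the right end of a corridor 0..4
GAMMA = 0.9  # assumed discount factor for future rewards


def policy(state):
    """Policy: maps the current state to an action (here, always move right)."""
    return +1


def model(state, action):
    """Model: predicts the next state for a given state and action."""
    return min(max(state + action, 0), GOAL)


def reward_signal(next_state):
    """Reward signal: immediate feedback that defines the goal."""
    return 1.0 if next_state == GOAL else 0.0


def value(state, horizon=50):
    """Value: discounted sum of future rewards when following `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        if state == GOAL:
            break
        state = model(state, policy(state))
        total += discount * reward_signal(state)
        discount *= GAMMA
    return total
```

In this toy, the first step from state 0 earns zero reward, yet value(0) is roughly 0.73: the reward signal says what is good now, while the value function says what is good in the long run.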

Reinforcement learning compared with other approaches:

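  • VS supervised learning: learns from its own interaction with the environment, not from labeled examples supplied by an external supervisor.
  • VS unsupervised learning: tries to maximize a reward signal instead of trying to find hidden structure.
  • VS evolutionary methods: search is guided by a value function learned during interaction, which is generally more efficient.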
