Arrive at your destination safely, comfortably, and well rested. We combine state-of-the-art methods in random search to get you to your destination safely. Using random methods lets us generate efficient routes, and high-quality (mandatory) white noise for your journey -- across town or across the country!
In this experiment an autonomous car will learn to drive up a hill. We'll compare Augmented Random Search (ARS) to Proximal Policy Optimization (PPO).
Before doing anything else, we need to install some libraries.
From the command line, run:
pip install gym
pip install ray
pip install opencv-python
Then install PyTorch for your OS.
macOS or Linux:
conda install pytorch torchvision -c pytorch
Windows:
conda install pytorch -c pytorch
pip3 install torchvision
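As a quick, optional sanity check that everything installed, you can try importing each package from the command line (cv2 is the import name for opencv-python):
python -c "import gym, ray, cv2, torch, torchvision; print('ok')"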
from ADMCode import visualize as vis
from ADMCode.snuz import run_ppo
from ADMCode.snuz import run_ars
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter('ignore', np.RankWarning)
warnings.filterwarnings("ignore", module="matplotlib")
warnings.filterwarnings("ignore")
sns.set(style='white', font_scale=1.3)
%matplotlib inline
%config InlineBackend.figure_format = 'png'
%config InlineBackend.savefig.dpi = 150
We're going to teach a car to drive up a hill! This is the MountainCarContinuous-v0 environment from the OpenAI [gym](https://gym.openai.com).
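If you want to poke at the environment directly before training anything, here is a minimal sketch using the classic gym API (pre-0.26, where reset returns just the observation):
import gym

# Build the environment and inspect its spaces.
env = gym.make('MountainCarContinuous-v0')
print(env.observation_space)  # Box(2,): car position and velocity
print(env.action_space)       # Box(1,): continuous push force

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # take one random action
env.close()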
Let's get driving, uphill! First let's try PPO.
The default hyperparameters are:
gamma = 0.99 # Try me?
lam = 0.98 # Try me?
actor_hidden1 = 64 # Try me?
actor_hidden2 = 64 # Try me?
actor_hidden3 = 64 # Try me?
critic_hidden1 = 64 # Try me?
critic_lr = 0.0003 # Try me? (small changes)
actor_lr = 0.0003 # Try me? (small changes)
batch_size = 64 # Leave me be
l2_rate = 0.001 # Leave me be
clip_param = 0.2 # Leave me be
num_training_epochs = 10 # Try me?
num_episodes = 10 # Try me?
num_memories = 24 # Try me?
clip_actions = True # Leave me be
clip_std = 1.0 # Leave me be
seed_value = None # Try me (with int only)
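To see what clip_param does, here is a minimal sketch of PPO's clipped surrogate loss. This illustrates the general algorithm, not ADMCode's exact implementation; the argument names are just illustrative:
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_param=0.2):
    # Probability ratio between the updated policy and the old policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping the ratio keeps each update close to the old policy.
    clipped = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    # Take the pessimistic bound; negate because optimizers minimize.
    return -torch.min(unclipped, clipped).mean()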
Parameters can be changed by passing them to run_ppo. For example, run_ppo(num_episodes=20, actor_lr=0.0006) doubles the training time and the actor learning rate of the PPO.
episodes, scores = run_ppo(render=True, num_episodes=10)
Plot the average reward / episode.
plt.plot(episodes, scores)
plt.xlabel("Episode")
plt.ylabel("Reward")
Compare, say, 10 episodes of PPO to 10 episodes of ARS (Augmented Random Search).
The ARS code was modified from Recht's original source.
The default hyperparameters are:
num_episodes = 10 # Try me?
n_directions = 8 # Try me?
deltas_used = 8 # Try me?
step_size = 0.02 # Try me?
delta_std = 0.03 # Try me?
n_workers = 1 # Leave me be
rollout_length = 240 # Try me?
shift = 0 # Leave me be (all below)
seed = 237
policy_type = 'linear'
dir_path = 'data'
filter = 'MeanStdFilter' # Leave me be
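To make these knobs concrete, here is a minimal sketch of the core ARS update on a linear policy, following the idea in Recht and colleagues' paper rather than the exact ADMCode/ray implementation. env_rollout is a hypothetical helper that runs one episode with the given weight matrix and returns its total reward:
import numpy as np

def ars_update(M, env_rollout, n_directions=8, deltas_used=8,
               step_size=0.02, delta_std=0.03):
    # Sample random perturbation directions for the policy weights M.
    deltas = [np.random.randn(*M.shape) for _ in range(n_directions)]
    # Evaluate each perturbation in both the + and - directions.
    results = [(env_rollout(M + delta_std * d), env_rollout(M - delta_std * d), d)
               for d in deltas]
    # Keep only the best-performing directions.
    results.sort(key=lambda r: max(r[0], r[1]), reverse=True)
    results = results[:deltas_used]
    # Scale the step by the standard deviation of the collected rewards.
    sigma_r = np.std([r for plus, minus, _ in results for r in (plus, minus)])
    step = sum((r_plus - r_minus) * d for r_plus, r_minus, d in results)
    return M + step_size / (deltas_used * sigma_r + 1e-8) * step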
Note: Due to the way the backend of ARS works (it uses ray, a distributed job system), we can't render the experiments here. Sorry. :(
episodes, scores = run_ars(num_episodes=10)
plt.plot(episodes, scores)
plt.xlabel("Episode")
plt.ylabel("Reward")