Arrive at your destination safely, comfortably, and well rested. We combine state-of-the-art methods in random search to get you to your destination safely. Using random methods lets us generate efficient routes, and high-quality (mandatory) white noise for your journey -- across town or across the country!
In this experiment an autonomous car will learn to drive up a hill. We'll compare Augmented Random Search (ARS) to Proximal Policy Optimization (PPO).
Before doing anything else, we need to install some libraries.
From the command line, run:
pip install gym
pip install ray
pip install opencv-python
Then install PyTorch for your OS.
macOS or Linux:
conda install pytorch torchvision -c pytorch
Windows:
conda install pytorch -c pytorch
pip3 install torchvision
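As a quick, optional sanity check that everything installed, you can try importing each package from the command line (cv2 is the import name for opencv-python):
python -c "import gym, ray, cv2, torch, torchvision; print('ok')"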
from ADMCode import visualize as vis
from ADMCode.snuz import run_ppo
from ADMCode.snuz import run_ars
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter('ignore', np.RankWarning)
warnings.filterwarnings("ignore", module="matplotlib")
warnings.filterwarnings("ignore")
sns.set(style='white', font_scale=1.3)
%matplotlib inline
%config InlineBackend.figure_format = 'png'
%config InlineBackend.savefig.dpi = 150
We're going to teach a car to drive up a hill! This is the MountainCarContinuous-v0 environment from the OpenAI [gym](https://gym.openai.com).
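If you want to poke at the environment directly before training anything, here is a minimal sketch using the classic gym API (pre-0.26, where reset returns just the observation):
import gym

# Build the environment and inspect its spaces.
env = gym.make('MountainCarContinuous-v0')
print(env.observation_space)  # Box(2,): car position and velocity
print(env.action_space)       # Box(1,): continuous push force

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # take one random action
env.close()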
Let's get driving, uphill! First let's try PPO.
The default hyperparameters are:
gamma = 0.99 # Try me?
lam = 0.98 # Try me?
actor_hidden1 = 64 # Try me?
actor_hidden2 = 64 # Try me?
actor_hidden3 = 64 # Try me?
critic_hidden1 = 64 # Try me?
critic_lr = 0.0003 # Try me? (small changes)
actor_lr = 0.0003 # Try me? (small changes)
batch_size = 64 # Leave me be
l2_rate = 0.001 # Leave me be
clip_param = 0.2 # Leave me be
num_training_epochs = 10 # Try me?
num_episodes = 10 # Try me?
num_memories = 24 # Try me?
clip_actions = True # Leave me be
clip_std = 1.0 # Leave me be
seed_value = None # Try me (with int only)
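To see what clip_param does, here is a minimal sketch of PPO's clipped surrogate loss. This illustrates the general algorithm, not ADMCode's exact implementation; the argument names are just illustrative:
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_param=0.2):
    # Probability ratio between the updated policy and the old policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping the ratio keeps each update close to the old policy.
    clipped = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    # Take the pessimistic bound; negate because optimizers minimize.
    return -torch.min(unclipped, clipped).mean()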
Parameters can be changed by passing them to run_ppo. For example, run_ppo(num_episodes=20, actor_lr=0.0006) doubles the training time and the actor learning rate of the PPO.
episodes, scores = run_ppo(render=True, num_episodes=10)
Plot the average reward / episode.
plt.plot(episodes, scores)
plt.xlabel("Episode")
plt.ylabel("Reward")
Compare, say, 10 episodes of PPO to 10 episodes of ARS (Augmented Random Search).
The ARS code was modified from Recht's original source.
The default hyperparameters are:
num_episodes = 10 # Try me?
n_directions = 8 # Try me?
deltas_used = 8 # Try me?
step_size = 0.02 # Try me?
delta_std = 0.03 # Try me?
n_workers = 1 # Leave me be
rollout_length = 240 # Try me?
shift = 0 # Leave me be (all below)
seed = 237
policy_type = 'linear'
dir_path = 'data'
filter = 'MeanStdFilter' # Leave me be
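To make these knobs concrete, here is a minimal sketch of the core ARS update on a linear policy, following the idea in Recht and colleagues' paper rather than the exact ADMCode/ray implementation. env_rollout is a hypothetical helper that runs one episode with the given weight matrix and returns its total reward:
import numpy as np

def ars_update(M, env_rollout, n_directions=8, deltas_used=8,
               step_size=0.02, delta_std=0.03):
    # Sample random perturbation directions for the policy weights M.
    deltas = [np.random.randn(*M.shape) for _ in range(n_directions)]
    # Evaluate each perturbation in both the + and - directions.
    results = [(env_rollout(M + delta_std * d), env_rollout(M - delta_std * d), d)
               for d in deltas]
    # Keep only the best-performing directions.
    results.sort(key=lambda r: max(r[0], r[1]), reverse=True)
    results = results[:deltas_used]
    # Scale the step by the standard deviation of the collected rewards.
    sigma_r = np.std([r for plus, minus, _ in results for r in (plus, minus)])
    step = sum((r_plus - r_minus) * d for r_plus, r_minus, d in results)
    return M + step_size / (deltas_used * sigma_r + 1e-8) * step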
Note: Due to the way the backend of ARS works (it uses ray, a distributed job system), we can't render the experiments here. Sorry. :(
episodes, scores = run_ars(num_episodes=10)
plt.plot(episodes, scores)
plt.xlabel("Episode")
plt.ylabel("Reward")