%%capture
! pip install "ray[rllib, serve, tune]==2.2.0"
! pip install "pyarrow==10.0.0"
! pip install "tensorflow>=2.9.0"
! pip install "transformers>=4.24.0"
! pip install "pygame==2.1.2" "gym==0.25.0"
import ray
ray.init()
2023-03-18 08:06:40,198 INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Python version:  3.9.16
Ray version:     2.2.0
Dashboard:       http://127.0.0.1:8265
The following simple example creates a distributed Dataset on your local Ray Cluster from a Python data structure. Specifically, you’ll create a dataset from Python dictionaries, each containing a string name and an integer-valued data field, for 10,000 entries:
items = [{"name": str(i), "data": i} for i in range(10000)]
ds = ray.data.from_items(items)
ds.show(5)
{'name': '0', 'data': 0} {'name': '1', 'data': 1} {'name': '2', 'data': 2} {'name': '3', 'data': 3} {'name': '4', 'data': 4}
Great, now you have some rows, but what can you do with that data? The Dataset API bets heavily on functional programming, as this paradigm is well suited for data transformations. Even though Python 3 made a point of hiding some of its functional programming capabilities, you’re probably familiar with functionality such as map, filter, flat_map, and others. If not, it’s easy enough to pick up: map takes each element of your dataset and transforms it into something else, in parallel; filter removes data points according to a Boolean filter function; and the slightly more elaborate flat_map first maps values similarly to map, but then it also “flattens” the result. For instance, if map produced a list of lists, flat_map would flatten out the nested lists and give you just a list. Equipped with these three functional API calls, let’s see how easily you can transform your dataset ds:
# We map each row of ds to keep only the square of its data entry.
squares = ds.map(lambda x: x["data"] ** 2)

# Then we filter the squares to keep only even numbers (a total of five thousand elements).
evens = squares.filter(lambda x: x % 2 == 0)
evens.count()

# We then use flat_map to augment the remaining values with their respective cubes.
cubes = evens.flat_map(lambda x: [x, x**3])

# Taking a total of 10 values means leaving Ray and returning a Python list
# with these values that we can print.
sample = cubes.take(10)
print(sample)
2023-03-18 08:17:43,549 WARNING dataset.py:4233 -- The `map`, `flat_map`, and `filter` operations are unvectorized and can be very slow. Consider using `.map_batches()` instead. Map: 100%|██████████| 200/200 [00:02<00:00, 78.04it/s] Filter: 100%|██████████| 200/200 [00:00<00:00, 403.82it/s] Flat_Map: 100%|██████████| 200/200 [00:00<00:00, 329.68it/s]
[0, 0, 4, 64, 16, 4096, 36, 46656, 64, 262144]
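Note the warning in the output above: row-by-row calls to map, filter, and flat_map are not vectorized and can be slow. As a hedged sketch of the map_batches alternative the warning suggests (the small integer dataset below is made up purely for illustration, not part of the original example), you can transform an entire batch of items in one call:

# Build a tiny dataset of plain integers; with simple Python items, each batch
# arrives as a regular Python list that we can process in one go.
nums = ray.data.from_items(list(range(10)))
squared = nums.map_batches(lambda batch: [x ** 2 for x in batch])
squared.show(3)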
The drawback of Dataset transformations is that each step gets executed synchronously. In this example that is a nonissue, but for complex tasks that, for example, mix reading files and processing data, you would want an execution that can overlap individual tasks. DatasetPipeline does exactly that. Let’s rewrite the previous example into a pipeline:
# You can turn a Dataset into a pipeline by calling .window() on it.
pipe = ds.window()

# Pipeline steps can be chained to yield the same result as before.
result = pipe\
    .map(lambda x: x["data"] ** 2)\
    .filter(lambda x: x % 2 == 0)\
    .flat_map(lambda x: [x, x**3])
result.show(10)
2023-03-18 08:20:49,252 INFO dataset.py:3693 -- Created DatasetPipeline with 20 windows: 7390b min, 8000b max, 7944b mean 2023-03-18 08:20:49,255 INFO dataset.py:3703 -- Blocks per window: 10 min, 10 max, 10 mean 2023-03-18 08:20:49,262 INFO dataset.py:3725 -- ✔️ This pipeline's per-window parallelism is high enough to fully utilize the cluster. 2023-03-18 08:20:49,266 INFO dataset.py:3742 -- ✔️ This pipeline's windows likely fit in object store memory without spilling. Stage 0: 0%| | 0/20 [00:00<?, ?it/s] 0%| | 0/20 [00:00<?, ?it/s] Stage 1: 0%| | 0/20 [00:00<?, ?it/s] Stage 1: 5%|▌ | 1/20 [00:00<00:03, 5.80it/s] Stage 0: 10%|█ | 2/20 [00:00<00:01, 10.96it/s]
0 0 4 64 16 4096 36 46656 64 262144
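Pipelines pay off most when reading and processing can overlap. As a minimal sketch, assuming a directory of Parquet files with a data column (both the path and the column name are placeholders, not part of this example), you could start the pipeline straight from storage:

# Hypothetical: stream Parquet files through the same squaring step,
# processing ten blocks per window while later files are still being read.
pipe = (
    ray.data.read_parquet("path/to/parquet/dir")  # placeholder path
    .window(blocks_per_window=10)
    .map(lambda row: row["data"] ** 2)  # assumes a "data" column
)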
Moving on to the next set of libraries, let’s look at the distributed training capabilities of Ray. For that, you have access to two libraries. One is dedicated to reinforcement learning specifically; the other one has a different scope and is aimed primarily at supervised learning tasks.
Let’s start with Ray RLlib for reinforcement learning (RL). This library is powered by the modern ML frameworks TensorFlow and PyTorch, and you can choose which one to use. Both frameworks seem to converge more and more conceptually, so you can pick the one you like most without losing much in the process.
One of the easiest ways to run examples with RLlib is to use the command-line tool rllib, which we already installed implicitly when we ran pip install "ray[rllib]".
We’ll look at a fairly classic control problem of balancing a pole on a cart. Imagine you have a pole like the one in the figure below, fixed at a joint of a cart and subject to gravity. The cart is free to move along a frictionless track, and you can manipulate the cart by giving it a push from the left or the right with a fixed force. If you do this well enough, the pole will remain in an upright position. For each time step the pole doesn’t fall over, we get a reward of 1. Collecting a high reward is our goal, and the question is whether we can teach a reinforcement learning algorithm to do this for us.
Specifically, we want to train a reinforcement learning agent that can carry out two actions, namely, push to the left or to the right, observe what happens when interacting with the environment in that way, and learn from the experience by maximizing the reward.
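To make the environment concrete before handing it to RLlib, here is a minimal sketch that uses the gym package installed earlier to run one episode of CartPole-v1 with random pushes; it is only an illustration and not part of the RLlib training below.

import gym

# Create the cart-pole environment and run a single episode with random actions.
env = gym.make("CartPole-v1")
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # 0 pushes the cart left, 1 pushes it right
    obs, reward, done, info = env.step(action)
    total_reward += reward  # +1 for every step the pole stays up
print(f"Episode reward: {total_reward}")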
To tackle this problem with Ray RLlib, we can use a so-called tuned example, which is a preconfigured algorithm that runs well for a given problem. You can run a tuned example with a single command. RLlib comes with many such examples, and you can list them all with rllib example list.
! rllib example list
RLlib Examples ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Example ID ┃ Description ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ atari-a2c │ Runs grid search over several Atari games │ │ │ on A2C. │ │ atari-dqn │ Run grid search on Atari environments with │ │ │ DQN. │ │ atari-duel-ddqn │ Run grid search on Atari environments with │ │ │ duelling double DQN. │ │ atari-impala │ Run grid search over several atari games │ │ │ with IMPALA. │ │ atari-ppo │ Run grid search over several atari games │ │ │ with PPO. │ │ atari-sac │ Run grid search on several atari games │ │ │ with SAC. │ │ breakout-apex-dqn │ Runs Apex DQN on BreakoutNoFrameskip-v4. │ │ breakout-ddppo │ Runs DDPPO on BreakoutNoFrameskip-v4. │ │ cartpole-a2c │ Runs A2C on the CartPole-v1 environment. │ │ cartpole-a2c-micro │ Runs A2C on the CartPole-v1 environment, │ │ │ using micro-batches. │ │ cartpole-a3c │ Runs A3C on the CartPole-v1 environment. │ │ cartpole-alpha-zero │ Runs AlphaZero on a Cartpole with sparse │ │ │ rewards. │ │ cartpole-apex-dqn │ Runs Apex DQN on CartPole-v1. │ │ cartpole-appo │ Runs APPO on CartPole-v1. │ │ cartpole-ars │ Runs ARS on CartPole-v1. │ │ cartpole-bc │ Runs BC on CartPole-v1. │ │ cartpole-crr │ Run CRR on CartPole-v1. │ │ cartpole-ddppo │ Runs DDPPO on CartPole-v1 │ │ cartpole-dqn │ Run DQN on CartPole-v1. │ │ cartpole-dt │ Run DT on CartPole-v1. │ │ cartpole-es │ Run ES on CartPole-v1. │ │ cartpole-impala │ Run IMPALA on CartPole-v1. │ │ cartpole-maml │ Run MAML on CartPole-v1. │ │ cartpole-marwil │ Run MARWIL on CartPole-v1. │ │ cartpole-mbmpo │ Run MBMPO on a CartPole environment │ │ │ wrapper. │ │ cartpole-pg │ Run PG on CartPole-v1 │ │ cartpole-ppo │ Run PPO on CartPole-v1. │ │ cartpole-sac │ Run SAC on CartPole-v1 │ │ cartpole-simpleq │ Run SimpleQ on CartPole-v1 │ │ dm-control-dreamer │ Run DREAMER on a suite of control problems │ │ │ by Deepmind. │ │ frozenlake-appo │ Runs APPO on FrozenLake-v1. │ │ halfcheetah-appo │ Runs APPO on HalfCheetah-v2. │ │ halfcheetah-bullet-ddpg │ Runs DDPG on HalfCheetahBulletEnv-v0. │ │ halfcheetah-cql │ Runs grid search on HalfCheetah │ │ │ environments with CQL. │ │ halfcheetah-ddpg │ Runs DDPG on HalfCheetah-v2. │ │ halfcheetah-maml │ Run MAML on a custom HalfCheetah │ │ │ environment. │ │ halfcheetah-mbmpo │ Run MBMPO on a HalfCheetah environment │ │ │ wrapper. │ │ halfcheetah-ppo │ Run PPO on HalfCheetah-v2. │ │ halfcheetah-sac │ Run SAC on HalfCheetah-v3. │ │ hopper-bullet-ddpg │ Runs DDPG on HopperBulletEnv-v0. │ │ hopper-cql │ Runs grid search on Hopper environments │ │ │ with CQL. │ │ hopper-mbmpo │ Run MBMPO on a Hopper environment wrapper. │ │ hopper-ppo │ Run PPO on Hopper-v1. │ │ humanoid-es │ Run ES on Humanoid-v2. │ │ humanoid-ppo │ Run PPO on Humanoid-v1. │ │ inverted-pendulum-td3 │ Run TD3 on InvertedPendulum-v2. │ │ mountaincar-apex-ddpg │ Runs Apex DDPG on │ │ │ MountainCarContinuous-v0. │ │ mountaincar-ddpg │ Runs DDPG on MountainCarContinuous-v0. │ │ mujoco-td3 │ Run TD3 against four of the hardest MuJoCo │ │ │ tasks. │ │ multi-agent-cartpole-alpha-star │ Runs AlphaStar on 4 CartPole agents. │ │ multi-agent-cartpole-appo │ Runs APPO on RLlib's MultiAgentCartPole │ │ multi-agent-cartpole-impala │ Run IMPALA on RLlib's MultiAgentCartPole │ │ pacman-sac │ Run SAC on MsPacmanNoFrameskip-v4. │ │ pendulum-apex-ddpg │ Runs Apex DDPG on Pendulum-v1. │ │ pendulum-appo │ Runs APPO on Pendulum-v1. │ │ pendulum-cql │ Runs CQL on Pendulum-v1. 
│ │ pendulum-crr │ Run CRR on Pendulum-v1. │ │ pendulum-ddpg │ Runs DDPG on Pendulum-v1. │ │ pendulum-ddppo │ Runs DDPPO on Pendulum-v1. │ │ pendulum-dt │ Run DT on Pendulum-v1. │ │ pendulum-impala │ Run IMPALA on Pendulum-v1. │ │ pendulum-maml │ Run MAML on a custom Pendulum environment. │ │ pendulum-mbmpo │ Run MBMPO on a Pendulum environment │ │ │ wrapper. │ │ pendulum-ppo │ Run PPO on Pendulum-v1. │ │ pendulum-sac │ Run SAC on Pendulum-v1. │ │ pendulum-td3 │ Run TD3 on Pendulum-v1. │ │ pong-a3c │ Runs A3C on the PongDeterministic-v4 │ │ │ environment. │ │ pong-apex-dqn │ Runs Apex DQN on PongNoFrameskip-v4. │ │ pong-appo │ Runs APPO on PongNoFrameskip-v4. │ │ pong-dqn │ Run DQN on PongDeterministic-v4. │ │ pong-impala │ Run IMPALA on PongNoFrameskip-v4. │ │ pong-ppo │ Run PPO on PongNoFrameskip-v4. │ │ pong-rainbow │ Run Rainbow on PongDeterministic-v4. │ │ recsys-bandits │ Runs BanditLinUCB on a Recommendation │ │ │ Simulation environment. │ │ recsys-long-term-slateq │ Run SlateQ on a recommendation system │ │ │ aimed at long-term satisfaction. │ │ recsys-parametric-slateq │ SlateQ run on a recommendation system. │ │ recsys-ppo │ Run PPO on a recommender system example │ │ │ from RLlib. │ │ recsys-slateq │ SlateQ run on a recommendation system. │ │ repeatafterme-ppo │ Run PPO on RLlib's RepeatAfterMe │ │ │ environment. │ │ stateless-cartpole-r2d2 │ Run R2D2 on a stateless cart pole │ │ │ environment. │ │ swimmer-ars │ Runs ARS on Swimmer-v2. │ │ two-step-game-maddpg │ Run RLlib's Two-step game with multi-agent │ │ │ DDPG. │ │ two-step-game-qmix │ Run QMIX on RLlib's two-step game. │ │ walker2d-ppo │ Run PPO on the Walker2d-v1 environment. │ └─────────────────────────────────┴────────────────────────────────────────────┘ Run any RLlib example as using 'rllib example run <Example ID>'.See 'rllib example run --help' for more information.
One of the available examples is cartpole-ppo, a tuned example that uses the PPO algorithm to solve the cart–pole problem, specifically, the CartPole-v1 environment from OpenAI Gym.
cartpole-ppo:
    env: CartPole-v1            # [1]
    run: PPO                    # [2]
    stop:
        episode_reward_mean: 150    # [3]
        timesteps_total: 100000
    config:                     # [4]
        framework: tf
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: True
The CartPole-v1 environment simulates the problem we just described. The details of this configuration file don’t matter much at this point, so don’t get distracted by them. The important part is that you specify the CartPole-v1 environment and sufficient RL-specific configuration to ensure the training procedure works. Running this configuration doesn’t require any special hardware and finishes in a matter of minutes.
! rllib example run cartpole-ppo
== Status == Current time: 2023-03-18 08:34:37 (running for 00:03:28.54) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 5 | 168.401 | 20000 | 108.29 | 500 | 13 | 108.29 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ == Status == Current time: 2023-03-18 08:34:42 (running for 00:03:33.55) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 5 | 168.401 | 20000 | 108.29 | 500 | 13 | 108.29 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ == Status == Current time: 2023-03-18 08:34:47 (running for 00:03:38.55) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 5 | 168.401 | 20000 | 108.29 | 500 | 13 | 108.29 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ Result for PPO_CartPole-v1_3bb16_00000: agent_timesteps_total: 24000 counters: num_agent_steps_sampled: 24000 num_agent_steps_trained: 24000 num_env_steps_sampled: 24000 num_env_steps_trained: 24000 custom_metrics: {} date: 2023-03-18_08-34-48 done: false episode_len_mean: 142.47 episode_media: {} episode_reward_max: 500.0 episode_reward_mean: 142.47 episode_reward_min: 13.0 episodes_this_iter: 12 episodes_total: 429 experiment_id: db1de6e2783647b49113496fff88c803 hostname: 0738217da70e info: learner: default_policy: custom_metrics: {} diff_num_grad_updates_vs_sampler_policy: 95.5 learner_stats: cur_kl_coeff: 0.05000000074505806 cur_lr: 0.0003000000142492354 entropy: 0.5886597633361816 entropy_coeff: 0.0 kl: 0.0035236419644206762 policy_loss: -0.002459704177454114 total_loss: 0.09718986600637436 vf_explained_var: 0.0019282136345282197 vf_loss: 9.947339057922363 num_agent_steps_trained: 125.0 num_grad_updates_lifetime: 1056.5 num_agent_steps_sampled: 24000 num_agent_steps_trained: 24000 num_env_steps_sampled: 24000 num_env_steps_trained: 24000 iterations_since_restore: 6 node_ip: 172.28.0.12 num_agent_steps_sampled: 24000 num_agent_steps_trained: 24000 num_env_steps_sampled: 24000 num_env_steps_sampled_this_iter: 4000 num_env_steps_trained: 24000 num_env_steps_trained_this_iter: 4000 num_faulty_episodes: 0 num_healthy_workers: 1 num_in_flight_async_reqs: 0 num_remote_worker_restarts: 0 num_steps_trained_this_iter: 4000 perf: cpu_util_percent: 73.82000000000001 ram_util_percent: 23.102222222222224 pid: 7624 policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.15060454694754802 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.1447757987134971 mean_inference_ms: 4.981463969697947 mean_raw_obs_processing_ms: 0.9265207062667011 sampler_results: custom_metrics: {} episode_len_mean: 142.47 episode_media: {} episode_reward_max: 500.0 episode_reward_mean: 142.47 episode_reward_min: 13.0 episodes_this_iter: 12 hist_stats: episode_lengths: [99, 13, 16, 58, 74, 14, 71, 48, 162, 37, 67, 13, 152, 24, 34, 61, 140, 13, 24, 25, 77, 87, 60, 39, 29, 21, 30, 125, 14, 18, 147, 71, 14, 123, 20, 169, 18, 57, 235, 23, 134, 92, 94, 127, 225, 139, 187, 174, 163, 101, 39, 97, 65, 140, 41, 35, 17, 65, 142, 55, 169, 275, 315, 33, 155, 139, 151, 73, 52, 183, 65, 305, 274, 500, 83, 231, 191, 144, 248, 267, 363, 37, 162, 159, 500, 92, 138, 282, 286, 212, 56, 219, 452, 329, 500, 232, 398, 332, 491, 500] episode_reward: [99.0, 13.0, 16.0, 58.0, 74.0, 14.0, 71.0, 48.0, 162.0, 37.0, 67.0, 13.0, 152.0, 24.0, 34.0, 61.0, 140.0, 13.0, 24.0, 25.0, 77.0, 87.0, 60.0, 
39.0, 29.0, 21.0, 30.0, 125.0, 14.0, 18.0, 147.0, 71.0, 14.0, 123.0, 20.0, 169.0, 18.0, 57.0, 235.0, 23.0, 134.0, 92.0, 94.0, 127.0, 225.0, 139.0, 187.0, 174.0, 163.0, 101.0, 39.0, 97.0, 65.0, 140.0, 41.0, 35.0, 17.0, 65.0, 142.0, 55.0, 169.0, 275.0, 315.0, 33.0, 155.0, 139.0, 151.0, 73.0, 52.0, 183.0, 65.0, 305.0, 274.0, 500.0, 83.0, 231.0, 191.0, 144.0, 248.0, 267.0, 363.0, 37.0, 162.0, 159.0, 500.0, 92.0, 138.0, 282.0, 286.0, 212.0, 56.0, 219.0, 452.0, 329.0, 500.0, 232.0, 398.0, 332.0, 491.0, 500.0] num_faulty_episodes: 0 policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.15060454694754802 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.1447757987134971 mean_inference_ms: 4.981463969697947 mean_raw_obs_processing_ms: 0.9265207062667011 time_since_restore: 199.58355569839478 time_this_iter_s: 31.182859659194946 time_total_s: 199.58355569839478 timers: learn_throughput: 456.437 learn_time_ms: 8763.54 synch_weights_time_ms: 5.189 training_iteration_time_ms: 33240.144 timestamp: 1679128488 timesteps_since_restore: 0 timesteps_total: 24000 training_iteration: 6 trial_id: 3bb16_00000 warmup_time: 11.70189356803894 (PPO pid=7624) 2023-03-18 08:34:48,893 INFO filter_manager.py:34 -- Synchronizing filters ... (PPO pid=7624) 2023-03-18 08:34:48,899 INFO filter_manager.py:55 -- Updating remote filters ... == Status == Current time: 2023-03-18 08:34:54 (running for 00:03:44.79) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 6 | 199.584 | 24000 | 142.47 | 500 | 13 | 142.47 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ == Status == Current time: 2023-03-18 08:34:59 (running for 00:03:49.80) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 6 | 199.584 | 24000 | 142.47 | 500 | 13 | 142.47 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ == Status == Current time: 2023-03-18 08:35:04 (running for 00:03:54.80) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 6 | 199.584 | 24000 | 142.47 | 500 | 13 | 142.47 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ == Status == Current time: 2023-03-18 08:35:09 (running for 00:03:59.81) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 6 | 199.584 | 24000 | 142.47 | 500 | 13 | 142.47 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ == Status == Current time: 2023-03-18 08:35:14 (running for 00:04:04.82) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 6 | 199.584 | 24000 | 142.47 | 500 | 13 | 142.47 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ == Status == Current time: 2023-03-18 08:35:19 (running for 00:04:09.82) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 RUNNING) +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | RUNNING | 172.28.0.12:7624 | 6 | 199.584 | 24000 | 142.47 | 500 | 13 | 142.47 | +-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ Result for PPO_CartPole-v1_3bb16_00000: agent_timesteps_total: 28000 counters: num_agent_steps_sampled: 28000 num_agent_steps_trained: 28000 num_env_steps_sampled: 28000 num_env_steps_trained: 28000 custom_metrics: {} date: 2023-03-18_08-35-21 done: true episode_len_mean: 178.85 episode_media: {} episode_reward_max: 500.0 episode_reward_mean: 178.85 episode_reward_min: 13.0 episodes_this_iter: 9 episodes_total: 438 experiment_id: db1de6e2783647b49113496fff88c803 hostname: 0738217da70e info: learner: default_policy: custom_metrics: {} diff_num_grad_updates_vs_sampler_policy: 95.5 learner_stats: cur_kl_coeff: 0.02500000037252903 cur_lr: 0.0003000000142492354 entropy: 0.5833122730255127 entropy_coeff: 0.0 kl: 0.003216453595086932 policy_loss: -0.0005987154436297715 total_loss: 0.09903717041015625 vf_explained_var: -0.0003911629319190979 vf_loss: 9.955548286437988 num_agent_steps_trained: 125.0 num_grad_updates_lifetime: 1248.5 num_agent_steps_sampled: 28000 num_agent_steps_trained: 28000 num_env_steps_sampled: 28000 num_env_steps_trained: 28000 iterations_since_restore: 7 node_ip: 172.28.0.12 num_agent_steps_sampled: 28000 num_agent_steps_trained: 28000 num_env_steps_sampled: 28000 num_env_steps_sampled_this_iter: 4000 num_env_steps_trained: 28000 num_env_steps_trained_this_iter: 4000 num_faulty_episodes: 0 num_healthy_workers: 1 num_in_flight_async_reqs: 0 num_remote_worker_restarts: 0 num_steps_trained_this_iter: 4000 perf: cpu_util_percent: 78.09347826086956 ram_util_percent: 23.099999999999998 pid: 7624 
policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.15010584224508902 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.14444693410492337 mean_inference_ms: 4.976027958863329 mean_raw_obs_processing_ms: 0.9240927234928833 sampler_results: custom_metrics: {} episode_len_mean: 178.85 episode_media: {} episode_reward_max: 500.0 episode_reward_mean: 178.85 episode_reward_min: 13.0 episodes_this_iter: 9 hist_stats: episode_lengths: [37, 67, 13, 152, 24, 34, 61, 140, 13, 24, 25, 77, 87, 60, 39, 29, 21, 30, 125, 14, 18, 147, 71, 14, 123, 20, 169, 18, 57, 235, 23, 134, 92, 94, 127, 225, 139, 187, 174, 163, 101, 39, 97, 65, 140, 41, 35, 17, 65, 142, 55, 169, 275, 315, 33, 155, 139, 151, 73, 52, 183, 65, 305, 274, 500, 83, 231, 191, 144, 248, 267, 363, 37, 162, 159, 500, 92, 138, 282, 286, 212, 56, 219, 452, 329, 500, 232, 398, 332, 491, 500, 424, 500, 500, 500, 500, 500, 500, 500, 269] episode_reward: [37.0, 67.0, 13.0, 152.0, 24.0, 34.0, 61.0, 140.0, 13.0, 24.0, 25.0, 77.0, 87.0, 60.0, 39.0, 29.0, 21.0, 30.0, 125.0, 14.0, 18.0, 147.0, 71.0, 14.0, 123.0, 20.0, 169.0, 18.0, 57.0, 235.0, 23.0, 134.0, 92.0, 94.0, 127.0, 225.0, 139.0, 187.0, 174.0, 163.0, 101.0, 39.0, 97.0, 65.0, 140.0, 41.0, 35.0, 17.0, 65.0, 142.0, 55.0, 169.0, 275.0, 315.0, 33.0, 155.0, 139.0, 151.0, 73.0, 52.0, 183.0, 65.0, 305.0, 274.0, 500.0, 83.0, 231.0, 191.0, 144.0, 248.0, 267.0, 363.0, 37.0, 162.0, 159.0, 500.0, 92.0, 138.0, 282.0, 286.0, 212.0, 56.0, 219.0, 452.0, 329.0, 500.0, 232.0, 398.0, 332.0, 491.0, 500.0, 424.0, 500.0, 500.0, 500.0, 500.0, 500.0, 500.0, 500.0, 269.0] num_faulty_episodes: 0 policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.15010584224508902 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.14444693410492337 mean_inference_ms: 4.976027958863329 mean_raw_obs_processing_ms: 0.9240927234928833 time_since_restore: 232.44457721710205 time_this_iter_s: 32.861021518707275 time_total_s: 232.44457721710205 timers: learn_throughput: 470.055 learn_time_ms: 8509.634 synch_weights_time_ms: 5.025 training_iteration_time_ms: 33183.327 timestamp: 1679128521 timesteps_since_restore: 0 timesteps_total: 28000 training_iteration: 7 trial_id: 3bb16_00000 warmup_time: 11.70189356803894 (PPO pid=7624) 2023-03-18 08:35:21,839 INFO filter_manager.py:34 -- Synchronizing filters ... (PPO pid=7624) 2023-03-18 08:35:21,848 INFO filter_manager.py:55 -- Updating remote filters ... == Status == Current time: 2023-03-18 08:35:21 (running for 00:04:12.69) Memory usage on this node: 2.9/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects Result logdir: /root/ray_results/cartpole-ppo Number of trials: 1/1 (1 TERMINATED) +-----------------------------+------------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-----------------------------+------------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | PPO_CartPole-v1_3bb16_00000 | TERMINATED | 172.28.0.12:7624 | 7 | 232.445 | 28000 | 178.85 | 500 | 13 | 178.85 | +-----------------------------+------------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ 2023-03-18 08:35:22,135 INFO tune.py:762 -- Total run time: 252.92 seconds (252.67 seconds for the tuning loop). Your training finished. Best available checkpoint for each trial: /root/ray_results/cartpole-ppo/PPO_CartPole-v1_3bb16_00000_0_2023-03-18_08-31- 09/checkpoint_000007 You can now evaluate your trained algorithm from any checkpoint, e.g. by running: ╭──────────────────────────────────────────────────────────────────────────────╮ │ rllib evaluate │ │ /root/ray_results/cartpole-ppo/PPO_CartPole-v1_3bb16_00000_0_2023-03-18_08-3 │ │ 1-09/checkpoint_000007 --algo PPO │ ╰──────────────────────────────────────────────────────────────────────────────╯
Your local Ray checkpoint folder is ~/ray_results by default. For the training configuration we used, your results and checkpoints end up under ~/ray_results/cartpole-ppo, as shown in the output above.
To evaluate the performance of your trained RL algorithm, you can now run it from a checkpoint by copying the rllib evaluate command that the training run printed.
Running this command will print evaluation results, namely the rewards achieved by your trained RL algorithm on the CartPole-v1 environment.
! rllib evaluate /root/ray_results/cartpole-ppo/PPO_CartPole-v1_3bb16_00000_0_2023-03-18_08-31-09/checkpoint_000007 --algo PPO
2023-03-18 08:40:10,744 INFO algorithm.py:1005 -- Ran round 1 of parallel evaluation (1/1 episodes done) Episode #23: reward: 500.0
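The rllib command line is a thin layer on top of RLlib’s Python API. If you prefer to stay in Python, a comparable training run could look like the following. This is a minimal sketch assuming Ray 2.2’s PPOConfig builder; it only mirrors a subset of the tuned example’s settings and is not the exact configuration the CLI used.

from ray.rllib.algorithms.ppo import PPOConfig

# Build a PPO algorithm for CartPole-v1, loosely mirroring the tuned example.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("tf")
    .rollouts(num_rollout_workers=1)
    .training(gamma=0.99, lr=0.0003)
)
algo = config.build()

# Train until the mean episode reward reaches 150, echoing the tuned example's stop criterion.
for _ in range(10):
    result = algo.train()
    if result["episode_reward_mean"] >= 150:
        break
algo.stop()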
Ray RLlib is dedicated to reinforcement learning, but what do you do if you need to train models for other types of machine learning, like supervised learning? For distributed training in that case you can use another Ray library: Ray Train. Closely related to training is hyperparameter tuning, which Ray covers with Ray Tune. The following example uses Tune to search for the best hyperparameters of a (simulated) training function:
from ray import tune
import math
import time
# Simulate an expensive training function that depends on two hyperparameters,
# x and y, read from a config.
def training_function(config):
    x, y = config["x"], config["y"]
    time.sleep(10)
    score = objective(x, y)
    # After sleeping for 10 seconds to simulate training and computing the
    # objective, the score is reported to Tune.
    tune.report(score=score)

# The objective computes the mean of the squares of x and y and returns the
# square root of this term. This type of objective is fairly common in ML.
def objective(x, y):
    return math.sqrt((x**2 + y**2) / 2)

# Use tune.run to initialize hyperparameter optimization on our training_function.
result = tune.run(
    training_function,
    config={
        # A key part is to provide a parameter space for x and y for Tune to search over.
        "x": tune.grid_search([-1, -.5, 0, .5, 1]),
        "y": tune.grid_search([-1, -.5, 0, .5, 1])
    })
print(result.get_best_config(metric="score", mode="min"))
Current time:  2023-03-18 08:52:36
Running for:   00:02:15.24
Memory:        1.5/12.7 GiB
Trial name | status | loc | x | y | iter | total time (s) | score |
---|---|---|---|---|---|---|---|
training_function_e994b_00000 | TERMINATED | 172.28.0.12:13138 | -1 | -1 | 1 | 10.193 | 1 |
training_function_e994b_00001 | TERMINATED | 172.28.0.12:13186 | -0.5 | -1 | 1 | 10.05 | 0.790569 |
training_function_e994b_00002 | TERMINATED | 172.28.0.12:13138 | 0 | -1 | 1 | 10.0499 | 0.707107 |
training_function_e994b_00003 | TERMINATED | 172.28.0.12:13186 | 0.5 | -1 | 1 | 10.0483 | 0.790569 |
training_function_e994b_00004 | TERMINATED | 172.28.0.12:13138 | 1 | -1 | 1 | 10.0472 | 1 |
training_function_e994b_00005 | TERMINATED | 172.28.0.12:13186 | -1 | -0.5 | 1 | 10.0501 | 0.790569 |
training_function_e994b_00006 | TERMINATED | 172.28.0.12:13138 | -0.5 | -0.5 | 1 | 10.0503 | 0.5 |
training_function_e994b_00007 | TERMINATED | 172.28.0.12:13186 | 0 | -0.5 | 1 | 10.0493 | 0.353553 |
training_function_e994b_00008 | TERMINATED | 172.28.0.12:13138 | 0.5 | -0.5 | 1 | 10.0502 | 0.5 |
training_function_e994b_00009 | TERMINATED | 172.28.0.12:13186 | 1 | -0.5 | 1 | 10.0474 | 0.790569 |
training_function_e994b_00010 | TERMINATED | 172.28.0.12:13138 | -1 | 0 | 1 | 10.0501 | 0.707107 |
training_function_e994b_00011 | TERMINATED | 172.28.0.12:13186 | -0.5 | 0 | 1 | 10.0506 | 0.353553 |
training_function_e994b_00012 | TERMINATED | 172.28.0.12:13138 | 0 | 0 | 1 | 10.0502 | 0 |
training_function_e994b_00013 | TERMINATED | 172.28.0.12:13186 | 0.5 | 0 | 1 | 10.0485 | 0.353553 |
training_function_e994b_00014 | TERMINATED | 172.28.0.12:13138 | 1 | 0 | 1 | 10.0495 | 0.707107 |
training_function_e994b_00015 | TERMINATED | 172.28.0.12:13186 | -1 | 0.5 | 1 | 10.0494 | 0.790569 |
training_function_e994b_00016 | TERMINATED | 172.28.0.12:13138 | -0.5 | 0.5 | 1 | 10.0458 | 0.5 |
training_function_e994b_00017 | TERMINATED | 172.28.0.12:13186 | 0 | 0.5 | 1 | 10.0489 | 0.353553 |
training_function_e994b_00018 | TERMINATED | 172.28.0.12:13138 | 0.5 | 0.5 | 1 | 10.0503 | 0.5 |
training_function_e994b_00019 | TERMINATED | 172.28.0.12:13186 | 1 | 0.5 | 1 | 10.0503 | 0.790569 |
training_function_e994b_00020 | TERMINATED | 172.28.0.12:13138 | -1 | 1 | 1 | 10.0499 | 1 |
training_function_e994b_00021 | TERMINATED | 172.28.0.12:13186 | -0.5 | 1 | 1 | 10.0504 | 0.790569 |
training_function_e994b_00022 | TERMINATED | 172.28.0.12:13138 | 0 | 1 | 1 | 10.0494 | 0.707107 |
training_function_e994b_00023 | TERMINATED | 172.28.0.12:13186 | 0.5 | 1 | 1 | 10.0468 | 0.790569 |
training_function_e994b_00024 | TERMINATED | 172.28.0.12:13138 | 1 | 1 | 1 | 10.05 | 1 |
Trial name | date | done | episodes_total | experiment_id | experiment_tag | hostname | iterations_since_restore | node_ip | pid | score | time_since_restore | time_this_iter_s | time_total_s | timestamp | timesteps_since_restore | timesteps_total | training_iteration | trial_id | warmup_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
training_function_e994b_00000 | 2023-03-18_08-50-35 | True | a2865e213e9242f5a4c2741709618e0a | 0_x=-1,y=-1 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 1 | 10.193 | 10.193 | 10.193 | 1679129435 | 0 | 1 | e994b_00000 | 0.0200734 | ||
training_function_e994b_00001 | 2023-03-18_08-50-38 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 1_x=-0.5000,y=-1 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.05 | 10.05 | 10.05 | 1679129438 | 0 | 1 | e994b_00001 | 0.00650549 | ||
training_function_e994b_00002 | 2023-03-18_08-50-45 | True | a2865e213e9242f5a4c2741709618e0a | 2_x=0,y=-1 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.707107 | 10.0499 | 10.0499 | 10.0499 | 1679129445 | 0 | 1 | e994b_00002 | 0.0200734 | ||
training_function_e994b_00003 | 2023-03-18_08-50-48 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 3_x=0.5000,y=-1 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.0483 | 10.0483 | 10.0483 | 1679129448 | 0 | 1 | e994b_00003 | 0.00650549 | ||
training_function_e994b_00004 | 2023-03-18_08-50-55 | True | a2865e213e9242f5a4c2741709618e0a | 4_x=1,y=-1 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 1 | 10.0472 | 10.0472 | 10.0472 | 1679129455 | 0 | 1 | e994b_00004 | 0.0200734 | ||
training_function_e994b_00005 | 2023-03-18_08-50-59 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 5_x=-1,y=-0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.0501 | 10.0501 | 10.0501 | 1679129459 | 0 | 1 | e994b_00005 | 0.00650549 | ||
training_function_e994b_00006 | 2023-03-18_08-51-05 | True | a2865e213e9242f5a4c2741709618e0a | 6_x=-0.5000,y=-0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.5 | 10.0503 | 10.0503 | 10.0503 | 1679129465 | 0 | 1 | e994b_00006 | 0.0200734 | ||
training_function_e994b_00007 | 2023-03-18_08-51-09 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 7_x=0,y=-0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.353553 | 10.0493 | 10.0493 | 10.0493 | 1679129469 | 0 | 1 | e994b_00007 | 0.00650549 | ||
training_function_e994b_00008 | 2023-03-18_08-51-15 | True | a2865e213e9242f5a4c2741709618e0a | 8_x=0.5000,y=-0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.5 | 10.0502 | 10.0502 | 10.0502 | 1679129475 | 0 | 1 | e994b_00008 | 0.0200734 | ||
training_function_e994b_00009 | 2023-03-18_08-51-19 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 9_x=1,y=-0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.0474 | 10.0474 | 10.0474 | 1679129479 | 0 | 1 | e994b_00009 | 0.00650549 | ||
training_function_e994b_00010 | 2023-03-18_08-51-25 | True | a2865e213e9242f5a4c2741709618e0a | 10_x=-1,y=0 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.707107 | 10.0501 | 10.0501 | 10.0501 | 1679129485 | 0 | 1 | e994b_00010 | 0.0200734 | ||
training_function_e994b_00011 | 2023-03-18_08-51-29 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 11_x=-0.5000,y=0 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.353553 | 10.0506 | 10.0506 | 10.0506 | 1679129489 | 0 | 1 | e994b_00011 | 0.00650549 | ||
training_function_e994b_00012 | 2023-03-18_08-51-35 | True | a2865e213e9242f5a4c2741709618e0a | 12_x=0,y=0 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0 | 10.0502 | 10.0502 | 10.0502 | 1679129495 | 0 | 1 | e994b_00012 | 0.0200734 | ||
training_function_e994b_00013 | 2023-03-18_08-51-39 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 13_x=0.5000,y=0 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.353553 | 10.0485 | 10.0485 | 10.0485 | 1679129499 | 0 | 1 | e994b_00013 | 0.00650549 | ||
training_function_e994b_00014 | 2023-03-18_08-51-45 | True | a2865e213e9242f5a4c2741709618e0a | 14_x=1,y=0 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.707107 | 10.0495 | 10.0495 | 10.0495 | 1679129505 | 0 | 1 | e994b_00014 | 0.0200734 | ||
training_function_e994b_00015 | 2023-03-18_08-51-49 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 15_x=-1,y=0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.0494 | 10.0494 | 10.0494 | 1679129509 | 0 | 1 | e994b_00015 | 0.00650549 | ||
training_function_e994b_00016 | 2023-03-18_08-51-56 | True | a2865e213e9242f5a4c2741709618e0a | 16_x=-0.5000,y=0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.5 | 10.0458 | 10.0458 | 10.0458 | 1679129516 | 0 | 1 | e994b_00016 | 0.0200734 | ||
training_function_e994b_00017 | 2023-03-18_08-51-59 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 17_x=0,y=0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.353553 | 10.0489 | 10.0489 | 10.0489 | 1679129519 | 0 | 1 | e994b_00017 | 0.00650549 | ||
training_function_e994b_00018 | 2023-03-18_08-52-06 | True | a2865e213e9242f5a4c2741709618e0a | 18_x=0.5000,y=0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.5 | 10.0503 | 10.0503 | 10.0503 | 1679129526 | 0 | 1 | e994b_00018 | 0.0200734 | ||
training_function_e994b_00019 | 2023-03-18_08-52-09 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 19_x=1,y=0.5000 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.0503 | 10.0503 | 10.0503 | 1679129529 | 0 | 1 | e994b_00019 | 0.00650549 | ||
training_function_e994b_00020 | 2023-03-18_08-52-16 | True | a2865e213e9242f5a4c2741709618e0a | 20_x=-1,y=1 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 1 | 10.0499 | 10.0499 | 10.0499 | 1679129536 | 0 | 1 | e994b_00020 | 0.0200734 | ||
training_function_e994b_00021 | 2023-03-18_08-52-19 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 21_x=-0.5000,y=1 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.0504 | 10.0504 | 10.0504 | 1679129539 | 0 | 1 | e994b_00021 | 0.00650549 | ||
training_function_e994b_00022 | 2023-03-18_08-52-26 | True | a2865e213e9242f5a4c2741709618e0a | 22_x=0,y=1 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 0.707107 | 10.0494 | 10.0494 | 10.0494 | 1679129546 | 0 | 1 | e994b_00022 | 0.0200734 | ||
training_function_e994b_00023 | 2023-03-18_08-52-30 | True | 96496a0a28f24fc7904fb5b942aa64f1 | 23_x=0.5000,y=1 | 0738217da70e | 1 | 172.28.0.12 | 13186 | 0.790569 | 10.0468 | 10.0468 | 10.0468 | 1679129550 | 0 | 1 | e994b_00023 | 0.00650549 | ||
training_function_e994b_00024 | 2023-03-18_08-52-36 | True | a2865e213e9242f5a4c2741709618e0a | 24_x=1,y=1 | 0738217da70e | 1 | 172.28.0.12 | 13138 | 1 | 10.05 | 10.05 | 10.05 | 1679129556 | 0 | 1 | e994b_00024 | 0.0200734 |
2023-03-18 08:52:36,761 INFO tune.py:762 -- Total run time: 136.82 seconds (135.23 seconds for the tuning loop).
{'x': 0, 'y': 0}
Notice how the output of this run is structurally similar to what you saw in the RLlib example. That’s no coincidence, as RLlib (like many other Ray libraries) uses Ray Tune under the hood. If you watch the run closely, you will see PENDING trials that wait for execution, as well as RUNNING and TERMINATED ones. Tune takes care of selecting, scheduling, and executing your training runs automatically.
Specifically, this Tune example finds the best possible choices of the parameters x and y for a training_function with a given objective we want to minimize. Even though the objective function might look a little intimidating at first, it is built from the squares of x and y, so all of its values are non-negative. That means the smallest value is obtained at x=0 and y=0, where the objective function evaluates to 0.
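As a quick sanity check (these calls simply reuse the objective function defined above), you can compare a few values against the score column in the results table:

print(objective(0, 0))    # 0.0, the minimum
print(objective(0.5, 0))  # ~0.3536, matching the score of the (0.5, 0) trial
print(objective(1, 1))    # 1.0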
We do a so-called grid search over all possible parameter combinations. As we explicitly pass in 5 possible values each for x and y, that’s a total of 25 combinations that get fed into the training function. Since we instruct training_function to sleep for 10 seconds, testing all combinations of hyperparameters sequentially would take more than 4 minutes in total. Because Ray is smart about parallelizing this workload, the whole experiment took us a little over two minutes on two CPUs, but it might take more or less time depending on where you run it.
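Grid search is only one of the search strategies Tune offers. As a hedged sketch (the ranges and the number of samples below are illustrative choices, not taken from the run above), the same training_function could be tuned with random sampling instead:

# Sample 20 random (x, y) pairs from continuous ranges instead of a fixed grid.
analysis = tune.run(
    training_function,
    num_samples=20,
    config={
        "x": tune.uniform(-1, 1),
        "y": tune.uniform(-1, 1),
    },
)
print(analysis.get_best_config(metric="score", mode="min"))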
The last of Ray’s high-level libraries we’ll discuss specializes in model serving and is simply called Ray Serve. To see an example of it in action, you need a trained ML model to serve. Luckily, nowadays you can find many interesting models on the internet that have already been trained for you. For instance, Hugging Face has a variety of models available for you to download directly in Python. The model we’ll use is a language model called GPT-2 that takes text as input and produces text to continue or complete the input. For example, you can prompt it with a question, and GPT-2 will try to complete it.
Serving such a model is a good way to make it accessible. You may not know how to load and run a TensorFlow model on your computer, but you do know how to ask a question in plain English. Model serving hides the implementation details of a solution and lets users focus on providing inputs and understanding outputs of a model.
To proceed, make sure to run pip install transformers to install the Hugging Face library that has the model we want to use. With that, we can now import and start an instance of Ray’s serve library, load and deploy a GPT-2 model, and ask it for the meaning of life, like so:
from ray import serve
from transformers import pipeline
import requests
# Start Serve locally.
serve.start()

# The @serve.deployment decorator turns a function with a request parameter
# into a Serve deployment.
@serve.deployment
def model(request):
    # Loading language_model inside the model function for every request is
    # inefficient, but it’s the quickest way to show you a deployment.
    language_model = pipeline("text-generation", model="gpt2")
    query = request.query_params["query"]
    # Ask the model to give us at most 100 tokens to continue our query.
    return language_model(query, max_length=100)

# Formally deploy the model so that it can start receiving requests over HTTP.
model.deploy()

query = "What's the meaning of life?"
# Use the indispensable requests library to get a response for any question you might have.
response = requests.get(f"http://localhost:8000/model?query={query}")
print(response.text)
[{"generated_text": "What's the meaning of life?\n\nThe meaning of life is the idea that \"being alive\" isn't just a \"real life experience\". There's a lot of life around you, to be human. Life can seem strange at first and confusing at first, but it's the same when you know it is happening. When you have your life, you can be alive. And you can stay in it.\n\nHow are you feeling now?\n\nIt feels like I'm at"}]