(tune-aim-ref)=
Aim is an easy-to-use and supercharged open-source experiment tracker. Aim logs your training runs, enables a well-designed UI to compare them, and provides an API to query them programmatically.
{image}
:align: center
:alt: Aim
:width: 100%
:target: https://aimstack.io
Ray Tune currently offers built-in integration with Aim.
The {ref}`AimLoggerCallback <tune-aim-logger>` automatically logs metrics that are reported to Tune by using the Aim API.
{contents}
:backlinks: none
:local: true
The following example demonstrates how the `AimLoggerCallback` can be used in a Tune experiment.
Begin by installing and importing the necessary modules:
%pip install aim
%pip install ray[tune]
import numpy as np
import ray
from ray import train, tune
from ray.tune.logger.aim import AimLoggerCallback
Next, define a simple `train_function`, which is a `Trainable` that reports a loss to Tune.
The objective function itself is not important for this example, as our main focus is on the integration with Aim.
def train_function(config):
    for _ in range(50):
        loss = config["mean"] + config["sd"] * np.random.randn()
        train.report({"loss": loss})
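Each reported loss is a single draw from a normal distribution centered on `config["mean"]` with standard deviation `config["sd"]`, so consecutive reports fluctuate around the mean. The following is a minimal standalone sketch of that behavior, using only the standard library so that it runs without Ray. The helper name `simulate_losses` and the fixed seed are assumptions made for illustration and are not part of the original example.

```python
import random
import statistics


def simulate_losses(mean, sd, steps=50, seed=0):
    """Mimic the values train_function would report, without Ray."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    # Same formula as train_function: mean + sd * standard normal noise.
    return [mean + sd * rng.gauss(0.0, 1.0) for _ in range(steps)]


losses = simulate_losses(mean=2, sd=0.8)
print(f"last reported loss: {losses[-1]:.3f}")
print(f"average over all steps: {statistics.mean(losses):.3f}")
```

Because the status table printed by Tune shows the most recently reported value, the `loss` of a trial can differ noticeably from its `mean` when `sd` is large.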
Here is an example of how you can use the `AimLoggerCallback` with a simple grid-search Tune experiment.
The logger will log each of the 9 grid-search trials as a separate Aim run.
tuner = tune.Tuner(
    train_function,
    run_config=train.RunConfig(
        callbacks=[AimLoggerCallback()],
        storage_path="/tmp/ray_results",
        name="aim_example",
    ),
    param_space={
        "mean": tune.grid_search([1, 2, 3, 4, 5, 6, 7, 8, 9]),
        "sd": tune.uniform(0.1, 0.9),
    },
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
    ),
)
tuner.fit()
2023-02-07 00:04:11,228 INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
Current time: | 2023-02-07 00:04:19 |
Running for: | 00:00:06.86 |
Memory: | 32.8/64.0 GiB |
Trial name | status | loc | mean | sd | iter | total time (s) | loss |
---|---|---|---|---|---|---|---|
train_function_01a3b_00000 | TERMINATED | 127.0.0.1:10277 | 1 | 0.385428 | 50 | 4.48031 | 1.01928 |
train_function_01a3b_00001 | TERMINATED | 127.0.0.1:10296 | 2 | 0.819716 | 50 | 2.97272 | 3.01491 |
train_function_01a3b_00002 | TERMINATED | 127.0.0.1:10301 | 3 | 0.769197 | 50 | 2.39572 | 3.87155 |
train_function_01a3b_00003 | TERMINATED | 127.0.0.1:10307 | 4 | 0.29466 | 50 | 2.41568 | 4.1507 |
train_function_01a3b_00004 | TERMINATED | 127.0.0.1:10313 | 5 | 0.152208 | 50 | 1.68383 | 5.10225 |
train_function_01a3b_00005 | TERMINATED | 127.0.0.1:10321 | 6 | 0.879814 | 50 | 1.54015 | 6.20238 |
train_function_01a3b_00006 | TERMINATED | 127.0.0.1:10329 | 7 | 0.487499 | 50 | 1.44706 | 7.79551 |
train_function_01a3b_00007 | TERMINATED | 127.0.0.1:10333 | 8 | 0.639783 | 50 | 1.4261 | 7.94189 |
train_function_01a3b_00008 | TERMINATED | 127.0.0.1:10341 | 9 | 0.12285 | 50 | 1.07701 | 8.82304 |
Trial name | date | done | episodes_total | experiment_id | experiment_tag | hostname | iterations_since_restore | loss | node_ip | pid | time_since_restore | time_this_iter_s | time_total_s | timestamp | timesteps_since_restore | timesteps_total | training_iteration | trial_id | warmup_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
train_function_01a3b_00000 | 2023-02-07_00-04-18 | True | | c8447fdceea6436c9edd6f030a5b1d82 | 0_mean=1,sd=0.3854 | Justins-MacBook-Pro-16 | 50 | 1.01928 | 127.0.0.1 | 10277 | 4.48031 | 0.013865 | 4.48031 | 1675757058 | 0 | | 50 | 01a3b_00000 | 0.00264072 |
train_function_01a3b_00001 | 2023-02-07_00-04-18 | True | | 7dd6d3ee24244a0885b354c285064728 | 1_mean=2,sd=0.8197 | Justins-MacBook-Pro-16 | 50 | 3.01491 | 127.0.0.1 | 10296 | 2.97272 | 0.0584073 | 2.97272 | 1675757058 | 0 | | 50 | 01a3b_00001 | 0.0316792 |
train_function_01a3b_00002 | 2023-02-07_00-04-18 | True | | e3da49ebad034c4b8fdaf0aa87927b1a | 2_mean=3,sd=0.7692 | Justins-MacBook-Pro-16 | 50 | 3.87155 | 127.0.0.1 | 10301 | 2.39572 | 0.0695491 | 2.39572 | 1675757058 | 0 | | 50 | 01a3b_00002 | 0.0315411 |
train_function_01a3b_00003 | 2023-02-07_00-04-18 | True | | 95c60c4f67c4481ebccff25b0a49e75d | 3_mean=4,sd=0.2947 | Justins-MacBook-Pro-16 | 50 | 4.1507 | 127.0.0.1 | 10307 | 2.41568 | 0.0175381 | 2.41568 | 1675757058 | 0 | | 50 | 01a3b_00003 | 0.0310779 |
train_function_01a3b_00004 | 2023-02-07_00-04-18 | True | | a216253cb41e47caa229e65488deb019 | 4_mean=5,sd=0.1522 | Justins-MacBook-Pro-16 | 50 | 5.10225 | 127.0.0.1 | 10313 | 1.68383 | 0.064441 | 1.68383 | 1675757058 | 0 | | 50 | 01a3b_00004 | 0.00450182 |
train_function_01a3b_00005 | 2023-02-07_00-04-18 | True | | 23834104277f476cb99d9c696281fceb | 5_mean=6,sd=0.8798 | Justins-MacBook-Pro-16 | 50 | 6.20238 | 127.0.0.1 | 10321 | 1.54015 | 0.00910306 | 1.54015 | 1675757058 | 0 | | 50 | 01a3b_00005 | 0.0480251 |
train_function_01a3b_00006 | 2023-02-07_00-04-18 | True | | 15f650121df747c3bd2720481d47b265 | 6_mean=7,sd=0.4875 | Justins-MacBook-Pro-16 | 50 | 7.79551 | 127.0.0.1 | 10329 | 1.44706 | 0.00600386 | 1.44706 | 1675757058 | 0 | | 50 | 01a3b_00006 | 0.00202489 |
train_function_01a3b_00007 | 2023-02-07_00-04-19 | True | | 78b1673cf2034ed99135b80a0cb31e0e | 7_mean=8,sd=0.6398 | Justins-MacBook-Pro-16 | 50 | 7.94189 | 127.0.0.1 | 10333 | 1.4261 | 0.00225306 | 1.4261 | 1675757059 | 0 | | 50 | 01a3b_00007 | 0.00209713 |
train_function_01a3b_00008 | 2023-02-07_00-04-19 | True | | c7f5d86154cb46b6aa27bef523edcd6f | 8_mean=9,sd=0.1228 | Justins-MacBook-Pro-16 | 50 | 8.82304 | 127.0.0.1 | 10341 | 1.07701 | 0.00291467 | 1.07701 | 1675757059 | 0 | | 50 | 01a3b_00008 | 0.00240111 |
2023-02-07 00:04:19,366 INFO tune.py:798 -- Total run time: 7.38 seconds (6.85 seconds for the tuning loop).
<ray.tune.result_grid.ResultGrid at 0x137de07c0>
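The nine trials come from the nine `grid_search` values for `mean`. With the default `num_samples=1`, Tune runs the grid once and draws a fresh `sd` from the uniform range for every grid point. The sketch below emulates that expansion in plain Python to make the trial count explicit. It is an illustration only, not Tune's actual search-space implementation, and the helper name `expand_param_space` is hypothetical.

```python
import itertools
import random


def expand_param_space(grid, num_samples, rng):
    """Emulate how a grid search combined with a sampled parameter
    expands into individual trial configs (illustration only)."""
    keys = list(grid)
    configs = []
    for _ in range(num_samples):  # the whole grid is repeated num_samples times
        for combo in itertools.product(*(grid[k] for k in keys)):
            config = dict(zip(keys, combo))
            # Stand-in for tune.uniform(0.1, 0.9): one new draw per trial.
            config["sd"] = rng.uniform(0.1, 0.9)
            configs.append(config)
    return configs


configs = expand_param_space(
    {"mean": [1, 2, 3, 4, 5, 6, 7, 8, 9]},
    num_samples=1,
    rng=random.Random(0),
)
print(len(configs))  # 9
```

Raising `num_samples` to 2 in this sketch would yield 18 configs, each with its own `sd`.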
When the script executes, Tune carries out the grid search and saves the results to the Aim repo, which by default is stored in the experiment log directory (in this case, `/tmp/ray_results/aim_example`).
In the example above, we used the default configuration for the `AimLoggerCallback`.
A few options can be passed as arguments to the callback. For example,
setting `AimLoggerCallback(repo="/path/to/repo")` logs results to the Aim repo at that
path, which is useful if you keep the results of multiple Tune experiments in a central
location. Paths relative to the working directory from which the Tune script is launched
work as well. By default, the repo is set to the experiment log directory.
See the API reference for more configuration options.
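As a minimal sketch of a non-default configuration, the helper below builds a callback that writes to a shared repo and tracks only the `loss` metric. The helper name `make_aim_callback` and the path `/tmp/aim_central_repo` are hypothetical. The `experiment_name` and `metrics` arguments are our reading of the callback's signature, so check them against the API reference for your Ray version.

```python
def make_aim_callback(repo_path="/tmp/aim_central_repo"):
    """Build an AimLoggerCallback that logs to a shared Aim repo."""
    # Imported lazily so that defining this helper does not require
    # Ray or Aim to be installed; calling it does.
    from ray.tune.logger.aim import AimLoggerCallback

    return AimLoggerCallback(
        repo=repo_path,                 # hypothetical central repo location
        experiment_name="aim_example",  # experiment label attached to each Aim run
        metrics=["loss"],               # track only these reported metrics
    )
```

The returned object can be passed in place of the default callback, for example `callbacks=[make_aim_callback()]` in `train.RunConfig`.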
Now that we have logged our results to the Aim repository, we can view them in Aim's web UI. To do this, we first find the directory where the Aim repository lives, then use the Aim CLI to launch the web interface.
# Uncomment the following line to launch the Aim UI!
#!aim up --repo=/tmp/ray_results/aim_example
--------------------------------------------------------------------------
Aim UI collects anonymous usage analytics.
Read how to opt-out here:
https://aimstack.readthedocs.io/en/latest/community/telemetry.html
--------------------------------------------------------------------------
Running Aim UI on repo `<Repo#-5734997863388805469 path=/tmp/ray_results/aim_example/.aim read_only=None>`
Open http://127.0.0.1:43800
Press Ctrl+C to exit
^C
After launching the Aim UI, we can open the web interface at `localhost:43800`.
{image}
:align: center
:alt: Aim Metrics Explorer
:target: https://aimstack.readthedocs.io/en/latest/ui/pages/explorers.html#metrics-explorer
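Besides the web UI, the logged runs can be read back with Aim's Python SDK. The sketch below is based on our understanding of that SDK (`Repo`, `query_metrics`, `iter_runs`, `sparse_numpy`) and should be checked against the Aim documentation for your installed version. The helper name `print_logged_losses` is hypothetical. Because the callback may prefix metric names, the sketch matches on names ending in `loss` rather than assuming an exact name.

```python
def print_logged_losses(repo_path="/tmp/ray_results/aim_example"):
    """Print the last logged loss of every run in an Aim repo."""
    # Imported lazily so that defining this helper does not require Aim.
    from aim import Repo

    repo = Repo(repo_path)
    # Iterate over all tracked metric sequences, grouped by run.
    for run_metrics in repo.query_metrics().iter_runs():
        for metric in run_metrics:
            # Assumption: the logged name ends with the key reported to Tune.
            if not metric.name.endswith("loss"):
                continue
            # sparse_numpy() returns the tracked steps and their values.
            steps, values = metric.values.sparse_numpy()
            print(metric.run.hash, "steps:", len(steps), "last loss:", values[-1])
```

Calling `print_logged_losses()` after the experiment above should print one line per trial, assuming the repo lives at the default location.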
The next sections contain more in-depth information on the API of the Tune-Aim integration.
(tune-aim-logger)=
{eval-rst}
.. autoclass:: ray.tune.logger.aim.AimLoggerCallback
    :noindex: