(tune-wandb-ref)=

# Using Weights & Biases with Tune

Weights & Biases (Wandb) is a tool for experiment tracking, model optimization, and dataset versioning. It is very popular in the machine learning and data science community for its superb visualization tools.
```{image} /images/wandb_logo.png
:align: center
:alt: Weights & Biases
:height: 80px
:target: https://www.wandb.ai/
```
Ray Tune currently offers two lightweight integrations for Weights & Biases.
One is the {ref}`WandbLoggerCallback <air-wandb-logger>`, which automatically logs
metrics reported to Tune to the Wandb API.

The other one is the {ref}`setup_wandb() <air-wandb-setup>` function, which can be
used with the function API. It automatically
initializes the Wandb API with Tune's training information. You can then use the
Wandb API as you normally would, e.g. calling `wandb.log()` to log your training
process.
```{contents}
:backlinks: none
:local: true
```
In the following example, we use both of the above methods: the `WandbLoggerCallback` and the `setup_wandb` function.
As the very first step, make sure you're logged in to wandb on all machines you're running your training on:

```bash
wandb login
```
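If an interactive login is impractical (for example on CI or remote workers), the Wandb client also reads the API key from the `WANDB_API_KEY` environment variable, which you can set from Python before any Wandb calls. The key value below is a placeholder:

```python
import os

# Non-interactive alternative to `wandb login`: expose the API key through
# the environment. Replace the placeholder with a real key.
os.environ["WANDB_API_KEY"] = "your_api_key_here"
```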
We can then start with a few crucial imports:

```python
import numpy as np

import ray
from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback, setup_wandb
```
Next, let's define a simple `train_function` (a Tune `Trainable`) that reports a random loss to Tune.
The objective function itself is not important for this example, since we primarily want to focus on the Weights & Biases integration.

```python
def train_function(config):
    for i in range(30):
        loss = config["mean"] + config["sd"] * np.random.randn()
        train.report({"loss": loss})
```
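To get a feel for what this objective reports, here is a standalone sketch with plain NumPy (no Tune involved; the seed and config values are arbitrary): the losses are simply Gaussian samples centered at `mean` with standard deviation `sd`.

```python
import numpy as np

# Simulate the objective above outside of Tune: 30 Gaussian losses.
rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
config = {"mean": 3.0, "sd": 0.5}

losses = config["mean"] + config["sd"] * rng.standard_normal(30)

# The sample mean hovers near config["mean"].
print(losses.mean())
```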
You can define a simple grid-search Tune run using the `WandbLoggerCallback` as follows:

```python
def tune_with_callback():
    """Example for using a WandbLoggerCallback with the function API"""
    tuner = tune.Tuner(
        train_function,
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
        ),
        run_config=train.RunConfig(
            callbacks=[WandbLoggerCallback(project="Wandb_example")]
        ),
        param_space={
            "mean": tune.grid_search([1, 2, 3, 4, 5]),
            "sd": tune.uniform(0.2, 0.8),
        },
    )
    tuner.fit()
```
To use the `setup_wandb` utility, simply call this function in your objective.
Note that we also use `wandb.log(...)` to log the `loss` to Weights & Biases as a dictionary.
Otherwise, this version of our objective is identical to the original.

```python
def train_function_wandb(config):
    wandb = setup_wandb(config, project="Wandb_example")

    for i in range(30):
        loss = config["mean"] + config["sd"] * np.random.randn()
        train.report({"loss": loss})
        wandb.log(dict(loss=loss))
```
With `train_function_wandb` defined, your Tune experiment will set up `wandb` in each trial once it starts!

```python
def tune_with_setup():
    """Example for using the setup_wandb utility with the function API"""
    tuner = tune.Tuner(
        train_function_wandb,
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
        ),
        param_space={
            "mean": tune.grid_search([1, 2, 3, 4, 5]),
            "sd": tune.uniform(0.2, 0.8),
        },
    )
    tuner.fit()
```
Finally, you can also define a class-based Tune `Trainable` by using `setup_wandb` in the `setup()` method and storing the run object as an attribute. Please note that with the class trainable, you have to pass the trial ID, name, and group separately:

```python
class WandbTrainable(tune.Trainable):
    def setup(self, config):
        self.wandb = setup_wandb(
            config,
            trial_id=self.trial_id,
            trial_name=self.trial_name,
            group="Example",
            project="Wandb_example",
        )

    def step(self):
        for i in range(30):
            loss = self.config["mean"] + self.config["sd"] * np.random.randn()
            self.wandb.log({"loss": loss})
        return {"loss": loss, "done": True}

    def save_checkpoint(self, checkpoint_dir: str):
        pass

    def load_checkpoint(self, checkpoint_dir: str):
        pass
```
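If the class API is new to you, the call pattern Tune uses is roughly: construct the trainable once, call `setup(config)` once, then call `step()` repeatedly until the result reports `done`. The stdlib sketch below mimics that driver loop with a dummy logger; it is an illustration of the lifecycle only, not Tune's actual scheduler, and `DummyRun` merely stands in for the object `setup_wandb` returns.

```python
import random


class DummyRun:
    """Stand-in for the Wandb run object returned by setup_wandb (illustration only)."""

    def __init__(self):
        self.logged = []

    def log(self, metrics):
        self.logged.append(metrics)


class MockWandbTrainable:
    """Mirrors the structure of WandbTrainable above, without Ray or wandb."""

    def __init__(self, config):
        self.config = config
        self.setup(config)

    def setup(self, config):
        # In the real trainable this is where setup_wandb(...) is called.
        self.wandb = DummyRun()

    def step(self):
        for _ in range(30):
            loss = self.config["mean"] + self.config["sd"] * random.gauss(0, 1)
            self.wandb.log({"loss": loss})
        # Report the last loss and signal that the trial is finished.
        return {"loss": loss, "done": True}


# Simplified driver loop: construct once, setup once, step until "done".
trainable = MockWandbTrainable({"mean": 1.0, "sd": 0.2})
result = trainable.step()
while not result["done"]:
    result = trainable.step()
```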
Running Tune with this `WandbTrainable` works exactly the same as with the function API.
The below `tune_trainable` function differs from `tune_with_setup` above only in the first argument we pass to `Tuner()`:

```python
def tune_trainable():
    """Example for using a WandbTrainable with the class API"""
    tuner = tune.Tuner(
        WandbTrainable,
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
        ),
        param_space={
            "mean": tune.grid_search([1, 2, 3, 4, 5]),
            "sd": tune.uniform(0.2, 0.8),
        },
    )
    results = tuner.fit()
    return results.get_best_result().config
```
Since you may not have an API key for Wandb, we can mock the Wandb logger and test all three of our training functions as follows.
If you are logged in to wandb, you can set `mock_api = False` to actually upload your results to Weights & Biases.

```python
import os

mock_api = True

if mock_api:
    os.environ.setdefault("WANDB_MODE", "disabled")
    os.environ.setdefault("WANDB_API_KEY", "abcd")
    ray.init(
        runtime_env={"env_vars": {"WANDB_MODE": "disabled", "WANDB_API_KEY": "abcd"}}
    )

tune_with_callback()
tune_with_setup()
tune_trainable()
```
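A side note on the mock: `os.environ.setdefault` only writes a value when the variable is unset, so any `WANDB_MODE` or API key already present in your environment takes precedence over the mock values. A minimal stdlib demonstration (the `DEMO_`-prefixed variable is just for illustration):

```python
import os

# setdefault writes only when the key is absent.
os.environ.pop("DEMO_WANDB_MODE", None)              # start unset
os.environ.setdefault("DEMO_WANDB_MODE", "disabled")
first = os.environ["DEMO_WANDB_MODE"]

# A second setdefault has no effect: the key is already set.
os.environ.setdefault("DEMO_WANDB_MODE", "online")
second = os.environ["DEMO_WANDB_MODE"]
```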
```
2022-11-02 16:02:45,355 INFO worker.py:1534 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8266
2022-11-02 16:02:46,513 INFO wandb.py:282 -- Already logged into W&B.
```

Current time: 2022-11-02 16:03:13 · Running for: 00:00:27.28 · Memory: 10.8/16.0 GiB

| Trial name | status | loc | mean | sd | iter | total time (s) | loss |
|---|---|---|---|---|---|---|---|
| train_function_7676d_00000 | TERMINATED | 127.0.0.1:14578 | 1 | 0.411212 | 30 | 0.236137 | 0.828527 |
| train_function_7676d_00001 | TERMINATED | 127.0.0.1:14591 | 2 | 0.756339 | 30 | 5.57185 | 3.13156 |
| train_function_7676d_00002 | TERMINATED | 127.0.0.1:14593 | 3 | 0.436643 | 30 | 5.50237 | 3.26679 |
| train_function_7676d_00003 | TERMINATED | 127.0.0.1:14595 | 4 | 0.295929 | 30 | 5.60986 | 3.70388 |
| train_function_7676d_00004 | TERMINATED | 127.0.0.1:14596 | 5 | 0.335292 | 30 | 5.61385 | 4.74294 |
| Trial name | date | done | episodes_total | experiment_id | experiment_tag | hostname | iterations_since_restore | loss | node_ip | pid | time_since_restore | time_this_iter_s | time_total_s | timestamp | timesteps_since_restore | timesteps_total | training_iteration | trial_id | warmup_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| train_function_7676d_00000 | 2022-11-02_16-02-53 | True | | a9f242fa70184d9dadd8952b16fb0ecc | 0_mean=1,sd=0.4112 | Kais-MBP.local.meter | 30 | 0.828527 | 127.0.0.1 | 14578 | 0.236137 | 0.00381589 | 0.236137 | 1667430173 | 0 | | 30 | 7676d_00000 | 0.00366998 |
| train_function_7676d_00001 | 2022-11-02_16-03-03 | True | | f57118365bcb4c229fe41c5911f05ad6 | 1_mean=2,sd=0.7563 | Kais-MBP.local.meter | 30 | 3.13156 | 127.0.0.1 | 14591 | 5.57185 | 0.00627518 | 5.57185 | 1667430183 | 0 | | 30 | 7676d_00001 | 0.0027349 |
| train_function_7676d_00002 | 2022-11-02_16-03-03 | True | | 394021d4515d4616bae7126668f73b2b | 2_mean=3,sd=0.4366 | Kais-MBP.local.meter | 30 | 3.26679 | 127.0.0.1 | 14593 | 5.50237 | 0.00494576 | 5.50237 | 1667430183 | 0 | | 30 | 7676d_00002 | 0.00286222 |
| train_function_7676d_00003 | 2022-11-02_16-03-03 | True | | a575e79c9d95485fa37deaa86267aea4 | 3_mean=4,sd=0.2959 | Kais-MBP.local.meter | 30 | 3.70388 | 127.0.0.1 | 14595 | 5.60986 | 0.00689816 | 5.60986 | 1667430183 | 0 | | 30 | 7676d_00003 | 0.00299597 |
| train_function_7676d_00004 | 2022-11-02_16-03-03 | True | | 91ce57dcdbb54536b1874666b711350d | 4_mean=5,sd=0.3353 | Kais-MBP.local.meter | 30 | 4.74294 | 127.0.0.1 | 14596 | 5.61385 | 0.00672579 | 5.61385 | 1667430183 | 0 | | 30 | 7676d_00004 | 0.00323987 |

```
2022-11-02 16:03:13,913 INFO tune.py:788 -- Total run time: 28.53 seconds (27.28 seconds for the tuning loop).
```
Current time: 2022-11-02 16:03:22 · Running for: 00:00:08.49 · Memory: 9.9/16.0 GiB

| Trial name | status | loc | mean | sd | iter | total time (s) | loss |
|---|---|---|---|---|---|---|---|
| train_function_wandb_877eb_00000 | TERMINATED | 127.0.0.1:14647 | 1 | 0.738281 | 30 | 1.61319 | 0.555153 |
| train_function_wandb_877eb_00001 | TERMINATED | 127.0.0.1:14660 | 2 | 0.321178 | 30 | 1.72447 | 2.52109 |
| train_function_wandb_877eb_00002 | TERMINATED | 127.0.0.1:14661 | 3 | 0.202487 | 30 | 1.8159 | 2.45412 |
| train_function_wandb_877eb_00003 | TERMINATED | 127.0.0.1:14662 | 4 | 0.515434 | 30 | 1.715 | 4.51413 |
| train_function_wandb_877eb_00004 | TERMINATED | 127.0.0.1:14663 | 5 | 0.216098 | 30 | 1.72827 | 5.2814 |

```
(train_function_wandb pid=14647) 2022-11-02 16:03:17,149 INFO wandb.py:282 -- Already logged into W&B.
```
| Trial name | date | done | episodes_total | experiment_id | experiment_tag | hostname | iterations_since_restore | loss | node_ip | pid | time_since_restore | time_this_iter_s | time_total_s | timestamp | timesteps_since_restore | timesteps_total | training_iteration | trial_id | warmup_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| train_function_wandb_877eb_00000 | 2022-11-02_16-03-18 | True | | 7b250c9f31ab484dad1a1fd29823afdf | 0_mean=1,sd=0.7383 | Kais-MBP.local.meter | 30 | 0.555153 | 127.0.0.1 | 14647 | 1.61319 | 0.00232315 | 1.61319 | 1667430198 | 0 | | 30 | 877eb_00000 | 0.00391102 |
| train_function_wandb_877eb_00001 | 2022-11-02_16-03-22 | True | | 5172868368074557a3044ea3a9146673 | 1_mean=2,sd=0.3212 | Kais-MBP.local.meter | 30 | 2.52109 | 127.0.0.1 | 14660 | 1.72447 | 0.0152011 | 1.72447 | 1667430202 | 0 | | 30 | 877eb_00001 | 0.00901699 |
| train_function_wandb_877eb_00002 | 2022-11-02_16-03-22 | True | | b13d9bccb1964b4b95e1a858a3ea64c7 | 2_mean=3,sd=0.2025 | Kais-MBP.local.meter | 30 | 2.45412 | 127.0.0.1 | 14661 | 1.8159 | 0.00437403 | 1.8159 | 1667430202 | 0 | | 30 | 877eb_00002 | 0.00844812 |
| train_function_wandb_877eb_00003 | 2022-11-02_16-03-22 | True | | 869d7ec7a3544a8387985103e626818f | 3_mean=4,sd=0.5154 | Kais-MBP.local.meter | 30 | 4.51413 | 127.0.0.1 | 14662 | 1.715 | 0.00247812 | 1.715 | 1667430202 | 0 | | 30 | 877eb_00003 | 0.00282907 |
| train_function_wandb_877eb_00004 | 2022-11-02_16-03-22 | True | | 84d3112d66f64325bc469e44b8447ef5 | 4_mean=5,sd=0.2161 | Kais-MBP.local.meter | 30 | 5.2814 | 127.0.0.1 | 14663 | 1.72827 | 0.00517201 | 1.72827 | 1667430202 | 0 | | 30 | 877eb_00004 | 0.00272107 |

```
(train_function_wandb pid=14660) 2022-11-02 16:03:20,600 INFO wandb.py:282 -- Already logged into W&B.
(train_function_wandb pid=14661) 2022-11-02 16:03:20,600 INFO wandb.py:282 -- Already logged into W&B.
(train_function_wandb pid=14663) 2022-11-02 16:03:20,628 INFO wandb.py:282 -- Already logged into W&B.
(train_function_wandb pid=14662) 2022-11-02 16:03:20,723 INFO wandb.py:282 -- Already logged into W&B.
2022-11-02 16:03:22,565 INFO tune.py:788 -- Total run time: 8.60 seconds (8.48 seconds for the tuning loop).
```
Current time: 2022-11-02 16:03:31 · Running for: 00:00:09.28 · Memory: 9.9/16.0 GiB

| Trial name | status | loc | mean | sd | iter | total time (s) | loss |
|---|---|---|---|---|---|---|---|
| WandbTrainable_8ca33_00000 | TERMINATED | 127.0.0.1:14718 | 1 | 0.397894 | 1 | 0.000187159 | 0.742345 |
| WandbTrainable_8ca33_00001 | TERMINATED | 127.0.0.1:14737 | 2 | 0.386883 | 1 | 0.000151873 | 2.5709 |
| WandbTrainable_8ca33_00002 | TERMINATED | 127.0.0.1:14738 | 3 | 0.290693 | 1 | 0.00014019 | 2.99601 |
| WandbTrainable_8ca33_00003 | TERMINATED | 127.0.0.1:14739 | 4 | 0.33333 | 1 | 0.00015831 | 3.91276 |
| WandbTrainable_8ca33_00004 | TERMINATED | 127.0.0.1:14740 | 5 | 0.645479 | 1 | 0.000150919 | 5.47779 |

```
(WandbTrainable pid=14718) 2022-11-02 16:03:25,742 INFO wandb.py:282 -- Already logged into W&B.
```
| Trial name | date | done | episodes_total | experiment_id | hostname | iterations_since_restore | loss | node_ip | pid | time_since_restore | time_this_iter_s | time_total_s | timestamp | timesteps_since_restore | timesteps_total | training_iteration | trial_id | warmup_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WandbTrainable_8ca33_00000 | 2022-11-02_16-03-27 | True | | 3adb4d0ae0d74d1c9ddd07924b5653b0 | Kais-MBP.local.meter | 1 | 0.742345 | 127.0.0.1 | 14718 | 0.000187159 | 0.000187159 | 0.000187159 | 1667430207 | 0 | | 1 | 8ca33_00000 | 1.31382 |
| WandbTrainable_8ca33_00001 | 2022-11-02_16-03-31 | True | | f1511cfd51f94b3d9cf192181ccc08a9 | Kais-MBP.local.meter | 1 | 2.5709 | 127.0.0.1 | 14737 | 0.000151873 | 0.000151873 | 0.000151873 | 1667430211 | 0 | | 1 | 8ca33_00001 | 1.31668 |
| WandbTrainable_8ca33_00002 | 2022-11-02_16-03-31 | True | | a7528ec6adf74de0b73aa98ebedab66d | Kais-MBP.local.meter | 1 | 2.99601 | 127.0.0.1 | 14738 | 0.00014019 | 0.00014019 | 0.00014019 | 1667430211 | 0 | | 1 | 8ca33_00002 | 1.32008 |
| WandbTrainable_8ca33_00003 | 2022-11-02_16-03-31 | True | | b7af756ca586449ba2d4c44141b53b06 | Kais-MBP.local.meter | 1 | 3.91276 | 127.0.0.1 | 14739 | 0.00015831 | 0.00015831 | 0.00015831 | 1667430211 | 0 | | 1 | 8ca33_00003 | 1.31879 |
| WandbTrainable_8ca33_00004 | 2022-11-02_16-03-31 | True | | 196624f42bcc45c18a26778573a43a2c | Kais-MBP.local.meter | 1 | 5.47779 | 127.0.0.1 | 14740 | 0.000150919 | 0.000150919 | 0.000150919 | 1667430211 | 0 | | 1 | 8ca33_00004 | 1.31945 |

```
(WandbTrainable pid=14739) 2022-11-02 16:03:30,360 INFO wandb.py:282 -- Already logged into W&B.
(WandbTrainable pid=14740) 2022-11-02 16:03:30,393 INFO wandb.py:282 -- Already logged into W&B.
(WandbTrainable pid=14737) 2022-11-02 16:03:30,454 INFO wandb.py:282 -- Already logged into W&B.
(WandbTrainable pid=14738) 2022-11-02 16:03:30,510 INFO wandb.py:282 -- Already logged into W&B.
2022-11-02 16:03:31,985 INFO tune.py:788 -- Total run time: 9.40 seconds (9.27 seconds for the tuning loop).
```
```
{'mean': 1, 'sd': 0.3978937765393781, 'wandb': {'project': 'Wandb_example'}}
```
This completes our Tune and Wandb walk-through. In the following sections you can find more details on the API of the Tune-Wandb integration.
(air-wandb-logger)=
```{eval-rst}
.. autoclass:: ray.air.integrations.wandb.WandbLoggerCallback
   :noindex:
```
(air-wandb-setup)=
```{eval-rst}
.. autofunction:: ray.air.integrations.wandb.setup_wandb
   :noindex:
```