For this example, we will use PyTorch to create an interactive Confusion Matrix in Comet ML. You'll need a Comet API key to log the Confusion Matrix; the key is free for anyone.
Our goal in this demonstration is to train a PyTorch model to categorize images of digits from the MNIST dataset, and to be able to see examples of each cell in a confusion matrix, like this:
Comet provides a very easy way to make such confusion matrices. You can do that with a single command:
experiment.log_confusion_matrix(actual, predicted, images=images)
where actual is the ground truth (given as vectors or labels), predicted is the model's prediction (given as vectors or labels), and images is a list of image data.
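To illustrate what "vectors or labels" means, here is a minimal sketch in plain Python. It assumes a vector is a list of per-class scores (one-hot or probabilities) whose label is the index of the largest entry. The helper name to_label is hypothetical, and this is not Comet's own implementation.

```python
def to_label(item):
    """Return a class index from either an integer label or a score vector.

    Assumption for this sketch: a "vector" is a list or tuple of per-class
    scores (one-hot or probabilities), and its label is the index of the
    largest score.
    """
    if isinstance(item, (list, tuple)):
        return max(range(len(item)), key=lambda i: item[i])
    return int(item)


# The same ground truth, once as labels and once as one-hot vectors.
actual_labels = [2, 0, 1]
actual_vectors = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

# Predictions given as per-class probabilities.
predicted_vectors = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]

labels_from_vectors = [to_label(v) for v in actual_vectors]
predicted_labels = [to_label(v) for v in predicted_vectors]
```

Under that assumption, both forms describe the same three examples, and the third one is a misclassification (actual 1, predicted 0).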
Let's explore a complete example from start to finish.
First, we install the needed Python libraries:
%pip install "comet_ml>=3.10.0" torch torchvision --quiet
Now we import Comet:
import comet_ml
We can then make sure that our Comet API key is properly configured. The following command will give instructions if not:
comet_ml.init()
COMET INFO: Comet API key is valid
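One common way to supply the key is the COMET_API_KEY environment variable. As a rough sketch, the hypothetical helper below only checks whether that variable is set; the SDK can also read the key from a config file, so a False result here does not necessarily mean the key is missing.

```python
import os


def api_key_in_environment(environ=os.environ):
    """Return True if a non-empty COMET_API_KEY is present in `environ`.

    This is an illustrative check only: it does not validate the key
    and it does not look at any Comet config file.
    """
    return bool(environ.get("COMET_API_KEY", "").strip())


# Example with explicit dictionaries instead of the real environment.
with_key = api_key_in_environment({"COMET_API_KEY": "example-key"})
without_key = api_key_in_environment({})
```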
Now, we import the rest of the Python libraries that we will need:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.utils.data import SubsetRandomSampler
from torch.autograd import Variable
Next, we load the MNIST training and test datasets. The first time this runs, it may take a few minutes to download the data, and then a couple more minutes to process it:
train_dataset = dsets.MNIST(
    root='./data/',
    train=True,
    transform=transforms.ToTensor(),
    download=True)
test_dataset = dsets.MNIST(
    root='./data/',
    train=False,
    transform=transforms.ToTensor())
We'll now write a function that will create the model.
In this example, we'll take advantage of Comet's Experiment to get access to the hyperparameters via experiment.get_parameter(). This will be very handy when we later use Comet's Hyperparameter Optimizer to generate the Experiments.
This function will actually return the three components of the model: the rnn, the criterion, and the optimizer.
def build_model(experiment):
    input_size = experiment.get_parameter("input_size")
    hidden_size = experiment.get_parameter("hidden_size")
    num_layers = experiment.get_parameter("num_layers")
    num_classes = experiment.get_parameter("num_classes")
    learning_rate = experiment.get_parameter("learning_rate")

    class RNN(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, num_classes):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.lstm = nn.LSTM(
                input_size,
                hidden_size,
                num_layers,
                batch_first=True)
            self.fc = nn.Linear(hidden_size, num_classes)

        def forward(self, x):
            # Set initial states
            h0 = Variable(torch.zeros(self.num_layers, x.size(0),
                                      self.hidden_size))
            c0 = Variable(torch.zeros(self.num_layers, x.size(0),
                                      self.hidden_size))
            # Forward propagate RNN
            self.out, _ = self.lstm(x, (h0, c0))
            # Decode hidden state of last time step
            out = self.fc(self.out[:, -1, :])
            return out

    rnn = RNN(
        input_size,
        hidden_size,
        num_layers,
        num_classes,
    )
    # Loss and Optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
    return (rnn, criterion, optimizer)
We'll call this function below, once we create an Experiment.
Now we are ready to set up a Comet Experiment, and train the model.
First, we can set all of the Hyperparameters of the model:
hyper_params = {
    "epochs": 10,
    "batch_size": 120,
    "first_layer_units": 128,
    "sequence_length": 28,
    "input_size": 28,
    "hidden_size": 128,
    "num_layers": 2,
    "num_classes": 10,
    "learning_rate": 0.01
}
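Both sequence_length and input_size are 28 because the training code reshapes each 28x28 image with images.view(-1, sequence_length, input_size), so the LSTM reads an image as 28 time steps of 28 pixel values each. The following plain-Python sketch mimics that reshape for a single flattened image; the helper name as_sequence is hypothetical.

```python
def as_sequence(flat_pixels, sequence_length, input_size):
    """Split a flat list of pixels into `sequence_length` rows of `input_size`.

    This mirrors, for one image, what a (sequence_length, input_size) view
    of a contiguous tensor does: row i holds pixels
    [i * input_size, (i + 1) * input_size).
    """
    if len(flat_pixels) != sequence_length * input_size:
        raise ValueError("pixel count does not match sequence_length * input_size")
    return [
        flat_pixels[i * input_size:(i + 1) * input_size]
        for i in range(sequence_length)
    ]


# A stand-in "image" whose pixel values are just their flat indices.
flat_image = list(range(28 * 28))
steps = as_sequence(flat_image, sequence_length=28, input_size=28)
```

Each element of steps corresponds to one image row, which is what the LSTM consumes at one time step.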
Next we create the experiment, and log the Hyperparameters:
experiment = comet_ml.Experiment(project_name="pytorch-confusion-matrix")
experiment.log_parameters(hyper_params)
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET INFO: Experiment is live on comet.ml https://www.comet.ml/dsblank/pytorch-confusion-matrix/819f19ee68ba4b91bab88421b795451d
We can now construct the model components:
rnn, criterion, optimizer = build_model(experiment)
To make this demonstration go a little faster, we'll just use a sample of the items from the training set:
SAMPLE_SIZE = 1000
Now we can construct the loader:
sampler = SubsetRandomSampler(list(range(SAMPLE_SIZE)))
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=experiment.get_parameter('batch_size'),
    sampler=sampler,
    # shuffle=True,  # can't use shuffle with sampler
)
Alternatively, if you would rather train on the entire dataset, you can use:
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=experiment.get_parameter('batch_size'),
    shuffle=True,
)
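The choice between the two loaders mostly affects how many batches make up an epoch. As a quick arithmetic sketch, assuming the DataLoader default of keeping the final partial batch (drop_last=False) and the 60,000 images in the MNIST training set, the helper names below are hypothetical:

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last, possibly smaller, batch is kept."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch in an epoch."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


# Sampled loader: SAMPLE_SIZE = 1000 items, batch_size = 120.
sampled_batches = batches_per_epoch(1000, 120)
sampled_last = last_batch_size(1000, 120)

# Full training set: 60,000 items, batch_size = 120.
full_batches = batches_per_epoch(60000, 120)
```

With the sample of 1,000 items, each epoch has 9 batches (the last one holding 40 items), compared with 500 batches for the full training set.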
Now we can train the model. Note that we use experiment.train() to provide the context for logged metrics, so that metric names are prefixed with train_:
with experiment.train():
    step = 0
    for epoch in range(experiment.get_parameter('epochs')):
        print("\nepoch:", epoch)
        correct = 0
        total = 0
        for batch_step, (images, labels) in enumerate(train_loader):
            print(".", end="")
            images = Variable(images.view(
                -1,
                experiment.get_parameter('sequence_length'),
                experiment.get_parameter("input_size")))
            labels = Variable(labels)
            # Forward + Backward + Optimize
            optimizer.zero_grad()
            outputs = rnn(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # Compute train accuracy
            _, predicted = torch.max(outputs.data, 1)
            batch_total = labels.size(0)
            total += batch_total
            batch_correct = (predicted == labels.data).sum()
            correct += batch_correct
            # Log batch_accuracy to Comet.ml; step is each batch
            step += 1
            experiment.log_metric("batch_accuracy",
                                  batch_correct / batch_total, step=step)
            if (batch_step + 1) % 100 == 0:
                # len(train_loader) is the number of batches per epoch,
                # which stays correct when a sampler is used
                print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f' % (
                    epoch + 1,
                    experiment.get_parameter('epochs'),
                    batch_step + 1,
                    len(train_loader),
                    loss.item()))
        # Log epoch accuracy to Comet.ml; step is each epoch
        experiment.log_metric("batch_accuracy", correct / total,
                              step=epoch, epoch=epoch)
epoch: 0 ......... epoch: 1 ......... epoch: 2 ......... epoch: 3 ......... epoch: 4 ......... epoch: 5 ......... epoch: 6 ......... epoch: 7 ......... epoch: 8 ......... epoch: 9 .........
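The accuracy bookkeeping in the loop can be reproduced without PyTorch. The sketch below uses plain lists in place of tensors and a hypothetical argmax helper standing in for torch.max(outputs, 1); the scores are made up for illustration.

```python
def argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def batch_stats(outputs, labels):
    """Return (number correct, batch size) for one batch of score rows."""
    predicted = [argmax(row) for row in outputs]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct, len(labels)


# Two illustrative batches of (outputs, labels) for a 2-class problem.
batches = [
    ([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], [0, 1, 1]),  # 2 of 3 correct
    ([[0.3, 0.7]], [1]),                                # 1 of 1 correct
]

correct = 0
total = 0
batch_accuracies = []
for outputs, labels in batches:
    batch_correct, batch_total = batch_stats(outputs, labels)
    correct += batch_correct
    total += batch_total
    batch_accuracies.append(batch_correct / batch_total)

# Epoch accuracy is computed from the running totals, so larger batches
# weigh more than smaller ones.
epoch_accuracy = correct / total
```

Because the epoch value comes from the running totals, it is 3/4 here rather than the plain mean of the two batch accuracies.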
After the training loop, we can evaluate the model on the test dataset. The general pattern is:
confusion_matrix = experiment.create_confusion_matrix()
for batch in batches:
    ...
    confusion_matrix.compute_matrix(actual, predicted, images=images)
experiment.log_confusion_matrix(matrix=confusion_matrix)
This creates a Confusion Matrix visualization in Comet with image examples.
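Conceptually, the pattern above accumulates counts batch by batch and logs the result once. The following plain-Python sketch shows that idea with a hypothetical RunningConfusionMatrix class; it assumes rows are actual classes and columns are predicted classes, and it is not how Comet implements its confusion matrix object.

```python
class RunningConfusionMatrix:
    """Accumulate confusion counts over several batches of integer labels."""

    def __init__(self, num_classes):
        # counts[a][p] = number of items with actual class a predicted as p
        self.counts = [[0] * num_classes for _ in range(num_classes)]

    def update(self, actual, predicted):
        """Add one batch of (actual, predicted) label pairs to the counts."""
        for a, p in zip(actual, predicted):
            self.counts[a][p] += 1


running = RunningConfusionMatrix(num_classes=3)
running.update(actual=[0, 1, 2], predicted=[0, 2, 2])  # first batch
running.update(actual=[1, 1], predicted=[1, 2])        # second batch
```

After both batches, the off-diagonal cell for actual class 1 and predicted class 2 holds 2, which is the kind of cell you would click on to inspect misclassified examples.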
Here is the actual code:
test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset,
    batch_size=32,
    shuffle=False,
)
confusion_matrix = experiment.create_confusion_matrix()
for batch_step, (images, labels) in enumerate(test_loader):
    print(".", end="")
    images = Variable(images.view(
        -1,
        experiment.get_parameter('sequence_length'),
        experiment.get_parameter("input_size")))
    labels = Variable(labels)
    outputs = rnn(images)
    _, predicted = torch.max(outputs.data, 1)
    confusion_matrix.compute_matrix(
        labels.data,
        predicted,
        images=images)
experiment.log_confusion_matrix(
    matrix=confusion_matrix,
    title="MNIST Confusion Matrix, Epoch #%d" % (epoch + 1),
    file_name="confusion-matrix-%03d.json" % (epoch + 1),
);
.........................................................................................................................................................................................................................................................................................................................
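The title and file_name arguments are built with ordinary string formatting from the last value of epoch left over from the training loop. As a small sketch (the helper name matrix_names is hypothetical), this is what those strings look like, and how they would differ if a matrix were logged for several epochs rather than once after training as is done here:

```python
def matrix_names(epoch):
    """Build the title and file name for a zero-based epoch index."""
    number = epoch + 1  # report epochs starting from 1
    title = "MNIST Confusion Matrix, Epoch #%d" % number
    file_name = "confusion-matrix-%03d.json" % number  # zero-padded to 3 digits
    return title, file_name


# After 10 epochs the loop variable `epoch` ends at 9.
final_title, final_file_name = matrix_names(9)

# Distinct names per epoch, if one matrix were logged for each.
first_three = [matrix_names(e)[1] for e in range(3)]
```

The zero padding keeps the file names in epoch order when they are sorted alphabetically.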
Now, because we are in a Jupyter Notebook, we signal that the experiment has completed:
experiment.end()
COMET INFO: ---------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url : https://www.comet.ml/dsblank/pytorch-confusion-matrix/819f19ee68ba4b91bab88421b795451d
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     train_batch_accuracy [100] : (0.10000000149011612, 0.925000011920929)
COMET INFO:     train_loss [9] : (0.4030756652355194, 2.309687614440918)
COMET INFO:   Parameters:
COMET INFO:     batch_size : 120
COMET INFO:     epochs : 10
COMET INFO:     first_layer_units : 128
COMET INFO:     hidden_size : 128
COMET INFO:     input_size : 28
COMET INFO:     learning_rate : 0.01
COMET INFO:     num_classes : 10
COMET INFO:     num_layers : 2
COMET INFO:     sequence_length : 28
COMET INFO:   Uploads [count]:
COMET INFO:     confusion-matrix : 1
COMET INFO:     environment details : 1
COMET INFO:     filename : 1
COMET INFO:     images [1258] : 1258
COMET INFO:     installed packages : 1
COMET INFO:     model graph : 1
COMET INFO:     notebook : 1
COMET INFO:     os packages : 1
COMET INFO:     source_code : 1
COMET INFO: ---------------------------
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish before aborting...
Finally, we can explore the Confusion Matrix in the Comet UI. You can select the epoch by choosing the "Confusion Matrix Name", and then click on a cell to see examples of that type.
experiment.display(tab="confusion-matrix")
Clicking on a cell in the matrix should show up to 25 examples of that type of confusion or correct classification.
For more information about Comet ML, please see: