Build your understanding of PyTorch's ConvTranspose1d layer using interactive visualisations
Sometimes we want to see the inputs and outputs of PyTorch layers to build an intuition of what they do. If I've read the docs and put a few tensors through the layer while checking the input and output shapes, that's generally enough.
But sometimes there are weird parameters that I can't get my head around, or I just want to see the layer working, so building interactive widgets helps me grow my understanding.
So in this post I'll show you how I built an interactive widget to explore PyTorch's ConvTranspose1d, while explaining a bit about the layer itself. We'll use Anaconda's HoloViz tools (HoloViews, Panel and Bokeh) for the plotting and interactivity.
The end goal is an interactive plot for adjusting the ConvTranspose1d parameters and seeing the output, like in this tweet.
twitter: https://twitter.com/_ScottCondron/status/1308096826877325312
Before learning about Transposed Convolutions, it's best to learn about Convolutions first. CS231n is a great resource for learning about them.
As you may know, Convolutions are often used to efficiently reduce the dimensions of the input in neural networks. In the case of image classification tasks, they are used to efficiently reduce an input image to a single class score.
Transposed Convolutions are useful when you want to grow your network in a certain dimension. For example, say you have an image segmentation task in which you want a class prediction per pixel. You can use strided Convolutions to reduce the dimensions and then grow the dimensions back to their original size with Transposed Convolutions. This is done in U-Net style architectures.
Conveniently, PyTorch has implemented ConvTranspose1d such that if it is given the same parameters as Conv1d (with the input and output channels swapped) and you pass a tensor through both in turn, the final output tensor will be the same shape as the original input tensor (provided you set output_padding correctly).
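As a rough check of that claim, here is a minimal sketch built on the output-length formulas given in the PyTorch documentation for Conv1d and ConvTranspose1d. The helper names (conv1d_out_len, conv_transpose1d_out_len) and the example parameters are my own assumptions, and only the sequence dimension is modelled, not the channels.

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # Sequence length after Conv1d, per the formula in the PyTorch docs.
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose1d_out_len(l_in, kernel_size, stride=1, padding=0,
                             output_padding=0, dilation=1):
    # Sequence length after ConvTranspose1d, per the formula in the PyTorch docs.
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


params = dict(kernel_size=3, stride=2, padding=1)  # hypothetical settings

round_trips = {}
for l_in in (9, 10):
    reduced = conv1d_out_len(l_in, **params)
    # Length we get back with no output padding at all.
    restored_without = conv_transpose1d_out_len(reduced, output_padding=0, **params)
    # The shortfall is the output_padding needed to recover the original length.
    output_padding = l_in - restored_without
    restored = conv_transpose1d_out_len(reduced, output_padding=output_padding, **params)
    round_trips[l_in] = (reduced, output_padding, restored)
    print(f'{l_in} -> {reduced} -> {restored} (output_padding={output_padding})')
```

Both input lengths shrink to 5 with these settings, so the transposed layer needs output_padding to know which of the two lengths to grow back to.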
#collapse-hide
import torch
import torch.nn as nn
from panel.interact import interact
from panel import widgets
import panel as pn
from IPython.display import display
import holoviews as hv
from holoviews import opts
import numpy as np
hv.extension('bokeh', logo=False)
Firstly, to create an image from a 2D NumPy array, we'll use the HoloViews library, denoted hv here.
There are a few little extras here to make it nicer, but hv.Image(out) would have worked fine.
You can skip this if you're not interested in the visualisation details.
We set the bounds so we have correct axes. We use the * operator to overlay the image with hv.Labels(output_image) so that the values are printed on top of each pixel. We set the vdims to a fixed range so that colours don't change between updates. We set the width to change depending on the number of pixels so it's easier to watch it grow. Lastly, we return the image in a HoloViews pane with linked_axes=False so that each plot gets its own axes.
#collapse-show
hv.extension('bokeh', logo=False)
output_dim = 4
def image(out, feature_dim=output_dim, title='', xlabel='Sequence Dimension', ylabel='Feature Dimension'):
    # Fixed value range so colours stay comparable between widget updates
    output_image = hv.Image(out,
                            vdims=hv.Dimension('z', range=(0, 100)),
                            bounds=(0, 0, out.shape[-1], feature_dim))
    # Overlay the numeric value of each pixel on top of the image
    layout = output_image * hv.Labels(output_image)
    layout.opts(
        hv.opts.Image(cmap='PiYG',
                      xlabel=xlabel,
                      ylabel=ylabel,
                      title=title,
                      width=50*out.shape[-1])
    )
    return pn.pane.HoloViews(layout, linked_axes=False)
Then we create some synthetic input data. I could create random data, but instead I create predictable data so it's easier to think about.
#collapse-show
seq_len = 5
input_dim = 6
input_data = torch.tensor([list(range(1, seq_len+1))]*input_dim).double()
image(input_data.detach().numpy(), feature_dim=input_dim)
Another thing I sometimes do is set the weights of the layer itself when doing these visualisations. These would be randomly initialised and then learned by the network in practice.
#collapse-show
kernel_size = 7
weights = torch.tensor([[list(range(i, i+kernel_size)) for i in range(output_dim)]]*input_dim).double()
assert weights.shape == (input_dim, output_dim, kernel_size)
print('Weights Shape [in,out,k]: ', list(weights.shape))
bias = torch.tensor(list(range(output_dim))).double()
print('Bias: ', bias)
Weights Shape [in,out,k]: [6, 4, 7]
Bias: tensor([0., 1., 2., 3.], dtype=torch.float64)
For each of the input channels, we have learned filters of shape Kernel Size * Output Channels. Here's one of them:
#collapse-show
image(weights[0].detach().numpy(), xlabel='Kernel Size', ylabel='Output Channels')
#hide_input
hv.extension('matplotlib', logo=False)
dataset3d = hv.Dataset((range(weights.shape[2]), range(weights.shape[1]), range(weights.shape[0]), weights),
['Kernel Size', 'Output Channels', 'Input Channels'], 'Value')
hv.Scatter3D(dataset3d).opts(title='Weights shape')
To have sliders dynamically update the plot, we'll use widgets and interact from Panel.
The @interact decorator allows you to create widgets and the visualisation that depends on them at the same time. So in this case, we want widgets to control the different parameters of ConvTranspose1d. As the widgets change, conv_transpose_out is called and returns the image we defined before.
We use widgets.IntSlider to explicitly create the widgets for each parameter in the conv_transpose_out function. One interesting thing to note is that use_bias=True automatically creates a checkbox for us.
Finally, we return a pn.Column, which can stack several images such as the input and the output. In this code it wraps only the output image.
#collapse-show
hv.extension('bokeh', logo=False)
input_dim = 6
output_dim = 4
kernel_size = 7
seq_len = 5
stride = 2
dilation = 1
use_bias = True
@interact(padding=widgets.IntSlider(name='Padding', start=0, end=5, step=1, value=1),
          output_padding=widgets.IntSlider(name='Output Padding', start=0, end=3, step=1, value=0),
          seq_len=widgets.IntSlider(name='Input Sequence Length', start=1, end=10, step=1, value=seq_len),
          kernel_size=widgets.IntSlider(name='Kernel Size', start=3, end=5, step=1, value=kernel_size),
          stride=widgets.IntSlider(name='Stride', start=1, end=5, step=1, value=stride),
          use_bias=True
          )
def conv_transpose_out(padding, output_padding, seq_len, kernel_size, stride, use_bias):
    # PyTorch requires output_padding to be strictly smaller than stride
    # (dilation is left at its default of 1 here)
    if output_padding >= stride:
        return 'Output Padding needs to be less than Stride'
    conv_t = nn.ConvTranspose1d(input_dim, output_dim,
                                kernel_size=kernel_size,
                                stride=stride,
                                padding=padding,
                                output_padding=output_padding,
                                bias=use_bias)
    # Predictable input: every channel holds the sequence 1..seq_len
    input_data = torch.tensor([list(range(1, seq_len+1))]*input_dim).double()
    # Overwrite the random initialisation with our hand-made weights and bias
    conv_t.weight.data = weights[:,:,:kernel_size]
    if use_bias:
        conv_t.bias.data = bias
    # Add a batch dimension before passing the data through the layer
    in_tensor = input_data[None,:,:seq_len]
    in_seq_len = in_tensor.shape[-1]
    out = conv_t(in_tensor).squeeze(0).detach().numpy()
    in_image = image(input_data.detach().numpy(), input_data.shape[0], 'Input')
    out_image = image(out, out.shape[0], 'ConvTranspose1d Output')
    return pn.Column(out_image)
#hide
conv_transpose_out
If you run this code yourself, you can interact with all parameters together. To be able to show this on a static HTML GitHub Pages blog, I needed to reduce the state space of the sliders, so here's how padding and output padding affect the size of the output.
You can see the padding reduces the size of the sequence dimension. As described in the PyTorch documentation:
Note: The padding argument effectively adds "dilation * (kernel_size - 1) - padding" amount of zero padding to both sizes of the input
The output padding argument adds padding to one side. From the PyTorch documentation:
Note: When stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side.
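To make both notes concrete, here is a naive, single-channel sketch of a transposed convolution in plain Python. It is not how PyTorch implements the layer, and the function name conv_transpose1d_naive and the toy input and kernel are assumptions for illustration. Bias and dilation are left out.

```python
def conv_transpose1d_naive(x, w, stride=1, padding=0, output_padding=0):
    """Single-channel transposed convolution without bias or dilation."""
    # Uncropped length, plus room for the extra output_padding positions.
    full_len = (len(x) - 1) * stride + len(w) + output_padding
    full = [0.0] * full_len
    # Each input value "stamps" a scaled copy of the kernel, `stride` steps apart.
    for i, x_i in enumerate(x):
        for k, w_k in enumerate(w):
            full[i * stride + k] += x_i * w_k
    # `padding` crops both ends; `output_padding` gives positions back on the right only.
    out_len = (len(x) - 1) * stride - 2 * padding + len(w) + output_padding
    return full[padding:padding + out_len]


x = [1.0, 2.0, 3.0]  # toy input sequence
w = [1.0, 1.0, 1.0]  # toy kernel

no_pad = conv_transpose1d_naive(x, w, stride=2)
padded = conv_transpose1d_naive(x, w, stride=2, padding=1)
padded_out = conv_transpose1d_naive(x, w, stride=2, padding=1, output_padding=1)

print(no_pad)      # length 7
print(padded)      # length 5: one value trimmed from each end
print(padded_out)  # length 6: one position restored on the right
```

Padding trims one value from each end of the uncropped result, while output padding extends only the right-hand end.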
#hide
input_dim = 6
output_dim = 4
kernel_size = 3
seq_len = 5
stride = 3
dilation = 1
use_bias = True
@interact(padding=widgets.IntSlider(name='Padding', start=0, end=5, step=1, value=1),
          output_padding=widgets.IntSlider(name='Output Padding', start=0, end=3, step=1, value=0),
          )
def conv_transpose_out(padding, output_padding):
    # PyTorch requires output_padding to be strictly smaller than stride
    # (dilation is left at its default of 1 here)
    if output_padding >= stride:
        return 'Output Padding needs to be less than Stride'
    conv_t = nn.ConvTranspose1d(input_dim, output_dim,
                                kernel_size=kernel_size,
                                stride=stride,
                                padding=padding,
                                output_padding=output_padding,
                                bias=use_bias)
    input_data = torch.tensor([list(range(1, seq_len+1))]*input_dim).double()
    conv_t.weight.data = weights[:,:,:kernel_size]
    if use_bias:
        conv_t.bias.data = bias
    in_tensor = input_data[None,:,:seq_len]
    in_seq_len = in_tensor.shape[-1]
    out = conv_t(in_tensor).squeeze(0).detach().numpy()
    out_image = image(out, out.shape[0], 'ConvTranspose1d Output')
    return pn.Column(out_image)
#hide
pn.Row(f'Stride: {stride}, Kernel Size: {kernel_size}', conv_transpose_out)
#hide
from bokeh.resources import INLINE
pn.Row(f'Stride: {stride}, Kernel Size: {kernel_size}', conv_transpose_out).save('conv_t_input_output', embed=True, resources=INLINE, max_states=10000)
#hide
padding=1
output_padding=0
@interact(kernel_size=widgets.IntSlider(name='Kernel Size', start=3, end=5, step=1, value=3),
          stride=widgets.IntSlider(name='Stride', start=1, end=5, step=1, value=1),
          )
def kernel_stride_widget(kernel_size, stride):
    conv_t = nn.ConvTranspose1d(input_dim, output_dim,
                                kernel_size=kernel_size,
                                stride=stride,
                                padding=padding,
                                output_padding=output_padding,
                                bias=use_bias)
    input_data = torch.tensor([list(range(1, seq_len+1))]*input_dim).double()
    conv_t.weight.data = weights[:,:,:kernel_size]
    if use_bias:
        conv_t.bias.data = bias
    in_tensor = input_data[None,:,:seq_len]
    in_seq_len = in_tensor.shape[-1]
    out = conv_t(in_tensor).squeeze(0).detach().numpy()
    out_image = image(out, out.shape[0], 'ConvTranspose1d Output')
    return pn.Column(out_image)
#hide
kernel_stride_widget
#hide
pn.Row(kernel_stride_widget).save('conv_t_kw_and_stride', embed=True, resources=INLINE)
And here's how the kernel size and stride affect the sequence dimension. You can see that a bigger stride increases the size of the sequence dimension, as opposed to decreasing it like with regular convolutions.
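The stride effect can be read off the output-length formula in the PyTorch documentation for ConvTranspose1d. As a worked example, assuming the fixed values from the widget code in this post (seq_len = 5, padding = 1, output_padding = 0, dilation = 1) and a kernel size of 3:

```latex
L_{out} = (L_{in} - 1) \times \text{stride} - 2 \times \text{padding}
          + \text{dilation} \times (\text{kernel\_size} - 1) + \text{output\_padding} + 1

\text{stride} = 1: \quad L_{out} = 4 \times 1 - 2 + 2 + 0 + 1 = 5

\text{stride} = 5: \quad L_{out} = 4 \times 5 - 2 + 2 + 0 + 1 = 21
```

The stride multiplies the input length here, whereas in the Conv1d formula it appears as a divisor, which is why the two layers move the sequence dimension in opposite directions.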
If you'd like to run this yourself or create your own visualisations with different layers, all the code from this post is available on GitHub.
I personally love learning about new layers in PyTorch and finding ways to interact with them visually. What do you think about these styles of visualisations? Did you learn a bit about Transposed Convolutions or creating interactive visualisations in Python by reading this article? If so, feel free to share it, and you’re also more than welcome to contact me (via Twitter) if you have any questions, comments, or feedback.
Thanks for reading! :rocket:
Follow me on Twitter here for more stuff like this.