Creating an interactive audio visualisation tool from the ground up in Jupyter Notebooks using Python.
Data sitting on a computer somewhere is pretty dull. If you are working with data, it's a good idea to find lots of ways to interact with it. If you work with a type of data that is specific to your field, there'll likely be lots of ways you can think of to interact with it.
For example, if it's images, look at them. If you transform your data for any reason, look at it before and after the transformation. It sounds obvious, but it's often overlooked by machine learning engineers and data scientists because building tools or bespoke visualisations to interact with data can feel outside the scope of their responsibilities.
Ok, preaching aside, let's build something that helps people working with audio in Jupyter notebooks interact with it. This will let people working with audio data in Python listen to their audio alongside any plots they have for it, e.g. the output of a neural network.
The end goal is an interactive audio plot with a clickable playhead, like the one in this tweet. Credit to this StackOverflow post for sharing a HoloViews audio plot with a playhead.
twitter: https://twitter.com/_ScottCondron/status/1268592561301659648
Here's a version of the final widget that works in a browser. Note: there's a clickable plot if you run it yourself.
First things first, we want to be able to hear the audio. Conveniently, IPython comes with lots of out-of-the-box ways to display data. Here's one for audio:
#collapse-show
from IPython import display
audio_path = "./my_icons/blah.wav"
display.Audio(filename=audio_path)
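display.Audio can also play a raw NumPy array directly if you pass it a sample rate, which is handy when your audio never touches disk. A minimal sketch (the 440 Hz tone is just an illustration, not data from this post):

```python
import numpy as np

# synthesize one second of a 440 Hz sine tone as a float array in [-1, 1]
sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)

# display.Audio(data=tone, rate=sr)  # plays the array in a notebook, no file needed
```

The call is commented out because it only renders inside a notebook context.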
Although this lets us hear the audio, what if we want to see it? Let's first look at what's inside it:
#collapse-show
from scipy.io import wavfile
sr, wav_data = wavfile.read(audio_path)
print(sr)
print(wav_data.shape)
48000
(775922, 2)
This shows that the sample rate is 48000 Hz and that there are 775922 samples across 2 channels.
wav_data[:,0] # first channel
array([-2, -3, 0, ..., -3, -1, 0], dtype=int16)
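Those int16 values span -32768 to 32767. If you ever need floats in [-1, 1] instead (many audio libraries expect that), a common conversion, not required for the plots below, is to divide by the int16 full-scale value:

```python
import numpy as np

# stand-in for wav_data[:, 0]; real data would come from wavfile.read
channel = np.array([-2, -3, 0, -3, -1, 0], dtype=np.int16)

# cast to float and divide by int16 full scale to land in [-1, 1]
channel_f = channel.astype(np.float32) / 32768.0
```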
Seeing audio in a big numpy array isn't very useful. But what if we plot the values:
#collapse-show
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(wav_data)
plt.show()
The two channels are on top of each other. We can split them like so:
#collapse-show
fig, axs = plt.subplots(2)
axs[0].plot(wav_data[:,0])
axs[1].plot(wav_data[:,1])
plt.show()
Although this is nice, I'd like the x-axis to be seconds rather than samples. We can use numpy.linspace to do this. It gives us evenly spaced numbers between a start and an end, and we get to choose how many.
The duration is just the number of samples divided by the sample rate, and we want the same number of points (to match our y axis).
#collapse-show
import numpy as np
fig, axs = plt.subplots(2)
duration = len(wav_data)/sr
x = np.linspace(0, duration, len(wav_data))
axs[0].plot(x, wav_data[:,0]) # first channel
axs[1].plot(x, wav_data[:,1]) # second channel
plt.show()
Ok, that's better, but is there a better way to view audio than plotting the amplitude of the waveform?
Smarter people than me came up with viewing audio as frequencies rather than amplitudes. 'Spectrograms' are used to display this: they visualise how the frequency content changes over time. We'll just use one channel from now on for simplicity.
#collapse-show
audio_data = wav_data[:,0] # just use one channel from now on
plt.specgram(audio_data, Fs=sr)
plt.show()
display.Audio(audio_path)
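Taking just the first channel discards the second one entirely. An alternative (not what this post does) is to downmix to mono by averaging the channels:

```python
import numpy as np

# toy (n_samples, 2) stereo array; real data would be wav_data from wavfile.read
stereo = np.array([[1, 3], [2, 4], [0, 2]], dtype=np.int16)

# average left and right; cast first so the mean isn't truncated back to int16
mono = stereo.astype(np.float32).mean(axis=1)
```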
We can do the same thing by using scipy to compute the spectrogram and then using matplotlib to plot the log of it with pcolormesh.
#collapse-show
from scipy.signal import spectrogram
f, t, sxx = spectrogram(audio_data, sr)
plt.pcolormesh(t, f, np.log10(sxx))
plt.show()
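One gotcha with np.log10(sxx): any zero in the spectrogram becomes -inf (with a runtime warning), which can break the colour scaling. A common fix, an assumption on my part rather than something the original code does, is to add a small epsilon first:

```python
import numpy as np
from scipy.signal import spectrogram

# a second of synthetic int16 noise standing in for audio_data
rng = np.random.default_rng(0)
audio = rng.integers(-1000, 1000, 48000).astype(np.int16)

f, t, sxx = spectrogram(audio, 48000)
log_sxx = np.log10(sxx + 1e-10)  # epsilon keeps log10 finite even where sxx == 0
```

The 1e-10 is an arbitrary small value; anything well below your quietest meaningful power works.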
That's getting us close to what we want, but what we really want is to be able to interact with the plot and hear the audio at the point we interact with.
For more interactivity, we're going to reach for tools other than matplotlib and IPython.display. HoloViews and Panel by the Anaconda team are very nice for custom interactivity. Conveniently for us, Panel's Audio pane and HoloViews' Image component play nicely together and allow us to do more interactive visualisations.
#hide_output
import holoviews as hv
import panel as pn
hv.extension("bokeh", logo=False)
spec_gram = hv.Image((t, f, np.log10(sxx)), ["Time (s)", "Frequency (hz)"]).opts(width=600)
audio = pn.pane.Audio(audio_data, sample_rate=sr, name='Audio', throttle=500)
pn.Column(spec_gram, audio)
Here we create an Image the same way we did with matplotlib's plt.pcolormesh, and the pn.pane.Audio using the first channel of the audio_data we got from scipy.io.wavfile.read(audio_path). Finally, we put them together in a pn.Column so that the spectrogram is displayed above the audio player.
We want the playhead to update as the time changes during playback. To do this, we'll use a HoloViews DynamicMap. It sounds complicated, but put simply, it links a stream with a callback function. In this case, the stream is audio.param.time and the callback is update_playhead, which returns a VLine (the playhead). We use the * operator to overlay the image with the returned VLine playhead.
#hide_output
def update_playhead(time):
    return hv.VLine(time)
dmap_time = hv.DynamicMap(update_playhead, streams=[audio.param.time]).opts(width=600)
pn.Column(audio, spec_gram * dmap_time)
Note: The slider underneath is because of how I made it work on a static HTML web page. If you run it yourself, there'll be no slider.
That works great, but we also want to be able to click the plot to update the playhead. We do this by merging two streams that trigger one update_playhead callback within the DynamicMap. The SingleTap stream captures clicks on the plot, and we use a Params stream to rename time to t for the merged callback. Within update_playhead, we check whether x (the x position of the click) is None; if it is, we use the playback time.
#collapse-show
def update_playhead(x, y, t):
    if x is None:
        return hv.VLine(t)
    else:
        audio.time = x
        return hv.VLine(x)
tap_stream = hv.streams.SingleTap(transient=True)
time_play_stream = hv.streams.Params(parameters=[audio.param.time], rename={'time': 't'})
dmap_time = hv.DynamicMap(update_playhead, streams=[time_play_stream, tap_stream])
out = pn.Column(audio, spec_gram * dmap_time)
#hide
out
Note: This will work when you run the notebook yourself, but the interactivity is lost when hosted on a static HTML web page. You can link it with a Python backend, but that's not happening here because it requires a bit of work that I haven't done.
#collapse_hide
import numpy as np
from scipy.signal import spectrogram
from scipy.io import wavfile
import holoviews as hv
import panel as pn
hv.extension("bokeh", logo=False)
sr, wav_data = wavfile.read(audio_path)
audio_data = wav_data[:,0] # first channel
f, t, sxx = spectrogram(audio_data, sr)
spec_gram = hv.Image((t, f, np.log10(sxx)), ["Time (s)", "Frequency (hz)"]).opts(width=600)
audio = pn.pane.Audio(wav_data[:,0], sample_rate=sr, name='Audio', throttle=500)
def update_playhead(x, y, t):
    if x is None:
        return hv.VLine(t)
    else:
        audio.time = x
        return hv.VLine(x)
tap_stream = hv.streams.SingleTap(transient=True)
time_play_stream = hv.streams.Params(parameters=[audio.param.time], rename={'time': 't'})
dmap_time = hv.DynamicMap(update_playhead, streams=[time_play_stream, tap_stream])
out = pn.Column(audio, spec_gram * dmap_time)
#hide
out
I won't really dive into this, but you can remove the need for a Python server by using jslink to rely on your browser's JavaScript alone. That's actually how I made the above plots display in your browser. I'd be interested to hear if there's a nicer way to do this, and how easy it would be to add a click event.
#hide_output
from bokeh.resources import INLINE
slider = pn.widgets.FloatSlider(end=duration)
line = hv.VLine(0)
slider.jslink(audio, value='time', bidirectional=True)
slider.jslink(line, value='glyph.location')
pn.Column(spec_gram * line, slider, audio).save('redo', embed=True, resources=INLINE)
You can view and run all the code yourself from here.
I personally love learning about these kinds of visualisations and finding ways to create interactivity. What do you think about this type of widget for interacting with data? Did you learn a bit about creating interactive visualisations in Python by reading this article? If so, feel free to share it, and you're also more than welcome to contact me (via Twitter) if you have any questions, comments, or feedback.
Thanks for reading! :rocket:
Follow me on Twitter here for more stuff like this.