This tutorial shows how to use the Melodia melody extraction Vamp plugin directly in Python.
For further details about the Melodia algorithm itself please see:
J. Salamon and E. Gómez, "Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics", IEEE Transactions on Audio, Speech and Language Processing, 20(6):1759-1770, Aug. 2012. (copyright notice)
Created by Justin Salamon, January 2016.
Go to the Melodia website, follow the link to the download page, and download the plugin. Then follow the instructions provided in the README file to install the plugin on your system.
Tip: an easy way to check whether the plugin is installed correctly is to open Sonic Visualiser and see if the plugin appears in the menu under Transform > Analysis by Category > Pitch > MELODIA - Melody Extraction...
Next, install the vamp Python module:
>pip install vamp
Tip: if you're not already using pip to manage your Python packages, see https://pip.pypa.io/en/stable/
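The Sonic Visualiser check described in the tip above can also be approximated from Python. The sketch below assumes the vamp module's list_plugins() function, which returns the keys of all installed Vamp plugins, and looks for the plugin key used later in this tutorial. The helper name melodia_installed is made up for illustration.

```python
MELODIA_KEY = "mtg-melodia:melodia"  # plugin key used later in this tutorial


def melodia_installed(plugin_keys):
    """Return True if the Melodia plugin key is among the given plugin keys."""
    return MELODIA_KEY in plugin_keys


try:
    import vamp
    installed_keys = vamp.list_plugins()
except ImportError:
    # The vamp module itself is missing, so no plugins can be listed.
    installed_keys = []

print("Melodia found: %s" % melodia_installed(installed_keys))
```

If this prints False even though the vamp module imports fine, the plugin files are most likely not in a folder that Vamp hosts search, so revisit the plugin's README.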
We need to load the vamp module, of course. Since the module doesn't handle audio loading, we also need something to do that. There are many ways to load audio into Python; I normally use either Essentia or Librosa. If you have neither installed, Librosa is probably easier to set up (pip install librosa). Finally, we'll use matplotlib for visualizing the extracted melody pitch contour.
from __future__ import print_function  # __future__ imports must come before all other imports
import vamp
import librosa
import essentia.standard as es
import matplotlib.pyplot as plt
%matplotlib inline
# This is the audio file we'll be analyzing.
# You can download it here: http://labrosa.ee.columbia.edu/projects/melody/mirex05TrainFiles.zip
audio_file = '/Users/justin/datasets/melody/mirex05/audio/train05.wav'
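# This is how we load audio using Essentia
sr = 44100  # the sample rate is needed again when calling the plugin
loader = es.MonoLoader(filename=audio_file, downmix='mix', sampleRate=sr)
audio = loader()
# This is how we load audio using Librosa (an alternative to the Essentia loader above; you only need one of the two)
audio, sr = librosa.load(audio_file, sr=44100, mono=True)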
data = vamp.collect(audio, sr, "mtg-melodia:melodia")
# data is a dictionary containing one item called "vector"
data
{'vector': ( 0.002902494, array([-220., -220., -220., ..., -220., -220., -220.], dtype=float32))}
# vector is a tuple of two values: the hop size used for analysis and the array of pitch values
# Note that the hop size is *always* equal to 128/44100.0 = 2.9 ms
hop, melody = data['vector']
print(hop)
print(melody)
0.002902494 [-220. -220. -220. ..., -220. -220. -220.]
Note that the timestamp of the first pitch value in the melody array is always:
first_timestamp = 8 * hop = 8 * 128/44100.0 = 0.023219954648526078
This means that the timestamp of the pitch value at index i (starting with i=0) is given by:
timestamp[i] = 8 * 128/44100.0 + i * (128/44100.0)
So, if you want to generate a timestamp array to match the pitch values, you do it like this:
import numpy as np
timestamps = 8 * 128/44100.0 + np.arange(len(melody)) * (128/44100.0)
As noted above, the hop size used in Melodia is always 2.9 ms, regardless of the sampling rate of the audio being analysed (though a rate of 44100 is recommended for optimal performance). This in turn means the first timestamp is always 23.2 ms, again, regardless of the sampling rate.
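The timestamp formula above can also be inverted to find which pitch value corresponds to a given time. This is a minimal sketch using only the constants stated above. The function names frame_to_time and time_to_frame are invented for illustration and are not part of the vamp module.

```python
HOP = 128 / 44100.0        # Melodia hop size in seconds (about 2.9 ms)
FIRST_TIMESTAMP = 8 * HOP  # timestamp of the pitch value at index 0


def frame_to_time(i):
    """Timestamp in seconds of the pitch value at index i."""
    return FIRST_TIMESTAMP + i * HOP


def time_to_frame(t, n_frames):
    """Index of the pitch value closest to time t, clamped to the valid range."""
    i = int(round((t - FIRST_TIMESTAMP) / HOP))
    return max(0, min(n_frames - 1, i))
```

For example, time_to_frame(1.0, len(melody)) gives the index of the pitch value closest to the one-second mark.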
# Parameter values are specified by passing a dictionary to the optional "parameters" argument:
params = {"minfqr": 100.0, "maxfqr": 800.0, "voicing": 0.2, "minpeaksalience": 0.0}
data = vamp.collect(audio, sr, "mtg-melodia:melodia", parameters=params)
hop, melody = data['vector']
Melodia has 4 parameters, all of which are set in the dictionary above: minfqr, maxfqr, voicing and minpeaksalience.
# Melodia returns unvoiced (=no melody) sections as negative values. So by default, we get:
plt.figure(figsize=(18,6))
plt.plot(timestamps, melody)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()
# A clearer option is to get rid of the negative values before plotting
melody_pos = melody.copy()  # melody[:] only creates a view, so the next line would also overwrite melody
melody_pos[melody<=0] = np.nan
plt.figure(figsize=(18,6))
plt.plot(timestamps, melody_pos)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()
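Since unvoiced frames are marked by non-positive values, the same convention can be used to split the pitch sequence into voiced segments, for instance to count or measure melodic phrases. The following is a plain-Python sketch. The function name voiced_segments is hypothetical, and it treats any value that is not greater than zero (including NaN) as unvoiced.

```python
def voiced_segments(pitches):
    """Return (start, end) index pairs, end exclusive, for each run of values > 0."""
    segments = []
    start = None
    for i, f in enumerate(pitches):
        if f > 0 and start is None:
            start = i                      # a voiced run begins here
        elif not f > 0 and start is not None:
            segments.append((start, i))    # the voiced run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(pitches)))  # run extends to the end of the sequence
    return segments
```

Each returned index pair can be converted to start and end times with the timestamp formula given earlier.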
# Finally, you might want to plot the pitch sequence in cents rather than in Hz.
# This especially makes sense if you are comparing two or more pitch sequences
# to each other (e.g. comparing an estimate against a reference).
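melody_cents = np.full(len(melody), np.nan)
voiced = melody > 0  # only voiced frames have a valid (positive) frequency
melody_cents[voiced] = 1200*np.log2(melody[voiced]/55.0)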
plt.figure(figsize=(18,6))
plt.plot(timestamps, melody_cents)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (cents relative to 55 Hz)')
plt.show()
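For a single value, the Hz-to-cents conversion used in the plot above reduces to one line of arithmetic. The scalar sketch below uses the same 55 Hz reference. The function name hz_to_cents is chosen here for illustration, and returning NaN for unvoiced (non-positive) values is an assumption that mirrors the plotting code.

```python
import math


def hz_to_cents(freq_hz, ref_hz=55.0):
    """Convert a frequency in Hz to cents relative to ref_hz (NaN if unvoiced)."""
    if freq_hz <= 0:
        return float("nan")  # unvoiced frames have no meaningful pitch
    return 1200.0 * math.log2(freq_hz / ref_hz)
```

Doubling the frequency adds exactly 1200 cents (one octave), so 110 Hz maps to 1200 cents and 440 Hz to 3600 cents. This makes the difference between an estimate and a reference independent of the register in which it occurs.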
For further information feel free to contact me: justin.salamon@nyu.edu (or justin.salamon@gmail.com)