#!/usr/bin/env python
# coding: utf-8

Melodia in Python: A Tutorial

# This tutorial shows how to use the [Melodia Melody Extraction vamp plugin](http://mtg.upf.edu/technologies/melodia) directly in python.
#
# * The tutorial assumes working knowledge of python (and that python is already installed on your system).
# * If you already have the [Melodia vamp plugin](http://mtg.upf.edu/technologies/melodia) installed, skip to step 2.
# * If you already have the [python vamp module](https://pypi.python.org/pypi/vamp) installed, skip to step 3.

# For further details about the Melodia algorithm itself please see:
#
# J. Salamon and E. Gómez, "[Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics](http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamongomezmelodytaslp2012.pdf)", IEEE Transactions on Audio, Speech and Language Processing, 20(6):1759-1770, Aug. 2012. ([copyright notice](http://www.justinsalamon.com/publications.html#IEEE_copyright))
#
# Created by [Justin Salamon](http://www.justinsalamon.com), January 2016.

Step 1: Install Melodia

# Go to the [Melodia website](http://mtg.upf.edu/technologies/melodia), follow the link to the download page, and download the plugin. Then follow the instructions provided in the README file to install the plugin on your system.
#
# **Tip:** an easy way to check whether the plugin is installed correctly is to open [Sonic Visualiser](http://www.sonicvisualiser.org/) and check that the plugin appears in the menu under Transform > Analysis by Category > Pitch > MELODIA - Melody Extraction...

Step 2: Install the vamp python module

# ```>pip install vamp```
#
# **Tip:** if you're not already using pip to manage your python packages, see https://pip.pypa.io/en/stable/
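
# Once the vamp module is installed, the plugin check from step 1 can also be done from python. The sketch below assumes that `vamp.list_plugins()` returns the keys of all installed plugins and looks for the `mtg-melodia:melodia` key used later in this tutorial. `melodia_available` is a hypothetical helper name chosen for illustration; it is not part of the vamp module.

```python
def melodia_available(plugin_key="mtg-melodia:melodia"):
    """Return True if the vamp module is importable and lists the given plugin key."""
    try:
        import vamp  # imported here so a missing module is reported, not raised
    except ImportError:
        return False
    # Assumption: list_plugins() returns a list of plugin key strings
    return plugin_key in vamp.list_plugins()


if melodia_available():
    print("Melodia plugin found")
else:
    print("Melodia plugin (or the vamp module) not found: revisit steps 1 and 2")
```

# If the check fails even though the plugin shows up in Sonic Visualiser, the module itself is probably missing from the python environment you are running.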

Step 3: Using Melodia in python

3.1 Necessary modules

# We need to load the ```vamp``` module of course. Since the module doesn't handle audio loading, we also need something to do that. There are many ways to load audio into python; I normally use either [Essentia](http://essentia.upf.edu/) or [Librosa](https://bmcfee.github.io/librosa/). If you have neither installed, Librosa is probably easier to set up (```pip install librosa```). Finally, we'll use ```matplotlib``` for visualizing the extracted melody pitch contour.

# In[1]:

# __future__ imports must come before any other statement
from __future__ import print_function

import vamp
import librosa
import essentia.standard as es  # only needed if you load audio with Essentia
import matplotlib.pyplot as plt

get_ipython().run_line_magic('matplotlib', 'inline')

3.2 Loading audio

# In[2]:

# This is the audio file we'll be analyzing.
# You can download it here: http://labrosa.ee.columbia.edu/projects/melody/mirex05TrainFiles.zip
audio_file = '/Users/justin/datasets/melody/mirex05/audio/train05.wav'

# In[3]:

# This is how we load audio using Essentia
sr = 44100  # keep the sampling rate around: vamp.collect needs it later
loader = es.MonoLoader(filename=audio_file, downmix='mix', sampleRate=sr)
audio = loader()

# In[4]:

# This is how we load audio using Librosa (either loader on its own is enough)
audio, sr = librosa.load(audio_file, sr=44100, mono=True)

3.3 Extracting the melody using Melodia with default parameter values

# In[5]:

data = vamp.collect(audio, sr, "mtg-melodia:melodia")

# In[6]:

# data is a dictionary containing one item called "vector"
data

# In[7]:

# vector is a tuple of two values: the hop size used for analysis and the array of pitch values
# Note that the hop size is *always* equal to 128/44100.0 = 2.9 ms
hop, melody = data['vector']
print(hop)
print(melody)

*** SUPER IMPORTANT SUPER IMPORTANT ***

# For reasons internal to the vamp architecture, THE TIMESTAMP OF THE FIRST VALUE IN THE MELODY ARRAY IS ALWAYS:
#
# ```
# first_timestamp = 8 * hop = 8 * 128/44100.0 = 0.023219954648526078
# ```
#
# This means that the timestamp of the pitch value at index i (starting with i=0) is given by:
#
# ```
# timestamp[i] = 8 * 128/44100.0 + i * (128/44100.0)
# ```
#
# So, if you want to generate a timestamp array to match the pitch values, you do it like this:

# In[8]:

import numpy as np
timestamps = 8 * 128/44100.0 + np.arange(len(melody)) * (128/44100.0)

# As noted above, the hop size used in Melodia is always 2.9 ms, regardless of the sampling rate of the audio being analysed (though a rate of 44100 is recommended for optimal performance). This in turn means the first timestamp is always 23.2 ms, again, regardless of the sampling rate.
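
# The timestamp formula can also be inverted, which is handy when you want the pitch value closest to a given time. The following is a minimal sketch based only on the hop size and first timestamp stated above; `index_to_time` and `time_to_index` are hypothetical helper names, and times before the first frame are simply mapped to index 0.

```python
HOP = 128 / 44100.0        # Melodia's fixed hop size in seconds (about 2.9 ms)
FIRST_TIMESTAMP = 8 * HOP  # timestamp of the first pitch value (about 23.2 ms)


def index_to_time(i):
    """Timestamp in seconds of the pitch value at index i."""
    return FIRST_TIMESTAMP + i * HOP


def time_to_index(t):
    """Index of the pitch value whose timestamp is closest to time t (seconds)."""
    # Times earlier than the first timestamp map to index 0
    return max(0, int(round((t - FIRST_TIMESTAMP) / HOP)))


print(index_to_time(0))    # first timestamp
print(time_to_index(1.0))  # index of the frame closest to t = 1 second
```

# Note that `time_to_index` does not know the length of your melody array, so check the returned index against `len(melody)` before using it.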

3.4 Extracting the melody using Melodia with custom parameter values

# In[9]:

# parameter values are specified by providing a dictionary to the optional "parameters" argument:
params = {"minfqr": 100.0, "maxfqr": 800.0, "voicing": 0.2, "minpeaksalience": 0.0}

data = vamp.collect(audio, sr, "mtg-melodia:melodia", parameters=params)
hop, melody = data['vector']

# Melodia has 4 parameters:
# * **minfqr**: minimum frequency in Hertz (default 55.0)
# * **maxfqr**: maximum frequency in Hertz (default 1760.0)
# * **voicing**: voicing tolerance. Greater values will result in more pitch contours included in the final melody. Smaller values will result in fewer pitch contours included in the final melody (default 0.2).
# * **minpeaksalience**: (in Sonic Visualiser "Monophonic Noise Filter") is a hack to avoid silence turning into junk contours when analyzing monophonic recordings (e.g. solo voice with no accompaniment). Generally you want to leave this untouched (default 0.0).
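
# If you only want to change one or two parameters, it can be convenient to start from the defaults listed above. The sketch below is one possible way to do that; `melodia_params` is a hypothetical helper, and both the name check and the frequency range check are my own sanity checks rather than anything the plugin or the vamp module is known to enforce.

```python
# Default values as listed in the parameter descriptions above
MELODIA_DEFAULTS = {
    "minfqr": 55.0,
    "maxfqr": 1760.0,
    "voicing": 0.2,
    "minpeaksalience": 0.0,
}


def melodia_params(**overrides):
    """Return a full parameter dictionary: the defaults plus any overrides."""
    unknown = set(overrides) - set(MELODIA_DEFAULTS)
    if unknown:
        # Catch misspelled names before they reach vamp.collect
        raise ValueError("Unknown Melodia parameter(s): " + ", ".join(sorted(unknown)))
    params = dict(MELODIA_DEFAULTS)
    params.update(overrides)
    if params["minfqr"] >= params["maxfqr"]:
        raise ValueError("minfqr must be smaller than maxfqr")
    return params


params = melodia_params(minfqr=100.0, maxfqr=800.0)
print(params)
```

# The resulting dictionary can be passed to `vamp.collect` through the `parameters` argument exactly as in the cell above.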

3.5 Plot the extracted melody

# In[10]:

# Melodia returns unvoiced (=no melody) sections as negative values. So by default, we get:
plt.figure(figsize=(18,6))
plt.plot(timestamps, melody)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()

# In[11]:

# A clearer option is to get rid of the negative values before plotting.
# Work on a copy: melody[:] is only a view of a numpy array, so editing it would overwrite melody too.
melody_pos = melody.copy()
melody_pos[melody <= 0] = np.nan
plt.figure(figsize=(18,6))
plt.plot(timestamps, melody_pos)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()

# In[12]:

# Finally, you might want to plot the pitch sequence in cents rather than in Hz.
# This especially makes sense if you are comparing two or more pitch sequences
# to each other (e.g. comparing an estimate against a reference).
# Unvoiced frames are already NaN in melody_pos, so no log of a negative value is taken.
melody_cents = 1200 * np.log2(melody_pos / 55.0)
plt.figure(figsize=(18,6))
plt.plot(timestamps, melody_cents)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (cents relative to 55 Hz)')
plt.show()
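
# To illustrate why cents are convenient for comparing pitch sequences, here is a minimal sketch that converts two sequences to cents and takes their difference. It assumes both sequences are sampled on the same time grid and follow the convention above (values <= 0 mean unvoiced). `hz_to_cents` is a hypothetical helper and the frequency values are made up for illustration.

```python
import numpy as np


def hz_to_cents(freqs_hz, ref_hz=55.0):
    """Convert frequencies in Hz to cents relative to ref_hz; unvoiced (<= 0) becomes NaN."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    cents = np.full(freqs_hz.shape, np.nan)
    voiced = freqs_hz > 0
    # Only take the log of positive frequencies
    cents[voiced] = 1200 * np.log2(freqs_hz[voiced] / ref_hz)
    return cents


# Made-up values, for illustration only
estimate = np.array([220.0, 225.0, -110.0, 440.0])
reference = np.array([220.0, 220.0, 110.0, -440.0])

# Difference in cents; NaN wherever either sequence is unvoiced
diff_cents = hz_to_cents(estimate) - hz_to_cents(reference)
print(diff_cents)
```

# Because cents are logarithmic, a given difference means the same musical interval regardless of register, which is not true of a difference in Hz.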

That's all folks

# For further information feel free to contact me: justin.salamon@nyu.edu (or justin.salamon@gmail.com)