This tutorial will guide you through some tools for performing spectral analysis and synthesis using the Essentia library (http://www.essentia.upf.edu). In this case we use an STFT analysis/synthesis workflow together with predominant pitch estimation, with the goal of removing or soloing the predominant source. The algorithm uses a binary masking technique, modifying the magnitude values at the frequency bins in the spectrum that correspond to the harmonic series of the predominant pitch. It can be seen as a very primitive approach to 'source separation'.
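To make the masking idea concrete, here is a minimal, illustrative sketch of a binary harmonic mask in plain NumPy. It is only a toy version of the concept, not the HarmonicMask algorithm used later in this tutorial, and the function name and arguments are hypothetical.
# toy sketch of a binary harmonic mask (illustrative only, not Essentia's HarmonicMask)
import numpy as np

def toy_harmonic_mask(spectrum, f0, samplerate, framesize, binwidth, attenuation_db):
    # attenuate the bins around each harmonic of f0 in a complex half-spectrum
    masked = spectrum.copy()
    if f0 <= 0: # unvoiced frame: leave the spectrum untouched
        return masked
    gain = 10.0 ** (-attenuation_db / 20.0) # dB attenuation -> linear gain
    binfreq = samplerate / framesize # width of one FFT bin in Hz
    harmonic = f0
    while harmonic < samplerate / 2.0: # walk the harmonic series up to Nyquist
        centerbin = int(round(harmonic / binfreq))
        lo = max(centerbin - binwidth, 0)
        hi = min(centerbin + binwidth, len(masked) - 1)
        masked[lo:hi + 1] *= gain # scale down the bins around this harmonic
        harmonic += f0
    return masked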
You should first install the Essentia library with Python bindings. Installation instructions are detailed here: http://essentia.upf.edu/documentation/installing.html .
# import essentia in standard mode
import essentia
import essentia.standard
from essentia.standard import *
After importing the Essentia library, let's import other numerical and plotting tools
# import matplotlib for plotting
import matplotlib.pyplot as plt
import numpy as np
Define the parameters of the STFT workflow
# algorithm parameters
framesize = 2048
hopsize = 128 # match the hop size of PredominantPitchMelodia (128 is its default) so pitch values align with the STFT frames
samplerate = 44100.0
attenuation_dB = 100
maskbinwidth = 2
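To get a feel for these settings, it helps to look at what they imply in time and frequency; a quick, illustrative calculation:
# time and frequency resolution implied by the settings above
print("frame length: %.1f ms" % (1000.0 * framesize / samplerate)) # about 46.4 ms
print("hop size: %.1f ms" % (1000.0 * hopsize / samplerate)) # about 2.9 ms
print("FFT bin width: %.2f Hz" % (samplerate / framesize)) # about 21.53 Hz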
Specify input and output audio filenames
inputFilename = 'flamenco.wav'
outputFilename = 'flamenco_stft.wav'
# create an audio loader and import audio file
loader = essentia.standard.MonoLoader(filename = inputFilename, sampleRate = 44100)
audio = loader()
print("Duration of the audio sample [sec]:")
print(len(audio)/44100.0)
Duration of the audio sample [sec]: 14.22859410430839
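Since the pitch estimation and the STFT below use the same hop size, we can roughly estimate how many frames will be processed (a sketch; the exact count depends on FrameGenerator's padding at the borders):
# rough estimate of the number of hop-sized frames in the audio
numframes = int(np.ceil(len(audio) / float(hopsize)))
print("Approximate number of frames at hop size %d: %d" % (hopsize, numframes))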
Define the algorithm chain for the frame-by-frame process: FrameCutter -> Windowing -> FFT -> HarmonicMask -> IFFT -> OverlapAdd -> AudioWriter
Predominant pitch extraction
# extract the predominant pitch
# PredominantPitchMelodia takes the entire audio signal as input; no frame-wise processing is required here.
pExt = PredominantPitchMelodia(frameSize = framesize, hopSize = hopsize, sampleRate = samplerate)
pitch, pitchConf = pExt(audio)
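The pitch contour contains one value per hop, so its length should roughly match the number of frames estimated above. Plotting it against time (using the plotting tools imported earlier) is a quick, optional sanity check:
# optional: inspect the extracted pitch contour
print("Number of pitch values:", len(pitch))
time = np.arange(len(pitch)) * hopsize / samplerate # one pitch value per hop
plt.figure()
plt.plot(time, pitch)
plt.xlabel('time [s]')
plt.ylabel('estimated predominant pitch [Hz]')
plt.show()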
# algorithm workflow for the harmonic mask using the STFT frame by frame
fcut = FrameCutter(frameSize = framesize, hopSize = hopsize) # (the loop below uses FrameGenerator instead)
w = Windowing(type = "hann")
fft = FFT(size = framesize)
hmask = HarmonicMask(sampleRate = samplerate, binWidth = maskbinwidth, attenuation = attenuation_dB)
ifft = IFFT(size = framesize)
overl = OverlapAdd(frameSize = framesize, hopSize = hopsize)
awrite = MonoWriter(filename = outputFilename, sampleRate = 44100)
Now we loop over all audio frames and store the processed audio samples in the output array
audioout = np.array(0) # initialize output array
for idx, frame in enumerate(FrameGenerator(audio, frameSize = framesize, hopSize = hopsize)):
    # STFT analysis
    infft = fft(w(frame))
    # get the pitch of the current frame
    curpitch = pitch[idx]
    # here we apply the harmonic mask spectral transformation
    outfft = hmask(infft, curpitch)
    # STFT synthesis
    out = overl(ifft(outfft))
    audioout = np.append(audioout, out)
Finally, we write the processed audio array to a WAV file
# write audio output
awrite(audioout.astype(np.float32))
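The introduction mentioned that the predominant source can be removed or soloed. The sketch below assumes that flipping the sign of the attenuation parameter inverts the mask (keeping the pitched component instead of removing it); please verify this convention against the HarmonicMask documentation of your Essentia version. The output filename 'flamenco_solo.wav' is just a hypothetical example.
# sketch: reuse the same chain with the attenuation sign flipped to solo the
# predominant source (assumes negative attenuation means inverse masking; check
# the HarmonicMask documentation for your Essentia version)
hmask_solo = HarmonicMask(sampleRate = samplerate, binWidth = maskbinwidth, attenuation = -attenuation_dB)
overl_solo = OverlapAdd(frameSize = framesize, hopSize = hopsize)
awrite_solo = MonoWriter(filename = 'flamenco_solo.wav', sampleRate = 44100) # hypothetical output name

audioout_solo = np.array(0)
for idx, frame in enumerate(FrameGenerator(audio, frameSize = framesize, hopSize = hopsize)):
    outfft = hmask_solo(fft(w(frame)), pitch[idx])
    audioout_solo = np.append(audioout_solo, overl_solo(ifft(outfft)))

awrite_solo(audioout_solo.astype(np.float32))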