Important: Please read the installation page for details about how to install the toolboxes. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\CC}{\mathbb{C}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\Zz}{\mathcal{Z}}$ $\newcommand{\Ww}{\mathcal{W}}$ $\newcommand{\Vv}{\mathcal{V}}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\NN}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\Ee}{\mathcal{E}}$ $\newcommand{\Cc}{\mathcal{C}}$ $\newcommand{\Gg}{\mathcal{G}}$ $\newcommand{\Ss}{\mathcal{S}}$ $\newcommand{\Pp}{\mathcal{P}}$ $\newcommand{\Ff}{\mathcal{F}}$ $\newcommand{\Xx}{\mathcal{X}}$ $\newcommand{\Mm}{\mathcal{M}}$ $\newcommand{\Ii}{\mathcal{I}}$ $\newcommand{\Dd}{\mathcal{D}}$ $\newcommand{\Ll}{\mathcal{L}}$ $\newcommand{\Tt}{\mathcal{T}}$ $\newcommand{\si}{\sigma}$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$
This numerical tour explore local Fourier analysis of sounds, and its application to source separation from stereo measurements.
from __future__ import division
import nt_toolbox as nt
from nt_solutions import audio_2_separation as solutions
%matplotlib inline
%load_ext autoreload
%autoreload 2
We load 3 sounds and simulate a stero recording by performing a linear blending of the sounds.
For Scilab users, it is safer to extend the stack size. For Matlab users this does nothing.
extend_stack_size(4)
Sound loading.
n = 1024*16
s = 3; % number of sound
p = 2; % number of micros
options.subsampling = 1
x = zeros(n, 3)
[x(: , 1), fs] = load_sound('bird', n, options)
[x(: , 2), fs] = load_sound('female', n, options)
[x(: , 3), fs] = load_sound('male', n, options)
normalize the energy of the signals
x = x./ repmat(std(x, 1), [n 1])
We mix the sound using a |2x3| transformation matrix. Here the direction are well-spaced, but you can try with more complicated mixing matrices.
compute the mixing matrix
theta = linspace(0, pi(), s + 1); theta(s + 1) = []
theta(1) = .2
M = [cos(theta); sin(theta)]
compute the mixed sources
y = x*M'
Display of the sounds and their mix.
for i in 1: s:
subplot(s, 1, i)
plot(x(: , i)); axis('tight')
set_graphic_sizes([], 20)
title(strcat('Source #', num2str(i)))
Display of the micro output.
for i in 1: p:
subplot(p, 1, i)
plot(y(: , i)); axis('tight')
set_graphic_sizes([], 20)
title(strcat('Micro #', num2str(i)))
In order to perform the separation, one performs a local Fourier analysis of the sound. The hope is that the sources will be well-separated over the Fourier domain because the sources are sparse after a STFT.
First set up parameters for the STFT.
options.n = n
w = 128; % size of the window
q = w/ 4; % overlap of the window
Compute the STFT of the sources.
clf; X = []; Y = []
for i in 1: s:
X(: , : , i) = perform_stft(x(: , i), w, q, options)
subplot(s, 1, i)
plot_spectrogram(X(: , : , i))
set_graphic_sizes([], 20)
title(strcat('Source #', num2str(i)))
Exercise 1
Compute the STFT of the micros, and store them into a matrix |Y|.
solutions.exo1()
## Insert your code here.
Since the sources are quite sparse over the Fourier plane, the directions are well estimated by looking as the direction emerging from a point clouds of the transformed coefficients.
First we compute the position of the point cloud.
mf = size(Y, 1)
mt = size(Y, 2)
P = reshape(Y, [mt*mf p])
P = [real(P); imag(P)]
Then we keep only the 5% of points with largest energy.
Display some points in the original (spacial) domain.
number of displayed points
npts = 6000
display original points
sel = randperm(n); sel = sel(1: npts)
plot(y(sel, 1), y(sel, 2), '.')
axis([-1 1 -1 1]*5)
set_graphic_sizes([], 20)
title('Time domain')
Exercise 2
Display some points of |P| in the transformed (time/frequency) domain.
solutions.exo2()
## Insert your code here.
We compute the angle associated to each point over the transformed domain. The histograms shows the main direction of mixing.
Theta = mod(atan2(P(: , 2), P(: , 1)), pi())
display histograms
nbins = 100
[h, t] = hist(Theta, nbins)
h = h/ sum(h)
bar(t, h); axis('tight')
Exercise 3
The histogram computed from the whole set of points are not peacked enough. To stabilize the detection of mixing direction, compute an histogram from a reduced set of point that have the largest amplitude. ompute the energy of each point xtract only a small sub-set ompute histogram isplay histograms
solutions.exo3()
## Insert your code here.
Exercise 4
Detect the direction |M1| approximating the true direction |M| by looking at the local maxima of the histogram. First detect the set of local maxima, and then keep only the three largest. ort in descending order
solutions.exo4()
## Insert your code here.
Once the mixing direction are known, one can project the sources on the direction.
We compute the projection of the coefficients Y on each estimated direction.
A = reshape(Y, [mt*mf p])
compute the projection of the coefficients on the directions
C = abs(M1'*A')
At each point |x|, the index |I(x)| is the direction which creates the largest projection.
I is the index of the closest source
[tmp, I] = compute_max(C, 1)
I = reshape(I, [mf mt])
An additional denoising is achieved by removing small coefficients.
do not take into account too small coefficients
T = .05
D = sqrt(sum(abs(Y).^2, 3))
I = I .* (D >T)
We can display the segmentation of the time frequency plane.
imageplot(I(1: mf/ 2, : ))
axis('normal')
set_colormap('jet')
The recovered coefficients are obtained by projection.
Proj = M1'*A'
Xr = []
for i in 1: s:
Xr(: , : , i) = reshape(Proj(i, : ), [mf mt]) .* (I = =i)
The estimated signals are obtained by inverting the STFT.
for i in 1: s:
xr(: , i) = perform_stft(Xr(: , : , i), w, q, options)
One can display the recovered signals.
for i in 1: s:
subplot(s, 1, i)
plot(xr(: , i)); axis('tight')
set_graphic_sizes([], 20)
title(strcat('Estimated source #', num2str(i)))
One can listen to the recovered sources.
i = 1
sound(x(: , i), fs)
sound(xr(: , i), fs)