Import the package and some utility functions
# the following line enables interaction with figures:
# you can zoom and save images from a pop-up matplotlib window
%matplotlib qt
# other backend options: notebook, ipympl, tk, qt
#with warnings.catch_warnings(): warnings.simplefilter('ignore')
import birdsongs as bs
from birdsongs.utils import *
Define a Paths object, which manages the folder locations of results, auxiliary data, audios, and birdsong paths, and a Ploter object to visualize syllables. The Paths object looks for all the wav files in the audio folder, located at root_path/Audios.
You must fill in the root path according to where you cloned the repository and to your operating system.
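For example, a minimal sketch of an explicit construction (the argument order follows the comment in the cell below; the paths and bird name are placeholders you must replace with your own):
root_path = "C:\\Users\\you\\GitHub\\birdsongs\\"     # placeholder: where you cloned the repository
audios_path = "C:\\Users\\you\\GitHub\\audios\\"      # placeholder: where your wav files live
paths = bs.Paths(root_path, audios_path, "Zonotrichia capensis")  # assumed positional order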
# root_path = "path_to_repository\\"
# audios_path = "audios_path\\"
# bird_name = "Zonotrichia capensis"
# audios_path = "C:\\Users\\sebas\\Documents\\GitHub\\audios\\Dissertation-xeno\\"
paths = bs.Paths() # root_path, audios_path, bird_name
ploter = bs.Ploter(save=False) # to save figures save=True
The folder has 4 songs
paths.ShowFiles()
1-humman.wav
2-XC104508 - Ocellated Tapaculo - Acropternis orthonyx.wav
3-XC11293 - Rufous-collared Sparrow - Zonotrichia capensis.wav
4-XC513182 - Rufous-collared Sparrow - Zonotrichia capensis.wav
no_file = 2 # int(input("Enter the number of song (1 to {0}): ".format(paths.no_files)))
Define the song by the file number.
complet_bird = bs.Song(paths, no_file=no_file, Nt=5000,
flim=(1e3,20e3), split_method="amplitud", umbral=0.15)
ploter.Plot(complet_bird, FF_on=False)
AudioPlay(complet_bird)
The song has 38 syllables
clip_bird = bs.Song(paths, no_file=no_file, umbral_FF=1., split_method="freq",
flim=(1e3,20e3)) # , tlim=(12.5,14)
ploter.Plot(clip_bird, FF_on=True)
AudioPlay(clip_bird)
The song has 169 syllables
klicker = ploter.FindTimes(complet_bird, FF_on=False)
plt.show()
# time_intervals = Positions(klicker)
#time_intervals
time_intervals = np.array([[12.66456702, 12.97913565],
[12.99078634, 13.22380014],
[13.29952962, 13.53254342],
[13.61701092, 13.81798533],
[14.1063399 , 14.36556776],
[18.5122157 , 18.81979551],
[18.93885867, 19.08603395],
[19.15218015, 19.31093102],
[19.38038453, 19.55567195]])
np.int64(time_intervals*complet_bird.fs)
array([[558507, 572379], [572893, 583169], [586509, 596785], [600510, 609373], [622089, 633521], [816388, 829952], [835203, 841694], [844611, 851612], [854674, 862405]], dtype=int64)
Define the song by a time interval, an interval of interest within the complete song.
Visualize the song with the Ploter object: use its Plot function and pass the object you want to visualize.
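For instance, a sketch of defining a clipped song (assuming the tlim keyword that appears commented in the cell above; the limits are illustrative):
song_clip = bs.Song(paths, no_file=no_file, flim=(1e3, 20e3),
                    split_method="amplitud", umbral=0.15, tlim=(12.5, 14))  # keep only the 12.5-14 s window
ploter.Plot(song_clip, FF_on=False)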
The syllable extraction has three methods (a selection sketch follows this list):
1. find where the normalized audio amplitude crosses a threshold (umbral), 0.05;
2. find where the fundamental frequency changes drastically, i.e., where the change is more than 500 Hz;
3. use the maad segmentation tools (from scikit-maad) to find Regions Of Interest (not implemented yet).
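A sketch of selecting each implemented method through the split_method keyword already used above (thresholds are illustrative):
# amplitude segmentation: split where the normalized envelope crosses the threshold umbral
song_amp = bs.Song(paths, no_file=no_file, split_method="amplitud", umbral=0.05)
# fundamental-frequency segmentation: split where the FF changes abruptly (threshold umbral_FF)
song_freq = bs.Song(paths, no_file=no_file, split_method="freq", umbral_FF=1.)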
This is the kernel of the model, the Syllable object. This object extracts the syllable tempo-spectral features and defines the variables necessary to implement and solve the Motor Gesture. It returns a synthetic syllable as a Syllable object.
To solve the syllable, that is, to find its synthetic syllable from some parameters ($\alpha, \beta, \gamma$), use its Solve method. This method depends on the model parameters $p$, defined in each Syllable object, which are $\alpha_i, \beta_i$, and $\gamma$. Although it is possible to define a custom parameter set, the Syllable object has a predefined initial set (inside the feasible region).
To display the parameter set use the command Display(object.p). To change one of its values use syllable_object.p["a0"].set(value=value_a0).
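A minimal sketch, once a Syllable object has been defined as in the cells below (syl is a hypothetical name):
Display(syl.p)                 # inspect the current parameter set
syl.p["a0"].set(value=0.01)    # overwrite a single parameter value
syl_synth = syl.Solve(syl.p)   # re-solve the Motor Gesture with the edited set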
Defining with the Song object: here the bird's syllable divider is used, which requires the syllable number of interest.
%%time
no_syllable = 4 # int(input("Enter the syllable number (1 to {0}): ".format(bird.no_syllables)))
syllable = clip_bird.Syllable(no_syllable)
ploter.Plot(syllable)
AudioPlay(syllable)
Define with the Syllable object: it builds a syllable from a bird audio. You can optionally give a time range to clip the bird audio.
syl_test = bs.Syllable(clip_bird, tlim=[1.131,1.188],
umbral_FF=1.2, Nt=30, NN=128)
ploter.Plot(syl_test)
AudioPlay(syl_test)
%%time
syllable_synth = syllable.Solve(syllable.p)
syl_test_synth = syl_test.Solve(syl_test.p)
# ploter.PlotVs(syllable_synth)
ploter.PlotAlphaBeta(syllable_synth)
ploter.Result(syllable, syllable_synth)
ploter.Plot(syllable_synth)
AudioPlay(syllable_synth)
# ploter.PlotVs(syl_test_synth)
ploter.PlotAlphaBeta(syl_test_synth)
ploter.Result(syl_test, syl_test_synth)
ploter.Plot(syl_test_synth)
AudioPlay(syl_test_synth)
Syllable variables are visualized with the Ploter object; here the execution time of the Motor Gesture definition and solution is also measured.
One of the biggest advantages of the model implementation is how easily parameters can be explored. As an example, let's reuse the previously defined syllable but vary the input air-sac pressure over three levels: low, medium, and high.
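A compact sketch of that sweep as a loop, equivalent to the three cells below:
for a0 in (0.01, 0.11, 1.25):                    # low, medium, and high air-sac pressure
    syllable.p["a0"].set(value=a0)
    syllable_synth = syllable.Solve(syllable.p)
    ploter.PlotAlphaBeta(syllable_synth)
    ploter.Result(syllable, syllable_synth)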
To plot each object, song or syllable, use the command ploter.Plot(obj).
There is also the possibility to define a Syllable object from a time interval and the bird object; nevertheless, you have to define some attributes of the object for the Ploter to work properly. Avoid entering wrong time limits, otherwise the object will not be defined.
To improve the syllable frequency resolution, modify the Short Time Fourier Transform window length, but remember that better frequency resolution implies losing time resolution. This feature is useful when the syllable spectrum is "complex" (trilled syllables).
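For example, a sketch that enlarges the STFT window length NN used when defining the syllable (the value is illustrative; larger NN gives finer frequency bins at the cost of time resolution):
syl_fine = bs.Syllable(clip_bird, tlim=[1.131, 1.188],
                       umbral_FF=1.2, Nt=30, NN=512)  # NN=512 instead of 128
ploter.Plot(syl_fine)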
$ a_0 = 0.01 $
%%time
syllable.p["a0"].set(value=0.01)
syllable_synth = syllable.Solve(syllable.p)
# ploter.PlotVs(syllable_synth)
ploter.PlotAlphaBeta(syllable_synth)
ploter.Result(syllable, syllable_synth)
$ a_0 = 0.11 $
syllable.p["a0"].set(value=0.11)
syllable_synth = syllable.Solve(syllable.p)
ploter.PlotAlphaBeta(syllable_synth)
ploter.Result(syllable, syllable_synth)
$ a_0 = 1.25 $
%%time
syllable.p["a0"].set(value=1.25)
syllable_synth = syllable.Solve(syllable.p)
ploter.PlotAlphaBeta(syllable_synth)
ploter.Result(syllable, syllable_synth)
Although the syllable division is a good approximation to solve the problem, a better methodology is to take chunks of a syllable, i.e., to divide the syllable into fractions. Since the Syllable object is already defined, it is worth using it again to define the chunck object.
The biggest difference with respect to the Syllable object is the Fourier Transform window length and the envelope parameters. Since these chunks are smaller than the syllables, the parameter-space curves are also smaller.
Choose which fraction of the syllable you are interested in, no_chunck, and define it as a Syllable object.
no_chunck = 0 # int(input("Enter the number of song (1 to {0}): ".format(bird.no_chuncks)))
chunck = clip_bird.Chunck(no_chunck)
ploter.Plot(chunck)
AudioPlay(chunck)
Show the parameters used and solve to generate the synthetic chunck.
Display(chunck.p)
chunck_synth = chunck.Solve(chunck.p)
# ploter.PlotVs(chunck_synth)
ploter.Plot(chunck_synth)
AudioPlay(chunck_synth)
Visualize the chunck tempo-spectral features and the score variables, which are defined to compare real and synthetic syllables.
ploter.Syllables(chunck, chunck_synth)
ploter.PlotAlphaBeta(chunck_synth)
ploter.Result(chunck, chunck_synth)
AudioPlay(chunck_synth)
Visualize the song again, this time with the chunck and syllable objects also plotted. The syllable and chunck must have been defined previously.
ploter.Plot(clip_bird, FF_on=True, syllable_on=True, chunck_on=True)
with $\Omega_\gamma$, $\Omega_\alpha$, and $\Omega_\beta$ the known feasible regions for each variable. In order to make the objective function dimensionless, the following two variables are defined
$$ \hat{SCI} := \frac{SCI}{\dim(SCI)}, \qquad \hat{FF} := \frac{1}{\dim(FF)} \frac{FF}{1\;\mathrm{kHz}} $$
where $\dim()$ is the dimension of the corresponding vector.
The general problem is computationally expensive since it depends on many variables. Although solving the problem in one shot is the ideal method, a better approach is to split the general problem into three auxiliary problems.
The coefficients of $\alpha$ are calculated from the correlation between the spectrum coefficients of the real syllable and those of the synthetic one
\begin{equation}\label{optimal_a_min} \begin{aligned} \underset{a \in \mathbb{R}^3}{\text{min}} & \qquad - corr (real, synthetic(a)) \\ \text { subject to } & a\in\Omega_a \end{aligned} \end{equation}The last step is to find the beta coefficients $b_i$
\begin{equation}\label{optimal_b} \begin{aligned} \underset{b \in \mathbb{R}^3}{\text{min}} &\qquad || FF_{real} - FF_{synt} (b)|| \\ \text { subject to } & \qquad \; b \in \Omega_b \end{aligned} \end{equation} with $t\in [0,T]$, where $T$ is the duration of the syllable (chunck).
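As a toy illustration of the grid search behind the 'brute' method used later (this is not the package's internal code; the linear gesture $FF_{synt}(b) = b_0 + b_1 t$ and all values here are assumptions):
import numpy as np
from scipy.optimize import brute

t = np.linspace(0, 0.05, 100)                  # toy syllable duration T = 50 ms
FF_real = 4.0 + 20.0 * t                       # toy "measured" fundamental frequency (kHz)

def residual(b):
    b0, b1 = b
    FF_synth = b0 + b1 * t                     # linear motor-gesture approximation
    return np.linalg.norm(FF_real - FF_synth)  # || FF_real - FF_synth(b) ||

b_opt = brute(residual, ranges=((0, 10), (0, 40)), Ns=11)  # 11 grid points per coefficient
print(b_opt)                                   # should land near (4.0, 20.0)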
The air-sac pressure and the labial tension are each defined by three coefficients (six in total), which means their time curves are parabolic functions (the motor gestures are parabolas). This can be modified by omitting the third coefficients $a_2, b_2$ and working with straight lines as motor gestures.
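Under that description, the gesture curves would take the form below (a sketch of the implied parametrization; the exact time scaling used by the package may differ):
$$ \alpha(t) = a_0 + a_1 t + a_2 t^2, \qquad \beta(t) = b_0 + b_1 t + b_2 t^2, $$
and dropping $a_2$ and $b_2$ reduces both gestures to straight lines.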
Define the method and its parameters to solve the optimization problem
brute = {'method':'brute', 'Ns':11} #, 'workers':-1}
DualAnnealing = {'method':'dual_annealing','max_nfev':200, 'maxiter': 100}
Define the object to optimize and its corresponding optimizer
obj = syl_test # syllable # chunck
optimizer = bs.Optimizer(obj, method_kwargs=brute)
obj.id
ploter.Plot(obj)
AudioPlay(obj)
You can check all the available methods by uncommenting the following line; check the method attribute.
#?lmfit.minimize
Show the model parameters and the plots of the initial synthetic syllable.
Display(obj.p)
obj_synth = obj.Solve(obj.p)
ploter.PlotAlphaBeta(obj_synth)
ploter.Result(obj, obj_synth)
AudioPlay(obj_synth)
Gammas = optimizer.AllGammas(clip_bird)
#optimizer.OptimalGamma(syl_test_synth)
Although the optimal parameters are stored in the optimizer, it is recommended to save this value in the object's parameter set.
obj.p = optimizer.obj.p
optimizer.optimal_gamma
Show the parameter set, solve the object with these parameters, and visualize the synthetic syllable.
obj.id
Display(obj.p)
obj_synth = obj.Solve(obj.p)
ploter.PlotAlphaBeta(obj_synth)
ploter.Result(obj, obj_synth)
AudioPlay(obj_synth)
Depending on which gesture approximation you are interested in (linear or quadratic curves for $\alpha$ or $\beta$), the optimizer object finds the optimal parameters using the corresponding Optimal* method, as shown below.
# optimizer.OptimalAs(obj)
# optimizer.OptimalBs(obj)
optimizer.OptimalParams(obj, Ns=11)
Display(obj.p)
One-shot solving is also implemented, but its execution is very slow, since the parameter space has a dimension of at least 5.
# optimizer.OptimalParameters()
# Display(obj.p)
Finding optimal $\gamma$, $b_0$, and $b_1$ by the brute method
Solve and visualize the optimal synthetic syllable and its features
#Display(obj.p)
obj_synth_optimal = obj.Solve(obj.p)
ploter.Syllables(obj, obj_synth_optimal)
# ploter.PlotVs(obj_synth_optimal)
ploter.PlotAlphaBeta(obj_synth_optimal)
ploter.Result(obj, obj_synth_optimal)
AudioPlay(obj_synth_optimal)
The final step is to write the audio. To export the syllable in audio format, use the syllable method WriteAudio.
obj.WriteAudio()
obj_synth.WriteAudio()
brute = {'method':'brute', 'Ns':11} #, 'workers':-1}
optimizer_bird = bs.Optimizer(clip_bird, method_kwargs=brute)
#optimizer_bird.AllGammasByTimes(times)
synth_bird = optimizer_bird.SongByTimes(time_intervals)
# # # plt.plot(complet_bird.time_s, optimizer_bird.synth_bird_s)
# # # plt.xlim((12, 14))
# #plt.plot(optimizer_bird.synth_bird_s)
# synth_bird = bs.Song(complet_bird.paths, complet_bird.no_file,
# sfs=[optimizer_bird.synth_bird_s, complet_bird.fs],
# split_method="amplitud", umbral=-1.01)
# # #optimizer_bird.obj0.id
ploter.Plot(clip_bird)
AudioPlay(clip_bird)
optimizer_bird.synth_bird.id
'song-synth'
#synth_bird.id = "song-synth"
ploter.Plot(optimizer_bird.synth_bird)
AudioPlay(optimizer_bird.synth_bird)
clip_bird.WriteAudio()
optimizer_bird.synth_bird.WriteAudio()
C:\Users\sebas\anaconda3\lib\site-packages\maad\sound\input_output.py:390: UserWarning: Values for bit depth should be 8, 16 or 32. Argument ignored. warn('Values for bit depth should be 8, 16 or 32. Argument ignored.')
# plt.plot(synth_bird.betas_bird)
# AudioPlay(synth_bird)
This function attempts to solve the whole song: it calculates the optimal gamma and finds the optimal parameters for each syllable.
# bird.WholeSong(brute, plot=True, syll_max=0)