This notebook provides an overview of how to do black hole spectroscopy with PyCBC Inference. We suggest that you go through the inference tutorial 0 for a better understanding of what's done here. For more details, see the separate tutorial notebooks.
To do a ringdown analysis in PyCBC we need the pykerr package installed. We'll also install dynesty
for doing the sampling. These are both optional dependencies in the PyCBC repository, meaning that we need to install them separately if we are installing released versions of PyCBC.
import sys
!{sys.executable} -m pip install pykerr dynesty pycbc --no-cache-dir
To do parameter estimation, we use the command-line script pycbc_inference
. This pulls together the sampler, models, and distributions modules in the pycbc.inference
package so that we may perform Bayesian inference. The standard command-line argument for PyCBC inference are:
pycbc_inference --verbose --seed ${SEED} --output-file inference.hdf --config-files config1.ini [config2.ini ...]
The key arguments that pycbc_inference
takes are an output file and one or more config-files
(run pycbc_inference --help
to see all of the arguments it takes). The config files entirely specify the problem that you will be analyzing. In them, you specify the model to use, the variable parameters, and the prior on each of the parameters. If the model involves gravitational-wave data, you also specify what data to load and how to estimate the PSD.
You use the same command-line program regardless of whether you are doing a standard CBC analysis, a BH ringdown analysis, or any other form of testing GR. The config file(s) you provide are what determines what type of analysis you are doing. Below, we will illustrated how to setup a config file to do a BH ringdown analysis on GW190521.
Note: in order to do everything from within the notebook, we'll create the config files by echoing a python string to a file. In a normal situation, you would just use your favorite text editor to create the config files.
As illustrated in previous notebooks, there are many different models in PyCBC. The model you choose, combined with the waveform template you set (done via the approximant
argument in the [static_params]
section), determines the type of analysis you are doing. Let's recall what models are available:
from pycbc.inference import models
for model_name in models.models:
print(model_name)
To do a BH ringdown analysis we need to use one of the gated Gaussian models. Note that there are two: the standard gated_gaussian_noise
model, and a gated_gaussian_margpol
. The latter does the same thing, it just marginalizes over the polarization angle numerically to speed up convergence. We'll use that model here.
model_config = """
[model]
name = gated_gaussian_margpol
low-frequency-cutoff = 15
"""
!echo '{model_config}' > model.ini
!cat model.ini
Since this model uses data, we also need to create a data section that specifies what data to read for the analysis and the PSD estimation. See the PyCBC Inference docs for more information about what options are available and what they mean.
First we need to download the data from GWOSC (note: if you were doing this on the command line, you can just use wget
or curl
):
from astropy.utils.data import download_file
ifos = ['H1', 'L1', 'V1']
data_filenames = {}
for ifo in ifos:
print("Downloading {} data".format(ifo))
# Download the gravitational wave data for GW190521
url = "https://www.gw-openscience.org/eventapi/html/O3_Discovery_Papers/GW190521/v2/{}-{}1_GWOSC_4KHZ_R2-1242440920-4096.gwf"
fname = download_file(url.format(ifo[0], ifo[0]), cache=True)
data_filenames[ifo] = fname
Now let's set up our data section, pointing to the frame files we just downloaded:
data_config = """
[data]
instruments = H1 L1 V1
trigger-time = 1242442967.445
analysis-start-time = -4
analysis-end-time = 4
data-conditioning-low-freq = H1:0 L1:0 V1:0
psd-estimation = median-mean
psd-start-time = -144
psd-end-time = 144
psd-inverse-length = 8
psd-segment-length = 8
psd-segment-stride = 4
frame-files = H1:{h1file} L1:{l1file} V1:{v1file}
channel-name = H1:GWOSC-4KHZ_R2_STRAIN L1:GWOSC-4KHZ_R2_STRAIN V1:GWOSC-4KHZ_R2_STRAIN
sample-rate = 2048
strain-high-pass = 10
pad-data = 8
""".format(h1file=data_filenames['H1'],
l1file=data_filenames['L1'],
v1file=data_filenames['V1'])
!echo '{data_config}' > data.ini
!cat data.ini
Now let's specify the prior. We need to provide a section that lists the variable parameters and another that specifies the static parameters. For every variable parameter we have to provide one or more prior
section(s) that specifies the prior distribution.
For a ringdown, the key things to note are:
TdQNMfromFinalMassspin
accepts delta_flmn
and delta_taulmn
parameters that can be used to make the frequency and damping time of an lmn
mode to deviate from its GR value. See the documentation TdQNMfromFinalMassSpin for a full list of documented parameters.harmonic
argument; here, we are using the spheroidal harmonics (which come from pykerr
).lmns
argument. Note that the last digit in the lmns
is not the overtone number, but the number of overtones to include for the given lm
mode. In other words, lmns = 222
means to include the 220 and 221 mode. If we wanted the 220 only, we'd set lmns = 221
.221 331 321
would mean to include the 220, 330, and 320 modes.lmns
argument we need to provide an amplitude and phase for that mode (as well as any other optional argument, such as a frequency/damping time deviation and/or a different polarization; see the TdQNMfromFinalMassSpin documentation for details). These additional parameters can either be fixed (by setting it in the static_params
) or variable.amp220
) is in terms of strain. The amplitudes of all other modes are specified with respect to the dominant mode. For example, amp221
means that the amplitude of the 221 mode (in terms of strain) will be amp221 * amp220
.prior_config = """
[static_params]
approximant = TdQNMfromFinalMassSpin
harmonics = spheroidal
tref = ${data|trigger-time}
ra = 3.5
dec = 0.73
toffset = 0.018
lmns = 222
t_final = 2
[variable_params]
final_mass =
final_spin =
inclination =
logamp220 =
phi220 =
amp221 =
phi221 =
[waveform_transforms-t_gate_end]
name = custom
inputs = tref
t_gate_end = tref + 0.0
[waveform_transforms-t_gate_start]
name = custom
inputs = t_gate_end
t_gate_start = t_gate_end - 2
[waveform_transforms-tc]
name = custom
inputs = t_gate_end
tc = t_gate_end - 0.001
[prior-final_mass]
name = uniform
min-final_mass = 100
max-final_mass = 500
[prior-final_spin]
name = uniform
min-final_spin = -0.99
max-final_spin = 0.99
[prior-inclination]
name = sin_angle
[prior-logamp220]
name = uniform
min-logamp220 = -24
max-logamp220 = -19
[waveform_transforms-amp220]
name = custom
inputs = logamp220
amp220 = 10**logamp220
[prior-phi220]
name = uniform_angle
[prior-amp221]
name = uniform
min-amp221 = 0
max-amp221 = 5
[prior-phi221]
name = uniform_angle
"""
!echo '{prior_config}' > prior.ini
!cat prior.ini
Finally, we need to specify the sampler settings. To do that, we need to provide a sampler
section. In this case we'll use the nested sampler dynesty
, but you can use any of the samplers that PyCBC has support for. See the PyCBC Inference docs for a list of supported samplers and the optional available for them.
Note: Below we set the number of livepoints to 100, the checkpoint interval to only 30s, and we set a maxcall
to 100. We are only doing this so that we can trigger a checkpoint quickly. This is only for demonstration purposes. In a real analysis, you would want to use many more livepoints (4000 is a good number for BH ringdown) and set the checkpoint interval to something ~1800s (or more). You also don't normally need to set maxcall
.
sampler_config = """
[sampler]
name = dynesty
dlogz = 0.1
nlive = 100
checkpoint_time_interval = 30
maxcall = 100
"""
!echo '{sampler_config}' > sampler.ini
!cat sampler.ini
pycbc_inference
¶Now that we've set up our config files, we can run pycbc_inference
. The BH ringdown problem we set up above would take a long time (several hours) to run in a notebook like this. Instead, start this running, then interrupt the kernel after you see the output message say that it has checkpointed a couple times (should occur within a couple minutes).
# generate a random int for the seed
import numpy
seed = numpy.random.randint(1,int(2**31))
print("Using seed: {}".format(seed))
!timeout 300 \
pycbc_inference --verbose \
--config-files model.ini data.ini prior.ini sampler.ini \
--output-file inference.hdf \
--seed {seed} \
--nprocesses 8 \
--force
We can see that a checkpoint file exists in our directory:
!ls -lh inference.hdf.checkpoint
pycbc_inference_plot_posterior
¶The command-line tool pycbc_inference_plot_posterior
is used to make corner and posterior plots of our results. We can give it either the direct output of pycbc_inference
(here, inference.hdf
) or a posterior file created using pycbc_inference_extract_samples
. There are several plotting configuration options available (see pycbc_inference_plot_posterior --help
for the list of options).
We can also plot the current status of the sampler using the checkpoint file. This isn't too meaningful for nested samplers, but can give you an idea of how far along the sampler is. Let's take a look at the raw samples from our current checkpoint file.
!pycbc_inference_plot_posterior --verbose \
--input-file inference.hdf.checkpoint \
--output-file raw_samples_current.png \
--plot-scatter --plot-marginal \
--z-arg loglikelihood \
--raw-samples
from IPython.display import Image
Image('raw_samples_current.png', height=480)
The setup we've done above was used in Capano et al. arXiv:2105.05238, the samples for which were released here. We'll download the completed posterior file of the 220+221 result at $t_{\rm ref} - 7\textrm{ms}$ from there to see what a completed result looks like.
!wget https://github.com/gwastro/BH-Spectroscopy-GW190521/raw/main/posteriors/kerr/220_221/KERR-220_221-M05MS.hdf
!pycbc_inference_plot_posterior --verbose \
--input-file KERR-220_221-M05MS.hdf \
--output-file completed_scatter.png \
--plot-scatter --plot-marginal \
--z-arg loglikelihood
Image('completed_scatter.png', height=480)
Let's make a more interesting plot: we'll use these to make a density plot of the mass, spin, and amplitudes, and give them pretty labels:
!pycbc_inference_plot_posterior --verbose \
--input-file KERR-220_221-M05MS.hdf \
--output-file completed_density-mass_spin_amps.png \
--plot-density --plot-contours --plot-marginal \
--density-cmap inferno --contour-colors white \
--max-kde-samples 5000 \
--parameters \
'final_mass:$M_f (M_\odot)$' \
'final_spin:$\chi_f$' \
'10**logamp220*1e20:$A_{220} (\times 10^20)$' \
'amp221:$A_{221}/A_{220}$'
Image('completed_density-mass_spin_amps.png', height=480)
The inference file and/or posterior file contains all the information to reconstruct the model; i.e., the config file, data, and PSDs are all stored in the hdf file. We can use this to load the model in the file, and plot the max likelihood waveforms.
from pycbc.inference import io, models
# load the posterior file
fp = io.loadfile('KERR-220_221-M05MS.hdf', 'r')
# get the config, the data, and PSDs from the file
# the config file:
cp = fp.read_config_file()
# the data
data = fp.read_data()
# the psds
psds = fp.read_psds()
# now let's load the model
model = models.read_from_config(cp, data=data, psds=psds)
# let's get the maximum likelihood point
samples = fp.read_samples(list(fp['samples'].keys()))
maxlidx = samples['loglikelihood'].argmax()
maxlparams = {p: samples[p][maxlidx] for p in model.variable_params}
# get the loglikelihood of these points
model.update(**maxlparams)
model.loglikelihood
# get the matched-filter SNR
print((2*model.loglr)**0.5)
# get the gated, whitened maxL waveform and plot it against
# the whitened data
gated_wfs = model.get_gated_waveforms()
gated_data = model.get_gated_data()
# whiten them
gated_wfs = model.whiten(gated_wfs, 1)
gated_data = model.whiten(gated_data, 1)
# convert to the time domain
gated_wfs = {ifo: d.to_timeseries() for ifo, d in gated_wfs.items()}
gated_data = {ifo: d.to_timeseries() for ifo, d in gated_data.items()}
# plot
%matplotlib notebook
from matplotlib import pyplot
fig, axes = pyplot.subplots(nrows=3, figsize=(8,8))
for ii, ifo in enumerate(gated_wfs):
ax = axes[ii]
# we'll plot w.r.t. the geocentric end gate time
t_gate_end = model.current_params['t_gate_end']
d = gated_data[ifo].time_slice(t_gate_end-0.15, t_gate_end+0.15)
t = d.sample_times - t_gate_end
ax.plot(t, d, color='black', label='{} gated data'.format(ifo))
# plot the waveform
d = gated_wfs[ifo].time_slice(t_gate_end-0.15, t_gate_end+0.15)
t = d.sample_times - t_gate_end
ax.plot(t, d, color='C1', lw=3, label='maxL gated template')
ax.legend()
if ii == 2:
ax.set_xlabel(r'time - $t_{\rm{gate\,end}}$ (s)', fontsize=18)
if ii == 1:
ax.set_ylabel('whitened data', fontsize=18)
fig.show()