Notebook

Tutorial Step 4: Working in the frequency domain¶

Working in the time domain is often not convenient so it's important to be familiar with the frequency domain. The frequency domain is a representation of time-series which represents how the signal is distributed within different frequency bands over a range of frequencies (so called the "spectrum" of a signal). Working in the frequency domain is more convenient for common operations such as filtering or signal identification. It's possible to transform a signal from the time domain into the frequency domain and vice-versa.

⚠ Warning

Uncomment the following cell if running in Google Colab. If you use other running methods, it should not be necessary.

In [1]:

#! wget http://gwosc.org/archive/data/O3b_4KHZ_R1/1263534080/H-H1_GWOSC_O3b_4KHZ_R1-1264312320-4096.hdf5

Before going further, let's go back to our previous example (there is nothing new in the code below):

In [2]:

#----------------------
# Import needed modules
#----------------------
import numpy as np
import matplotlib.pylab as plt
import matplotlib.mlab as mlab
import h5py

#----------
# Open file
#----------
fileName = 'H-H1_GWOSC_O3b_4KHZ_R1-1264312320-4096.hdf5'
dataFile = h5py.File(fileName, 'r')

#-------------------------------------
# Read the strain and some information
#-------------------------------------
strain   = dataFile['strain']['Strain']
ts       = dataFile['strain']['Strain'].attrs['Xspacing']
meta     = dataFile['meta']
gpsStart = meta['GPSstart'][()]
duration = meta['Duration'][()]
gpsEnd   = gpsStart + duration
time     = np.arange(gpsStart, gpsEnd, ts)

#-----------------------------
# Extract a part of the signal
#-----------------------------
length     = 16     # Length in second
startTime  = 1264316116.0
numsamples = int(length / ts)
startIndex = np.min(np.nonzero(startTime < time))
time_seg   = time[startIndex:(startIndex+numsamples)]
strain_seg = strain[startIndex:(startIndex+numsamples)]

An important concept when working in frequency domain, is the notion of sampling frequency. This is simply the inverse of the time between 2 samples. We will store this in a variable called fs.

In [3]:

# Sampling frequency
fs = int(1.0 / ts)

FFT with Blackman Window¶

A Fast Fourier Transform, or FFT, is one method to transform the data into the frequency domain. An important concept when computing the FFT is windowing. This refers to using a function (called the window function) to smoothly force the data to zero at the time boundaries, and so reduce edge effects in the FFT. Different window functions have different properties. Here, we will use the Blackman Window.

The noise spectrum of current instruments is seen to have the most power at low frequencies ($f < 50 Hz$), due mainly to seismic noise. The most sensitive band, often called "The Bucket", is roughly 100 - 300 Hz.

In [4]:

#------------------------------------------
# Apply a Blackman Window, and plot the FFT
#------------------------------------------
window          = np.blackman(strain_seg.size)
windowed_strain = strain_seg * window
freq_domain     = np.fft.rfft(windowed_strain) / fs
freq            = np.fft.rfftfreq(len(windowed_strain)) * fs    
plt.loglog(freq, abs(freq_domain))
plt.axis([10, fs/2.0, 1e-25, 1e-19])
plt.grid('on')
plt.xlabel('Freq (Hz)')
plt.ylabel('Strain / Hz')

Out[4]:

Text(0, 0.5, 'Strain / Hz')

Power Spectral Density (PSD) and Amplitude Spectral Density (ASD)¶

The Power Spectral Density is another representation of how the power in the data is distributed in frequency space. Matplotlib's psd method uses Welch's method to estimate the PSD. The method divides the data into segments with NFFT samples, computes a periodogram of each segment (similar to the FFT plotted above), and then takes the time average of the periodograms. This averaging over several segments reduces the variance in the result, which is why the PSD looks "thinner" than the FFT.

In [5]:

#-----------
# Make a PSD
#-----------
Pxx, freqs = mlab.psd(strain_seg, Fs=fs, NFFT=fs)
plt.loglog(freqs, Pxx)
plt.axis([10, 2000, 1e-50, 1e-38])
plt.grid('on')
plt.xlabel('Freq (Hz)')
plt.ylabel('PSD')

Out[5]:

Text(0, 0.5, 'PSD')

A variation is to use the square root of the PSD. This is is called the Amplitude Spectral Density (ASD) and is done to give units that can be more easily compared with the time domain data or FFT. More information about the ASD of the instruments can be seen on the page of each dataset, generally in a section called "Instrumental Spectral Lines" (see here for O3b).

In [6]:

#-------------
# Plot the ASD
#-------------
plt.loglog(freqs, np.sqrt(Pxx))
plt.axis([10, 2000, 1e-25, 1e-19])
plt.grid('on')
plt.xlabel('Freq (Hz)')
plt.ylabel('ASD (Strain / Hz$^{1/2})$')

Out[6]:

Text(0, 0.5, 'ASD (Strain / Hz$^{1/2})$')

Spectrogram¶

Working in the frequency domain is important but we lose all notion of when an event can occur. Intuitively, we would like to study how the distribution of frequency varies across time. Spectrograms are a way to show this.

In [7]:

#-------------------
# Make a spectrogram
#-------------------
NFFT = 1024
short_window = np.blackman(NFFT)
spec_power, freqs, bins, im = plt.specgram(
    strain_seg, NFFT=NFFT, Fs=fs,
    window=short_window
)
plt.xlabel('Time (s)')
plt.ylabel('Freq (Hz)')

Out[7]:

Text(0, 0.5, 'Freq (Hz)')

Mainly, this spectrogram shows what we already knew from the PSD: there is a lot more power at very low frequencies than high frequencies.

What's next?¶

A lot of methods used in the LVK pipelines are based on the frequency domain so it's important to be familiar about those ideas.

In the [next step](<05 - GWpy Examples.ipynb>), you will learn to use a python package to ease interaction with GWOSC data.