This notebook illustrates the Discrete Fourier Transformation (DFT) for different time series.
The coefficients of the DFT are one of the many features in tsfresh.
If we see a series of length $n$ - without timestamps - we don't have any idea about the time domain: In what time was the series recorded? What is the timedelta between 2 observations? The only thing we know (or, more realistic, assume) is that the series is uniformly sampled through time. Of course, with the knowledge of the sampling frequency, one can draw conclusions about the real frequencies in the time domain. But tsfresh
does not incorporate this information.
Therefore, in Discrete Fourier Transform (DFT), "frequencies" are expressed in terms of "fractions of series length", $2\pi k/n$, $k=0, ..., n-1$. One can read it as "$2\pi k$ divided on $n$ points" or "$k$ periods of length $2\pi$ divided on $n$ points" or "$k$ cycles among $n$ points". At "time" $m \in \lbrace0, ..., n-1\rbrace$, the value $a_m$ is observed.
The definition used in the implementation in np.fft
is (picture from numpy docs):
with $a_m$ being the (possibly complex) value of the series at position $m$ and the series length $n$. $a_m$ is multiplied with the complex number on the angle $2\pi k m/n$ (in rad not deg measure) on the unit circle. The following picture shows how the complex exponential function rotates among the unit circle / shows "Euler's Theorem" (figure taken from Wikipedia):
The resulting coefficients $A_k$ can be drawn on a complex pane, too (figure from Wikipedia):
The absolute value $r$ of the complex DFT coefficient corresponds to the amplitude of the oscillation, and the angle $\phi$ corresponds to the phase.
It is stated in Wikipedia (Notation is modified):
It's amplitude and phase are
$|A_k|$ / $n $ and $arctan(Im(A_k), Re(A_k))$
Note the normalization of the Amplitude with the series length $n$.
import numpy as np
import matplotlib.pylab as plt
import seaborn as sns
import pandas as pd
from numpy.fft import rfft
sns.set_style("whitegrid")
sns.set_palette('colorblind')
def plot_x_and_rfft(x, fft_coeffs):
fft = fft_coeffs
plt.figure(figsize=(12, 4))
plt.subplot(211)
plt.title("Time Series")
plt.plot(x)
plt.subplot(212)
plt.plot(fft.real, label="real", marker="o")
plt.plot(fft.imag, label="imag", marker="o")
plt.plot(np.abs(fft), label="abs", marker="o")
plt.title("Real, imaginary and absolute part of DFT coefficients A(k)")
plt.legend()
plt.tight_layout()
plt.show()
Let's compose a series as the sum of some cosine waves with different frequencies.
n = 200 # no of samples
m = np.linspace(0, 1, n, endpoint=False) # the results differ with the defaul endpoint=True
x = np.zeros(n)
plt.figure(figsize=(20,20))
for i in range(15):
plt.subplot(5, 3, i+1)
# the results differ with the defaul endpoint=True
cos_wave = np.cos(np.linspace(0, i*2*np.pi, n, endpoint=False))
plt.title("Signal no {}: {} Cos-Waves".format(i,i))
plt.plot(cos_wave)
x = x + cos_wave
plt.show()
The analytical solution of the DFT is:
fft_coeffs = rfft(x)
amp_deg = tuple(zip(range(20), np.abs(fft_coeffs)[:20], np.angle(fft_coeffs, deg=True)[:20]))
print("No \t Abs.Value \t Abs.Value/N \t angle in deg")
for no, amp, deg in amp_deg:
print("{:d} \t {:f} \t {:f} \t {:4.2f}".format(no, amp, amp / n, deg))
No Abs.Value Abs.Value/N angle in deg 0 200.000000 1.000000 0.00 1 100.000000 0.500000 0.00 2 100.000000 0.500000 -0.00 3 100.000000 0.500000 -0.00 4 100.000000 0.500000 0.00 5 100.000000 0.500000 -0.00 6 100.000000 0.500000 -0.00 7 100.000000 0.500000 -0.00 8 100.000000 0.500000 0.00 9 100.000000 0.500000 -0.00 10 100.000000 0.500000 -0.00 11 100.000000 0.500000 -0.00 12 100.000000 0.500000 -0.00 13 100.000000 0.500000 0.00 14 100.000000 0.500000 -0.00 15 0.000000 0.000000 -50.74 16 0.000000 0.000000 -120.06 17 0.000000 0.000000 -47.79 18 0.000000 0.000000 -57.17 19 0.000000 0.000000 -4.00
If you use np.linspace
with endpoint=True
in the signal generation, you won't get the correct results.
The amplitude for the 0th Fourier coefficient - which corresponds to the constant 1 - is correct. But obviously, all the other amplitudes are wrong. Moreover, they are consistently half of the correct amplitude. Why is this the case?
I won't go into detail here, but it is because the output of the DFT - in case of a real valued elements $a_m$ - is symmetric. This means that an investigated "frequency" is splitted among two values of $k$, and so is the amplitude (not for $k=0$).
There is a difference in the implementations in np.fft
and scipy.fftpack
: NumPy provides the normalization flag, whereas fftpack
does not.
But be aware: The normalization does not yield true amplitudes, because the normalization is done by $1 / \sqrt{n}$. Correct amplitudes can be calculated by multiplication with $1/n$. However, this normalization is sufficient to compare timeseries with different length.
print("Unnormalized")
plot_x_and_rfft(x, rfft(x))
print("Normalized with 1 / sqrt(N)")
plot_x_and_rfft(x, rfft(x, norm="ortho"))
Unnormalized
Normalized with 1 / sqrt(N)
For correct resampling via irfft
to the time domain, the normalization flag in rfft
and irfft
must be equal (None or "ortho").
plt.plot(x, '--')
plt.plot(np.fft.irfft(rfft(x, norm="ortho"), norm="ortho"), ':')
plt.legend(['raw', 'resampled'])
plt.show()
What happens if one tries to resample the signal from the first 10 coefficients only and discard the rest? How does the plot look like?
fft_coeffs = rfft(x, norm="ortho")
trunc_fft_coeffs = np.ones(len(fft_coeffs)) * 0+0j
trunc_fft_coeffs[0:10] = fft_coeffs[0:10]
trunc_fft_coeffs[0:15] # the remaining entries are zero
array([ 14.14213562 +0.00000000e+00j, 7.07106781 +1.93774590e-15j, 7.07106781 -1.72812614e-15j, 7.07106781 -5.70583545e-15j, 7.07106781 +4.21786636e-15j, 7.07106781 -3.98762969e-15j, 7.07106781 -1.21015308e-14j, 7.07106781 -6.01337816e-15j, 7.07106781 +6.47656704e-15j, 7.07106781 -5.10788198e-15j, 0.00000000 +0.00000000e+00j, 0.00000000 +0.00000000e+00j, 0.00000000 +0.00000000e+00j, 0.00000000 +0.00000000e+00j, 0.00000000 +0.00000000e+00j])
plt.plot(np.fft.irfft(fft_coeffs, norm="ortho")) # plot resampled with all coefficients
plt.plot(np.fft.irfft(trunc_fft_coeffs, norm="ortho")) # plot resampled with truncated coefficients
plt.legend(["not truncated", "truncated"])
plt.show()
We see an information loss without these coefficients.
Truncate the time series from above. The number of cycles is not integer any more.
y = x[0:-50] # length is now 150, some periods are not integer any more
print("Unnormalized")
plot_x_and_rfft(y, rfft(y))
print("Normalized with 1 / sqrt(N)")
plot_x_and_rfft(y, rfft(y, norm="ortho"))
Unnormalized
Normalized with 1 / sqrt(N)
About the first 11 coefficients seem to be important. Plot the inverted coefficients / resampled signal.
plt.plot(y, '--')
plt.plot(np.fft.irfft(rfft(y, norm="ortho"), norm="ortho"), ':')
plt.legend(["raw", "resampled"])
plt.show()
We do not have an analytical description of the signal any more, but it seems that it is perfectly described by the DFT coefficients, as it can be resampled without an error.
noise = np.random.normal(loc=0.0, scale=1, size=200)
x_with_noise = x + noise
plt.plot(x_with_noise)
plt.show()
Whoosh, this signal seems corrupted. What are the results of the DFT?
print("Unnormalized")
plot_x_and_rfft(x_with_noise, rfft(x_with_noise))
print("Normalized with 1 / sqrt(N)")
plot_x_and_rfft(x_with_noise, rfft(x_with_noise, norm="ortho"))
Unnormalized
Normalized with 1 / sqrt(N)
With this small amount of noise, one can still see the importance of the first 15 DFT coefficients. With more noise (i.e. a larger stddev), the result gets, unsurprisingly, worse.
How is gaussian noise treated by the DFT?
noise = np.random.normal(loc=0.0, scale=1, size=200)
print("Unnormalized")
plot_x_and_rfft(noise, rfft(noise))
print("Normalized")
plot_x_and_rfft(noise, rfft(noise, norm="ortho"))
Unnormalized
Normalized
For a random noise, all coefficients seem to be relevant. There seems to be no pattern.
Discrete Fourier Transformation of an arbitrary polynomial series.
z = np.linspace(-1, 1, n, endpoint=False)
polynomial = z + z**2 - z**3 + z**4 - z**5 # some arbitrary composition
print("Unnormalized")
plot_x_and_rfft(polynomial, rfft(polynomial))
print("Normalized with 1 / sqrt(N)")
plot_x_and_rfft(polynomial, rfft(polynomial, norm="ortho"))
Unnormalized
Normalized with 1 / sqrt(N)
Discard all DFT coefficients with k >= 10 and k >= 5 and plot the resampling results.
fft_coeffs = rfft(polynomial, norm="ortho")
trunc_fft_coeffs_10 = np.ones(len(fft_coeffs)) * 0+0j
trunc_fft_coeffs_5 = np.ones(len(fft_coeffs)) * 0+0j
trunc_fft_coeffs_10[0:10] = fft_coeffs[0:10]
trunc_fft_coeffs_5[0:5] = fft_coeffs[0:5]
plt.plot(polynomial)
plt.plot(np.fft.irfft(rfft(polynomial, norm="ortho"), norm="ortho"))
plt.plot(np.fft.irfft(trunc_fft_coeffs_10, norm="ortho"))
plt.plot(np.fft.irfft(trunc_fft_coeffs_5, norm="ortho"))
plt.legend(["raw", "resampled_full", "resampled_k<10", "resampled_k<5"])
plt.show()