This tutorial will show you how to programmatically download the SETI code challenge data to your local file space and start to analyze it.
Please see the Step_1_Get_Data.ipynb notebook for information about all of the data available for this code challenge.
This tutorial uses the basic data set, but it works with any of the data sets.
#The ibmseti package contains some useful tools to facilitate reading the data.
#Unfortunately, you must be using Python 2.7. The `ibmseti` package does not yet work on Python 3.
#uncomment the line below to install the package
#!pip install --user --upgrade ibmseti
#If running on DSX, YOU WILL NEED TO RESTART YOUR DSX SPARK KERNEL. Click Kernel -> Restart above!
You'll notice that this tutorial doesn't use parallelization with Spark. This keeps the example simple and lets the same code run for anyone doing this analysis on a local machine.
import ibmseti
import os
import zipfile
!ls my_data_folder/basic4
basic4.zip
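# Assumed location: the folder listed above that contains basic4.zip
mydatafolder = 'my_data_folder/basic4'
zz = zipfile.ZipFile(os.path.join(mydatafolder, 'basic4.zip'))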
basic4list = zz.namelist()
firstfile = basic4list[0]
print firstfile
000919a5-bc7f-471e-959c-81adba0b1f36.dat
ibmseti for convenience

While it's somewhat trivial to read these data, the ibmseti.compamp.SimCompamp class will extract the JSON header and the complex-valued time-series data for you.
import ibmseti
aca = ibmseti.compamp.SimCompamp(zz.open(firstfile).read())
# This data file is classified as a 'squiggle'
aca.header()
{u'signal_classification': u'squiggle', u'uuid': u'000919a5-bc7f-471e-959c-81adba0b1f36'}
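For readers curious what reading a file by hand might look like, here is a minimal sketch. It assumes the file layout is a single-line JSON header terminated by a newline, followed by interleaved signed 8-bit real and imaginary samples. That layout is an assumption made for illustration, not something this tutorial confirms, and `parse_sim_file` is a hypothetical helper name. The demo parses a small synthetic byte string rather than real challenge data.

```python
import json
import numpy as np

def parse_sim_file(raw_bytes):
    """Split raw bytes into (header dict, complex64 array).

    Assumed layout (hypothetical): one JSON line, a newline,
    then int8 pairs stored as (real, imag, real, imag, ...).
    """
    header_line, payload = raw_bytes.split(b'\n', 1)
    header = json.loads(header_line.decode('utf-8'))
    samples = np.frombuffer(payload, dtype=np.int8).astype(np.float32)
    # Pair consecutive values as real and imaginary parts
    complex_data = samples[0::2] + 1j * samples[1::2]
    return header, complex_data.astype(np.complex64)

# Synthetic bytes for illustration only, not real challenge data
fake_header = json.dumps({'signal_classification': 'squiggle',
                          'uuid': 'demo'}).encode('utf-8')
fake_payload = np.array([-12, 19, -34, -4, -13, -8], dtype=np.int8).tobytes()

header, data = parse_sim_file(fake_header + b'\n' + fake_payload)
print(header['signal_classification'])
print(data)
```

If the assumption holds, `parse_sim_file(zz.open(firstfile).read())` should give the same header and samples as `SimCompamp`; compare the two before relying on it.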
The goal is to take each simulation data file and
There are multiple ways to improve your model's ability to classify signals. You can
Here we just show how to view the data as a spectrogram.

ibmseti

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
## ibmseti.compamp.SimCompamp has a method to calculate the spectrogram for you (without any signal processing applied to the time-series data)
spectrogram = aca.get_spectrogram()
fig, ax = plt.subplots(figsize=(10, 5))
ax.imshow(np.log(spectrogram), aspect = 0.5*float(spectrogram.shape[1]) / spectrogram.shape[0])
<matplotlib.image.AxesImage at 0x7fc9221b3050>
You don't need to use the ibmseti Python package to calculate the spectrogram; you can compute it yourself from the complex-valued time series.
This is especially important if you want to apply some signal processing to the time-series data before you create your spectrogram.
complex_data = aca.complex_data()
#complex valued time-series
complex_data
array([-12.+19.j, -34. -4.j, -13. -8.j, ..., 11. +4.j, 2.+14.j, 16. -8.j], dtype=complex64)
complex_data = complex_data.reshape(32, 6144)
complex_data
array([[-12.+19.j, -34. -4.j, -13. -8.j, ..., 1. +1.j, -4.+17.j, 11.-13.j], [ -4. +7.j, 1.+43.j, 13. +4.j, ..., -21.-18.j, -2. +2.j, -9. +5.j], [ -5.-10.j, 4. +2.j, -16. -8.j, ..., 16. -1.j, 1.+24.j, -9. +7.j], ..., [-14. -7.j, -4.+10.j, 23.+13.j, ..., -20. +1.j, -15. -9.j, -2. -6.j], [ 3. -2.j, -12. -6.j, -3.-14.j, ..., 5.-10.j, 7.-35.j, 9.-17.j], [ 1. -8.j, 27.-19.j, -3.+13.j, ..., 11. +4.j, 2.+14.j, 16. -8.j]], dtype=complex64)
#Apply a Hanning Window
complex_data = complex_data * np.hanning(complex_data.shape[1])
complex_data
array([[ -0.00000000e+00 +0.00000000e+00j, -8.89237141e-06 -1.04616134e-06j, -1.36000939e-05 -8.36928855e-06j, ..., 1.04616107e-06 +1.04616107e-06j, -1.04616134e-06 +4.44618571e-06j, 0.00000000e+00 +0.00000000e+00j], [ -0.00000000e+00 +0.00000000e+00j, 2.61540336e-07 +1.12462344e-05j, 1.36000939e-05 +4.18464428e-06j, ..., -2.19693825e-05 -1.88308992e-05j, -5.23080671e-07 +5.23080671e-07j, -0.00000000e+00 +0.00000000e+00j], [ 0.00000000e+00 -0.00000000e+00j, 1.04616134e-06 +5.23080671e-07j, -1.67385771e-05 -8.36928855e-06j, ..., 1.67385771e-05 -1.04616107e-06j, 2.61540336e-07 +6.27696806e-06j, -0.00000000e+00 +0.00000000e+00j], ..., [ 0.00000000e+00 -0.00000000e+00j, -1.04616134e-06 +2.61540336e-06j, 2.40617046e-05 +1.36000939e-05j, ..., -2.09232214e-05 +1.04616107e-06j, -3.92310504e-06 -2.35386302e-06j, 0.00000000e+00 -0.00000000e+00j], [ 0.00000000e+00 +0.00000000e+00j, -3.13848403e-06 -1.56924201e-06j, -3.13848321e-06 -1.46462550e-05j, ..., 5.23080535e-06 -1.04616107e-05j, 1.83078235e-06 -9.15391175e-06j, 0.00000000e+00 +0.00000000e+00j], [ 0.00000000e+00 +0.00000000e+00j, 7.06158906e-06 -4.96926638e-06j, -3.13848321e-06 +1.36000939e-05j, ..., 1.15077718e-05 +4.18464428e-06j, 5.23080671e-07 +3.66156470e-06j, 0.00000000e+00 +0.00000000e+00j]])
# Build Spectrogram & Plot
cpfft = np.fft.fftshift( np.fft.fft(complex_data), 1)
spectrogram = np.abs(cpfft)**2
fig, ax = plt.subplots(figsize=(10, 5))
ax.imshow(np.log(spectrogram), aspect = 0.5*float(spectrogram.shape[1]) / spectrogram.shape[0])
<matplotlib.image.AxesImage at 0x7fc92217d090>
Maybe try a different window function? Or a different method for calculating the spectrogram (see Welch's periodogram)? Or ask a SETI researcher?
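As one way to experiment with these suggestions, the sketch below compares two window functions and a simplified Welch-style averaged periodogram using only NumPy. It runs on a synthetic complex tone plus noise rather than challenge data. The helper names, the tone frequency, and the segment count are illustrative assumptions, and the averaging uses non-overlapping segments, whereas Welch's method proper uses overlapping ones.

```python
import numpy as np

def windowed_spectrogram(complex_data, window_func=np.hanning):
    """Power spectrogram of a 2-D (time, samples) array, windowing each row."""
    window = window_func(complex_data.shape[1])
    cpfft = np.fft.fftshift(np.fft.fft(complex_data * window, axis=1), axes=1)
    return np.abs(cpfft) ** 2

def welch_style_spectrogram(complex_data, n_segments=4, window_func=np.hanning):
    """Average the periodograms of non-overlapping segments of each row.

    This trades frequency resolution for lower variance in each bin.
    """
    n_rows, n_cols = complex_data.shape
    seg_len = n_cols // n_segments
    segments = complex_data[:, :seg_len * n_segments]
    segments = segments.reshape(n_rows, n_segments, seg_len)
    window = window_func(seg_len)
    cpfft = np.fft.fftshift(np.fft.fft(segments * window, axis=2), axes=2)
    return (np.abs(cpfft) ** 2).mean(axis=1)

# Synthetic stand-in for a data file: a complex tone plus noise,
# shaped like the (32, 6144) array used in this tutorial
rng = np.random.RandomState(0)
n_rows, n_cols = 32, 6144
t = np.arange(n_rows * n_cols)
tone = np.exp(2j * np.pi * 0.125 * t)
noise = rng.normal(size=t.size) + 1j * rng.normal(size=t.size)
demo = (tone + noise).reshape(n_rows, n_cols)

spec_hann = windowed_spectrogram(demo, np.hanning)
spec_hamm = windowed_spectrogram(demo, np.hamming)
spec_welch = welch_style_spectrogram(demo, n_segments=4)
print(spec_hann.shape, spec_hamm.shape, spec_welch.shape)
```

To try this on a real file, pass the reshaped `complex_data` array (before the Hanning window is applied) in place of `demo`, and plot the result with the same `ax.imshow(np.log(...))` call used earlier.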