Visualizing the Transit of Venus

In this notebook, we will be creating images to make a movie of the transit of Venus (below). While Venus transited across the Sun on June 5, 2012, it obscured some of the sunlight we observe here on Earth. This caused a dip in the brightness of the Sun. People who observe other stars look for a similar signal to indicate the presence of an extrasolar planet. Venus will transit across the Sun again in 2117.

In [1]:
%matplotlib inline
from IPython.display import IFrame

First, we'll import some modules.

In [2]:
import json, urllib.request, numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import matplotlib.image as mpimg
import matplotlib.gridspec as gridspec
from datetime import datetime as dt_obj
from astropy.io import fits
from matplotlib.dates import *
from sunpy.cm import color_tables as ct
from scipy import signal
%config InlineBackend.figure_format='retina'

Now we'll execute a sample JSON query using the json and urllib modules. In theory, it is possible to parse JSON queries using the read_json() function in the pandas library, but, in practice, the object returned by the JSON API for the SDO database, jsoc_info, isn't formatted in a way that pandas.read_json() can understand easily.

In [3]:
url = "[2012.06.03_22:00:59-2012.06.07_22:00:59]&op=rs_list&key=T_REC,DATAMEA2"
response = urllib.request.urlopen(url)
data = json.loads(response.read())

Now we'll create some empty lists to hold the data. The keyword DATAMEA2 contains the average intensity of the Sun in DN/s.

In [4]:
datamea2 = [] # this holds the keyword DATAMEA2
t_rec  = [] # this holds the keyword T_REC

data is of type dict, so we'll get the number of keyword elements this way:

In [5]:
n_elements = len(data['keywords'][0]['values'])
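For orientation, here is the shape of data that the indexing above assumes. This is a hypothetical, truncated mock of a jsoc_info response, not actual query output; real responses contain thousands of records:

```python
import json

# a hypothetical, truncated mock of a jsoc_info response, shaped like the
# structure the indexing in this notebook assumes; real responses contain
# thousands of records
mock = json.loads("""
{"keywords": [
    {"name": "T_REC",    "values": ["2012.06.05_21:46:30_TAI", "2012.06.05_21:47:15_TAI"]},
    {"name": "DATAMEA2", "values": [48780.457031, "MISSING"]}
 ]}
""")

n = len(mock['keywords'][0]['values'])   # number of records
print(n, mock['keywords'][1]['values'][1])
```

Each keyword carries one value per record, so the T_REC and DATAMEA2 value lists line up element by element, and missing measurements appear as the string 'MISSING'.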

And then we'll populate our empty lists to hold the keyword values. There are some missing values of DATAMEA2, so we can take the nearest available value (the cadence is short enough that this method is as good as interpolation):

In [6]:
count = 0.0
for i in range(n_elements):
    t_rec.append(data['keywords'][0]['values'][i])
    value = data['keywords'][1]['values'][i]
    if 'MISSING' in str(value):
        print('missing datamea2 value at time ', data['keywords'][0]['values'][i])
        datamea2_pre = data['keywords'][1]['values'][i-1]
        if (datamea2_pre != 'MISSING'):
            print('substituting', datamea2_pre)
            datamea2.append(datamea2_pre)
        else:
            # fall back to the next-nearest available value
            print('substituting', data['keywords'][1]['values'][i-3])
            datamea2.append(data['keywords'][1]['values'][i-3])
        count = count + 1.0
    else:
        datamea2.append(value)
datamea2 = np.array(datamea2).astype(float)
t_rec = np.array(t_rec)
print('there are ', count, 'missing datamea2 values')
print('the length of datamea2 and t_rec are', datamea2.shape, t_rec.shape)
missing datamea2 value at time  2012.06.06_17:33:00_TAI
substituting 48780.457031
missing datamea2 value at time  2012.06.06_17:42:00_TAI
substituting 48772.433594
missing datamea2 value at time  2012.06.06_17:48:45_TAI
substituting 48762.753906
missing datamea2 value at time  2012.06.06_18:04:30_TAI
substituting 48746.183594
missing datamea2 value at time  2012.06.06_18:05:15_TAI
substituting 48746.695312
missing datamea2 value at time  2012.06.06_18:06:00_TAI
substituting 48746.183594
missing datamea2 value at time  2012.06.07_17:33:00_TAI
substituting 48779.480469
there are  7.0 missing datamea2 values
the length of datamea2 and t_rec are (7681,) (7681,)

Now we have to detrend the value of DATAMEA2 to remove the effects of the orbital velocity of the spacecraft:

In [7]:
chunk = []
for i in range(1920, n_elements-1921, 1920):
    before_chunk = datamea2[i-1920:i]
    after_chunk = datamea2[i+1920:i+3840]
    avg_chunk = (before_chunk + after_chunk) / 2.0
    chunk.append(datamea2[i:i+1920] - avg_chunk)
    print('chunk', i)
chunk 1920
chunk 3840

Now make the T_REC and DATAMEA2 variables the same size:

In [8]:
datamea2 = np.array(chunk).flatten()
t_rec = t_rec[1920:1920+1920+1920]
print(datamea2.shape, t_rec.shape)
(3840,) (3840,)

T_REC is formatted in a way that matplotlib.pyplot will not understand, so let's convert the numpy array into a datetime object:

In [9]:
def parse_tai_string(tstr,datetime=True):
    year   = int(tstr[:4])
    month  = int(tstr[5:7])
    day    = int(tstr[8:10])
    hour   = int(tstr[11:13])
    minute = int(tstr[14:16])
    second = int(tstr[17:19])
    if datetime: return dt_obj(year,month,day,hour,minute,second)
    else: return year,month,day,hour,minute,second

x = np.array([parse_tai_string(t_rec[i],datetime=True) for i in range(t_rec.size)])

These are the times and values we'll use for the visualization:

In [10]:
times_transit = x[1900:2456]
datamea2_transit = datamea2[1900:2456]

Now, a complication arises from the fact that the image data are not at the same cadence as the metadata. The images are at a non-standard cadence to capture the ingress and egress of the planet; furthermore, the cadence is not constant throughout time. Therefore, we first have to gather a list of times that correspond to each image and calculate the delta between each time step.

In [11]:
url = ''
response = urllib.request.urlopen(url)
times = response.read().decode()   # read the response as a string
times = times.split('},')          # then split it into lines

In [12]:
image_times = []
# the bodies of these loops are elided in the source; each one parsed a
# timestamp string out of the corresponding entries of `times` and
# appended it to image_times (indices 0, 11, and 101 are skipped because
# those entries needed special handling)
for i in range(0):
    pass
for i in range(1, 11):
    pass
for i in range(12, 101):
    pass
for i in range(102, 183):
    pass
image_times = np.array(image_times)

In [13]:
image_times[0] = '2012-06-05_21:46:00'
image_times[1] = '2012-06-05_21:53:00'

In [14]:
x_image_times = np.array([parse_tai_string(image_times[i],datetime=True) for i in range(image_times.size)])

In [15]:
imagetimedelta = []
for i in range(x_image_times.size-1):
    imagetimedelta.append((x_image_times[i+1] - x_image_times[i]).total_seconds())
imagetimedelta = np.array(imagetimedelta)

Then we have to figure out which 45-second slot to query for the metadata corresponding to each image:

In [16]:
imagetimedelta = (np.round((imagetimedelta/45.+0.1))).astype(int)
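To see what this conversion does, here are a few illustrative deltas (chosen for illustration, not taken from the actual image list). An exact 45-second step maps to one slot, and the +0.1 nudges values sitting near a slot boundary upward so a slightly short step is not rounded down:

```python
import numpy as np

# illustrative deltas in seconds (not from the actual image list):
# one exact slot, one slightly short slot, then two and three slots
deltas = np.array([45.0, 40.0, 90.0, 135.0])

# same conversion as above: number of 45-second slots per time step
slots = (np.round((deltas / 45. + 0.1))).astype(int)
print(slots)   # [1 1 2 3]
```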

Finally, we can generate the images. We have 179 time steps, unevenly spaced in time, that capture the transit of Venus. For each time step, we'll show an image of the Sun and the detrended value of DATAMEA2. Then we can stitch all these images together into a movie using QuickTime or a Python movie-making tool. (The cell magic %%capture command is included here only to reduce the size of the notebook; otherwise it would be too large to put on GitHub.)

In [17]:
%%capture
count = 0
for i in range(imagetimedelta.shape[0]):
    count = imagetimedelta[i] + count
    j = i + 1
    counter = "%04d" % j
    if counter == '0024':   # this image is messed up, so skip it
        continue
    url = "" + counter + ".jpg"
    data = urllib.request.urlretrieve(url)
    image = mpimg.imread(data[0])

    fig = plt.figure(figsize=(10, 10), facecolor='black')
    ax_image = plt.subplot2grid((7, 5), (0, 0), colspan=5, rowspan=5)
    ax_image.imshow(image)
    ax_image.set_axis_off()

    ax = plt.subplot2grid((7, 5), (5, 1), colspan=3, rowspan=2)
    cornblue = (100/255., 149/255., 237/255., 1.0)   # cornflower blue
    grey     = (211/255., 211/255., 211/255., 1.0)   # light grey

    # define some style elements
    marker_style = dict(linestyle='', markersize=8, fillstyle='full', color=cornblue, markeredgecolor=cornblue)
    background_marker_style = dict(linestyle='', markersize=8, fillstyle='full', color=grey, markeredgecolor=grey)
    text_style = dict(fontsize=16, fontdict={'family': 'helvetica'}, color=grey)

    # ascribe the data to the axes: all points in grey, points up to the
    # current time step in blue
    ax.plot(times_transit[:-1], datamea2_transit[:-1], 'o', **background_marker_style)
    ax.plot(times_transit[0:count], datamea2_transit[0:count], 'o', **marker_style)

    # format the x-axis with universal time
    locator = AutoDateLocator()
    locator.intervald[HOURLY] = [2]   # only show every 2 hours
    formatter = DateFormatter('%H')
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)

    # set the x and y axis ranges

    # label the axes and the plot

    fig.savefig(counter + '.jpg', bbox_inches='tight', facecolor=fig.get_facecolor(), transparent=True, dpi=192)
    plt.close(fig)
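Once all the frames are saved, one way to stitch them into a movie from within Python is to shell out to ffmpeg. This is a sketch that assumes ffmpeg is installed and on the PATH; the frame rate and output filename are illustrative choices, and the glob pattern tolerates gaps in the frame numbering (such as the skipped frame 0024):

```python
import subprocess

def make_movie(pattern='*.jpg', fps=15, outfile='transit_of_venus.mp4'):
    """Stitch the saved frames into a movie by shelling out to ffmpeg.
    Assumes ffmpeg is installed; the glob input pattern tolerates gaps
    in the frame numbering (such as the skipped frame 0024)."""
    cmd = ['ffmpeg', '-y', '-pattern_type', 'glob', '-framerate', str(fps),
           '-i', pattern, '-pix_fmt', 'yuv420p', outfile]
    return subprocess.run(cmd, check=True)

# make_movie()   # run after all the frames above have been generated
```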