Plotting SHARP keywords and images with python

In this notebook, we will be plotting keywords and images, from data taken by the Helioseismic and Magnetic Imager (HMI) instrument on NASA's Solar Dynamics Observatory (SDO) satellite. SDO takes about a terabyte and a half of data a day, which is more data than any other satellite in the NASA Heliophysics Division.

Data from the HMI and Atmospheric Imaging Assembly (AIA) instruments aboard SDO are stored at Stanford University. The metadata are stored in a pSQL database called the Data Record Management System, or DRMS. The image data are stored separately, in storage units called the Storage Unit Management System, or SUMS. Data are merged together, upon export from both systems, as FITS files. DRMS and SUMS together constitute the Joint Science Operations Center, or JSOC.

The easiest way to access SDO HMI and AIA data is via the python drms module, available at PyPI. In addition to the numerous tutorials on both the Read the Docs and Github, all the examples below utilize the drms module. First we'll import the module, and some others:

In [1]:
import drms
import json, numpy as np, matplotlib.pylab as plt, matplotlib.ticker as mtick
from datetime import datetime as dt_obj
import urllib
from astropy.io import fits
from sunpy.visualization.colormaps import color_tables as ct
from matplotlib.dates import *
import matplotlib.image as mpimg
import sunpy.map
import sunpy.io
from IPython.display import Image
%matplotlib inline
%config InlineBackend.figure_format='retina'

The first step in querying for SDO HMI and AIA data is to establish a connection to JSOC. This can be done with the Client() class.

In [2]:
import drms
c = drms.Client()

The Client() class allows one to access both metadata and image data simultaneously via a data series. A data series contains all of particular type of data — e.g. there is a series for continuum intensity data, another for magnetic field data, and so forth. Read Section 4 of the SDO Analysis Guide for more information about how to build a data series query. For example, to find all the SHARP data series, execute the following regular expression query:

In [3]:
c.series(r'hmi\.sharp_')
Out[3]:
['hmi.sharp_720s',
 'hmi.sharp_720s_nrt',
 'hmi.sharp_cea_720s',
 'hmi.sharp_cea_720s_nrt']
In [4]:
# Set a series
si = c.info('hmi.sharp_cea_720s')

To find more information about the FITS header keywords that belong to any given series, we can use the following command:

In [5]:
si.keywords
Out[5]:
type recscope defval units note linkinfo is_time is_integer is_real is_numeric
name
cparms_sg000 string variable compress Rice none None False False False False
magnetogram_bzero double variable 0 none None False False True True
magnetogram_bscale double variable 0.1 none None False False True True
cparms_sg001 string variable none None False False False False
bitmap_bzero double variable 0 none None False False True True
... ... ... ... ... ... ... ... ... ... ...
ERRMSHA float variable nan Degrees Error in Mean shear angle for B_total None False False True True
ERRUSI float variable nan Amperes Error in Total unsigned vertical current None False False True True
DOFFSET int variable -2147483648 Gauss Constant value added to the noise mask for dis... None False True False True
ERRTPOT float variable nan Ergs per cubic centimeter Error in Total photospheric magnetic energy de... None False False True True
ERRJHT float variable nan Amperes Sum of the Absolute Value of the Net Currents ... None False False True True

211 rows × 10 columns

To find more information about the FITS image data, or segments, that belong to any given series, we can use the following command:

In [6]:
# To see all the segments associated with the series hmi.sharp_cea_720s:
si.segments
Out[6]:
type units protocol dims note
name
magnetogram int Gauss fits VARxVAR Line-of-sight magnetogram in CEA projection
bitmap char Enumerated fits VARxVAR Mask for the patch in CEA coordinates
Dopplergram int m/s fits VARxVAR Dopplergram in CEA projection
continuum int DN/s fits VARxVAR Intensitygram in CEA projection
Bp int Gauss fits VARxVAR B_phi, positive westward
Bt int Gauss fits VARxVAR B_theta, positive southward
Br int Gauss fits VARxVAR B_r, positive up
Bp_err int Gauss fits VARxVAR Standard deviation of B_phi
Bt_err int Gauss fits VARxVAR Standard deviation of B_theta
Br_err int Gauss fits VARxVAR Standard deviation of B_r
conf_disambig char none fits VARxVAR confidence of disambiguation result

Plotting the metadata

The query below retrieves both metadata and image data for active region 11158, which produced an X2.2-class flare on February 15, 2011 at 1:56 UT, from the SHARP data series. The SHARP data series include patches of vector magnetic field data taken by the HMI instrument. These patches encapsulate automatically-detected active regions that are tracked throughout the entirety of their disk passage. The c.query() method takes three arguments:

  1. The first argument is the data series. In the example below, the data series is called hmi.sharp_cea_720s. This series is appended with two prime keys: the HARP number (in this case, 377) and the time range (in this case, 2011.02.14_15:00:00/12h). These two prime keys appear in the first two brackets. The HARP number refers to the active region number (see here for a mapping between HARP numbers and NOAA active region numbers). A prime key, or set of prime keys, is a unique identifier. The third bracket, with the argument [? (QUALITY<65536) ?], filters out data where the value of the QUALITY keyword is greater than 65536. (See here for the definition of the QUALITY keyword). While this third bracket is not necessary, it can be a powerful tool for filtering data based on keyword values.
  2. The second argument in the search query is a list of keywords. In this example, we will query for the keywords T_REC, USFLUX, and ERRVF.
  3. The third argument in the search query is a list of segments. In this example, we will query for the segment Br, or the radial component of the photospheric magnetic field.
In [7]:
keys, segments = c.query('hmi.sharp_cea_720s[377][2011.02.14_15:00:00/12h][? (QUALITY<65536) ?]', key='T_REC, USFLUX, ERRVF', seg='Br')

To convert the keyword T_REC into a datetime object, we can use the function below.

In [8]:
def parse_tai_string(tstr,datetime=True):
    year   = int(tstr[:4])
    month  = int(tstr[5:7])
    day    = int(tstr[8:10])
    hour   = int(tstr[11:13])
    minute = int(tstr[14:16])
    if datetime: return dt_obj(year,month,day,hour,minute)
    else: return year,month,day,hour,minute
In [9]:
t_rec = np.array([parse_tai_string(keys.T_REC[i],datetime=True) for i in range(keys.T_REC.size)])

Now for some plotting! matplotlib.pyplot generates two objects: a figure and axes. The data are ascribed to the axes. The time axes in particular requires some formatting; in order to free it of clutter, we'll plot tick marks every three hours and label them accordingly.

In [10]:
fig, ax = plt.subplots(figsize=(8,7))      # define the size of the figure
orangered = (1.0,0.27,0,1.0)                # create an orange-red color

# define some style elements
marker_style = dict(linestyle='', markersize=8, fillstyle='full',color=orangered,markeredgecolor=orangered)
text_style = dict(fontsize=16, fontdict={'family': 'monospace'})
ax.tick_params(labelsize=14)
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))

# ascribe the data to the axes
ax.plot(t_rec, (keys.USFLUX)/(1e22),'o',**marker_style)
ax.errorbar(t_rec, (keys.USFLUX)/(1e22), yerr=(keys.ERRVF)/(1e22), linestyle='',color=orangered)

# format the x-axis with universal time
locator = AutoDateLocator()
locator.intervald[HOURLY] = [3] # only show every 3 hours
formatter = DateFormatter('%H')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)

# set yrange 
ax.set_ylim([2.4,2.8])

# label the axes and the plot
ax.set_xlabel('time in UT',**text_style)
ax.set_ylabel('maxwells x 1e22',**text_style)
ax.set_title('total unsigned flux starting at '+str(t_rec[0])+' UT',**text_style) # annotate the plot with a start time
Out[10]:
Text(0.5, 1.0, 'total unsigned flux starting at 2011-02-14 15:00:00 UT')

Querying the data

The example above shows a simple query for 12 hours of data from one HARPNUM. But we can also perform more complex queries to identify active regions of interest. Here are a few examples.

Example 1. Suppose we want to create a magnetic field model of an active region and we need observations of a strong-field active region near disk center. This query identifies strong-field regions near disk center during a two year period. We define strong active regions as those with a total unsigned flux (USFLUX) greater than $4x10^{22}$ Mx and near disk center as those with a Carrington Longitude (CRLN_OBS) less than $1^{\circ}$. The two year period spans between January 2014 and January 2016.

In [11]:
keys = c.query('hmi.sharp_cea_720s[][2014.01.01 - 2016.01.01][? (CRLN_OBS < 1) AND (USFLUX > 4e22) ?]', key='T_REC, HARPNUM, USFLUX, CRLT_OBS, CRLN_OBS, AREA_ACR')
In [12]:
keys
Out[12]:
T_REC HARPNUM USFLUX CRLT_OBS CRLN_OBS AREA_ACR
0 2014.05.04_14:48:00_TAI 4071 4.007906e+22 -3.818258 0.045204 1297.351685
1 2014.11.11_06:00:00_TAI 4800 4.004017e+22 3.297165 0.676486 1207.938232
2 2014.11.11_06:12:00_TAI 4800 4.023450e+22 3.295724 0.567313 1194.579590
3 2014.11.11_06:24:00_TAI 4800 4.050183e+22 3.294308 0.458156 1198.369873
4 2014.11.11_06:36:00_TAI 4800 4.070347e+22 3.292919 0.349014 1204.132935
5 2014.11.11_06:48:00_TAI 4800 4.081056e+22 3.291558 0.239885 1208.038818
6 2014.11.11_07:00:00_TAI 4800 4.094752e+22 3.290226 0.130767 1199.290039
7 2014.11.11_07:12:00_TAI 4800 4.113331e+22 3.288925 0.021658 1222.337891
8 2015.02.01_05:00:00_TAI 5127 4.801844e+22 -6.020030 0.985052 2273.071289
9 2015.02.01_05:12:00_TAI 5127 4.817782e+22 -6.020372 0.875986 2271.228516
10 2015.02.01_05:24:00_TAI 5127 4.778004e+22 -6.020685 0.766931 2270.977295
11 2015.02.01_05:36:00_TAI 5127 4.801163e+22 -6.020968 0.657885 2278.230469
12 2015.02.01_05:48:00_TAI 5127 4.780960e+22 -6.021223 0.548846 2281.161621
13 2015.02.01_06:00:00_TAI 5127 4.724961e+22 -6.021451 0.439813 2275.408203
14 2015.02.01_06:12:00_TAI 5127 4.682469e+22 -6.021652 0.330784 2282.527344
15 2015.02.01_06:24:00_TAI 5127 4.597347e+22 -6.021828 0.221756 2280.517334
16 2015.02.01_06:36:00_TAI 5127 4.505237e+22 -6.021980 0.112728 2281.434570
17 2015.02.01_06:48:00_TAI 5127 4.448969e+22 -6.022110 0.003698 2298.415039
18 2015.06.17_13:48:00_TAI 5692 4.593700e+22 1.266837 0.923733 1495.361328
19 2015.06.17_14:00:00_TAI 5692 4.592959e+22 1.267220 0.813259 1503.894775
20 2015.06.17_14:12:00_TAI 5692 4.593827e+22 1.267613 0.702742 1489.501465
21 2015.06.17_14:24:00_TAI 5692 4.584389e+22 1.268017 0.592184 1494.039429
22 2015.06.17_14:36:00_TAI 5692 4.636860e+22 1.268434 0.481585 1491.178833
23 2015.06.17_14:48:00_TAI 5692 4.616645e+22 1.268866 0.370945 1495.119507
24 2015.06.17_15:00:00_TAI 5692 4.629078e+22 1.269314 0.260266 1505.565796
25 2015.06.17_15:12:00_TAI 5692 4.652655e+22 1.269780 0.149549 1496.576538
26 2015.06.17_15:24:00_TAI 5692 4.681429e+22 1.270264 0.038794 1479.642456
27 2015.10.31_19:12:00_TAI 6063 5.331063e+22 4.459106 0.909237 1804.466309
28 2015.10.31_19:24:00_TAI 6063 5.314220e+22 4.458659 0.798615 1805.412476
29 2015.10.31_19:36:00_TAI 6063 5.258935e+22 4.458181 0.687989 1811.795410
30 2015.10.31_19:48:00_TAI 6063 5.197402e+22 4.457672 0.577359 1807.541748
31 2015.10.31_20:00:00_TAI 6063 5.231150e+22 4.457130 0.466728 1802.004395
32 2015.10.31_20:12:00_TAI 6063 5.238996e+22 4.456556 0.356098 1813.600952
33 2015.10.31_20:24:00_TAI 6063 5.218206e+22 4.455947 0.245472 1822.455444
34 2015.10.31_20:36:00_TAI 6063 5.241477e+22 4.455305 0.134850 1833.126953
35 2015.10.31_20:48:00_TAI 6063 5.301954e+22 4.454628 0.024236 1834.722900

Example 2. Suppose we are doing a study on flux emergence and we want to identify active regions that live for a long time. This query identifies long-lived active regions within a six year period. We define long-lived active regions as those with a minimum number of observations (N_PATCH1) equal to 1800 (1800 observations with a 12 minute gap between observations means a minimum observation time of 15 days). The six year period spans between January 2012 and January 2018. The T_FRST1=T_REC clause identifies the first observation time in the sequence.

In [13]:
keys = c.query('hmi.sharp_cea_720s[][2012.01.01 - 2018.01.01][? (N_PATCH1 > 1800) AND (T_FRST1=T_REC) ?]', key='T_REC, HARPNUM, NOAA_ARS, N_PATCH1, AREA_ACR')
In [14]:
keys
Out[14]:
T_REC HARPNUM NOAA_ARS N_PATCH1 AREA_ACR
0 2012.07.04_03:24:00_TAI 1834 11519,11520,11521 1820 28.082201
1 2012.07.22_21:12:00_TAI 1879 11529,11530,11532,11533,11536 1883 24.621628
2 2012.07.30_22:24:00_TAI 1907 11538,11539,11540,11541,11544,11545 1809 18.218309
3 2012.09.17_23:24:00_TAI 2040 11575,11577,11583 1986 124.659348
4 2013.04.26_15:36:00_TAI 2696 11732,11734 1846 18.187544
5 2013.06.12_17:24:00_TAI 2852 11769,11770,11771,11772,11774,11775 1864 3.143462
6 2013.12.30_21:00:00_TAI 3563 11943,11944 1822 2.208999
7 2014.01.17_09:00:00_TAI 3647 11958,11959,11960,11963,11964 1813 18.883802
8 2014.02.21_04:36:00_TAI 3784 11987,11989,11993,11994,12001 2018 30.495930
9 2014.03.14_21:24:00_TAI 3856 12008,12010,12012,12015,12019,12023 1838 21.218081
10 2014.04.10_23:00:00_TAI 4000 12035,12038,12043,12046 1878 14.411080
11 2014.07.27_01:00:00_TAI 4396 12127,12128,12130,12131,12132 1927 18.025908
12 2014.12.09_05:12:00_TAI 4920 12235,12237,12238,12242 1841 21.272736

Example 3. Suppose we are doing a study on flare prediction. Schrijver (2007) derived a parameter, called $R$, which quantifies the flux near an active region neutral line. The study concluded that an active region will flare if the log of R is greater than 5. Bobra & Couvidat (2015) also identified a large total unsigned helicity as a relevant characteristic of flaring active regions. This query identifies active regions with a log of R (R_VALUE) greater than 5.5 or a total unsigned helicity (TOTUSJH) greater than 8900 $\frac{G^{2}}{m}$ during a two year period between January 2012 and January 2014.

In [15]:
keys = c.query('hmi.sharp_cea_720s[][2012.01.01 - 2014.01.01][? (R_VALUE > 5.5 AND R_VALUE < 6.0) OR (TOTUSJH >= 8900)?]', key='T_REC,HARPNUM,R_VALUE,TOTUSJH')
In [16]:
keys
Out[16]:
T_REC HARPNUM R_VALUE TOTUSJH
0 2012.07.09_15:12:00_TAI 1834 5.249 8920.152
1 2012.07.09_15:36:00_TAI 1834 5.239 8954.800
2 2012.07.09_15:48:00_TAI 1834 5.253 8923.751
3 2012.07.09_16:00:00_TAI 1834 5.252 8902.096
4 2012.07.09_16:12:00_TAI 1834 5.251 8918.384
5 2012.07.09_16:24:00_TAI 1834 5.256 8901.315
6 2012.07.09_16:36:00_TAI 1834 5.261 8928.521
7 2012.07.09_21:12:00_TAI 1834 5.264 8907.350
8 2012.07.09_21:48:00_TAI 1834 5.271 8910.375
9 2012.07.09_22:12:00_TAI 1834 5.272 8931.835
10 2012.07.10_05:36:00_TAI 1834 5.255 8907.319
11 2012.07.10_05:48:00_TAI 1834 5.285 8903.378
12 2013.11.25_10:00:00_TAI 3376 5.530 55.039

Plotting the image data

We can also query for and plot image data. There are two ways to do this.

  1. We can download the image data, as unmerged FITS file, and header data separately. An unmerged FITS file contains the image data, but almost no header metadata (except for a few keywords). This is the quickest and easiest option as the drms.Client() class can query the header and image data at the same time and store the keyword metadata and URLs to the image data in a Pandas dataframe. This eliminates the need to store FITS files locally. This method is also faster, as there is no need to wait for the exportdata system to generate FITS files. We can then download and open the unmerged FITS files via the astropy package for FITS file handling.

  2. We can download the merged FITS file, which merges the header metadata and the image data together, and use SunPy Map to plot the image data. This is the easiest way to put the image data into a coordinate system, as the SunPy Map object will automatically use the WCS keyword data to plot the image data in the correct coordinate system. We can also read merged FITS via the astropy package for FITS file handling.

We go through each option below using an image of the radial component of the photospheric magnetic field as an example.

Option 1: Download the image data, as unmerged FITS file, and header data separately

Query the image data and header metadata using drms, then download and open the unmerged FITS file with astropy:

In [17]:
hmi_query_string = 'hmi.sharp_cea_720s[377][2011.02.15_02:12:00_TAI]'
keys, segments = c.query(hmi_query_string, key='T_REC, USFLUX, ERRVF', seg='Br')
url = 'http://jsoc.stanford.edu' + segments.Br[0]   # add the jsoc.stanford.edu suffix to the segment name
photosphere_image = fits.open(url)                  # download and open the unmerged FITS file via astropy

Plot the image data with matplotlib:

In [18]:
hmimag = plt.get_cmap('hmimag')
plt.imshow(photosphere_image[1].data,cmap=hmimag,origin='lower',vmin=-3000,vmax=3000)
print('The dimensions of this image are',photosphere_image[1].data.shape[0],'by',photosphere_image[1].data.shape[1],'.')
The dimensions of this image are 377 by 744 .

There are only a few keywords associated with the unmerged FITS file:

In [19]:
photosphere_image[1].header
Out[19]:
SIMPLE  =                    T / file does conform to FITS standard             
BITPIX  =                  -64 / data type of original image                    
NAXIS   =                    2 / dimension of original image                    
NAXIS1  =                  744 / length of original image axis                  
NAXIS2  =                  377 / length of original image axis                  
PCOUNT  =                    0 / size of special data area                      
GCOUNT  =                    1 / one data group (required keyword)              
XTENSION= 'BINTABLE'           / binary table extension                         
BLANK   =          -2147483648                                                  
CHECKSUM= 'VCJiX9GfVAGfV9Gf'   / HDU checksum updated 2018-05-10T01:45:55       
DATASUM = '1982616782'         / data unit checksum updated 2018-05-10T01:45:55 
                                                                                
                                                                                

But we can get all the header metadata information we like from the drms query:

In [20]:
keys
Out[20]:
T_REC USFLUX ERRVF
0 2011.02.15_02:12:00_TAI 2.653720e+22 6.506040e+18

Option 2: Download the merged FITS file and use SunPy Map to plot the image data

First we will download the FITS file from JSOC.

n.b. The code below will only work with a valid e-mail address. In order to obtain one, users must register on the JSOC exportdata website.

In [21]:
In [22]:
c = drms.Client(email=email, verbose=True)
In [23]:
# Export the magnetogram as a FITS image
r = c.export(hmi_query_string+'{Br}', protocol='fits', email=email)
fits_url_hmi = r.urls['url'][0]
hmi_map = sunpy.map.Map(fits_url_hmi)
Export request pending. [id=JSOC_20210406_1196, status=2]
Waiting for 5 seconds...
In [24]:
fig = plt.figure()
hmi_map.plot(cmap=hmimag, vmin=-3000,vmax=3000)
Out[24]:
<matplotlib.image.AxesImage at 0x15a6f5f40>

The image is now in the correct coordinate system! We can also inspect the map like this:

In [25]:
hmi_map
Out[25]:
<sunpy.map.sources.sdo.HMIMap object at 0x157cecb20>
Observatory SDO
Instrument HMI SIDE1
Detector HMI
Measurement hmi
Wavelength 6173.0 Angstrom
Observation Date 2011-02-15 02:10:12
Exposure Time 0.000000 s
Dimension [744. 377.] pix
Coordinate System heliographic_carrington
Scale [0.03 0.03] deg / pix
Reference Pixel [371.5 188. ] pix
Reference Coord [ 34.88091278 -21.07690048] deg
Image colormap uses histogram equalization
Click on the image to toggle between units