In this notebook, we will be plotting keywords and images from data taken by the Helioseismic and Magnetic Imager (HMI) instrument on NASA's Solar Dynamics Observatory (SDO) satellite. SDO takes about a terabyte and a half of data a day, which is more data than any other satellite in NASA's Heliophysics Division.
Data from the HMI and Atmospheric Imaging Assembly (AIA) instruments aboard SDO are stored at Stanford University. The metadata are stored in a PostgreSQL database called the Data Record Management System, or DRMS. The image data are stored separately, in storage units managed by the Storage Unit Management System, or SUMS. Data from both systems are merged together into FITS files upon export. DRMS and SUMS together constitute the Joint Science Operations Center, or JSOC.
The easiest way to access SDO HMI and AIA data is via the Python drms module, available on PyPI. In addition to the numerous tutorials on both Read the Docs and GitHub, all the examples below use the drms module. First we'll import the module, and some others:
import drms
import json
import numpy as np
import matplotlib.pylab as plt
import matplotlib.ticker as mtick
from datetime import datetime as dt_obj
import urllib
from astropy.io import fits
from sunpy.visualization.colormaps import color_tables as ct
from matplotlib.dates import *
import matplotlib.image as mpimg
import sunpy.map
import sunpy.io
from IPython.display import Image
%matplotlib inline
%config InlineBackend.figure_format='retina'
The first step in querying for SDO HMI and AIA data is to establish a connection to JSOC. This can be done with the Client() class.
import drms
c = drms.Client()
The Client() class allows one to access both metadata and image data simultaneously via a data series. A data series contains all of a particular type of data — e.g. there is a series for continuum intensity data, another for magnetic field data, and so forth. Read Section 4 of the SDO Analysis Guide for more information about how to build a data series query. For example, to find all the SHARP data series, execute the following regular expression query:
c.series(r'hmi\.sharp_')
['hmi.sharp_720s', 'hmi.sharp_720s_nrt', 'hmi.sharp_cea_720s', 'hmi.sharp_cea_720s_nrt']
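The argument to c.series() is a regular expression that JSOC matches against series names. As a sanity check, the same pattern can be tried offline with Python's re module (the first two series names below are copied from the output above; the last two are non-matching names added for illustration):

```python
import re

# The same pattern passed to c.series() above
pattern = re.compile(r'hmi\.sharp_')

# Two SHARP series names from the output above, plus two that should not match
names = ['hmi.sharp_720s', 'hmi.sharp_cea_720s', 'hmi.ic_720s', 'aia.lev1_euv_12s']
matches = [n for n in names if pattern.search(n)]
print(matches)  # ['hmi.sharp_720s', 'hmi.sharp_cea_720s']
```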
# Set a series
si = c.info('hmi.sharp_cea_720s')
To find more information about the FITS header keywords that belong to any given series, we can use the following command:
si.keywords
| name | type | recscope | defval | units | note | linkinfo | is_time | is_integer | is_real | is_numeric |
|---|---|---|---|---|---|---|---|---|---|---|
cparms_sg000 | string | variable | compress Rice | none | | None | False | False | False | False |
magnetogram_bzero | double | variable | 0 | none | | None | False | False | True | True |
magnetogram_bscale | double | variable | 0.1 | none | | None | False | False | True | True |
cparms_sg001 | string | variable | | none | | None | False | False | False | False |
bitmap_bzero | double | variable | 0 | none | | None | False | False | True | True |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
ERRMSHA | float | variable | nan | Degrees | Error in Mean shear angle for B_total | None | False | False | True | True |
ERRUSI | float | variable | nan | Amperes | Error in Total unsigned vertical current | None | False | False | True | True |
DOFFSET | int | variable | -2147483648 | Gauss | Constant value added to the noise mask for dis... | None | False | True | False | True |
ERRTPOT | float | variable | nan | Ergs per cubic centimeter | Error in Total photospheric magnetic energy de... | None | False | False | True | True |
ERRJHT | float | variable | nan | Amperes | Sum of the Absolute Value of the Net Currents ... | None | False | False | True | True |
211 rows × 10 columns
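The bzero and bscale entries above control how raw integer segment data map to physical values via the standard FITS convention, physical = BZERO + BSCALE × stored. astropy applies this scaling automatically when opening a file; the sketch below just illustrates the arithmetic, using the magnetogram defaults from the table:

```python
def apply_fits_scaling(stored, bzero=0.0, bscale=0.1):
    # Standard FITS scaling: physical = BZERO + BSCALE * stored.
    # Defaults mirror magnetogram_bzero (0) and magnetogram_bscale (0.1)
    # from the table above, i.e. a resolution of 0.1 Gauss per count.
    return bzero + bscale * stored

print(apply_fits_scaling(1234))  # roughly 123.4 Gauss
```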
To find more information about the FITS image data, or segments, that belong to any given series, we can use the following command:
# To see all the segments associated with the series hmi.sharp_cea_720s:
si.segments
| name | type | units | protocol | dims | note |
|---|---|---|---|---|---|
magnetogram | int | Gauss | fits | VARxVAR | Line-of-sight magnetogram in CEA projection |
bitmap | char | Enumerated | fits | VARxVAR | Mask for the patch in CEA coordinates |
Dopplergram | int | m/s | fits | VARxVAR | Dopplergram in CEA projection |
continuum | int | DN/s | fits | VARxVAR | Intensitygram in CEA projection |
Bp | int | Gauss | fits | VARxVAR | B_phi, positive westward |
Bt | int | Gauss | fits | VARxVAR | B_theta, positive southward |
Br | int | Gauss | fits | VARxVAR | B_r, positive up |
Bp_err | int | Gauss | fits | VARxVAR | Standard deviation of B_phi |
Bt_err | int | Gauss | fits | VARxVAR | Standard deviation of B_theta |
Br_err | int | Gauss | fits | VARxVAR | Standard deviation of B_r |
conf_disambig | char | none | fits | VARxVAR | confidence of disambiguation result |
The query below retrieves both metadata and image data for active region 11158, which produced an X2.2-class flare on February 15, 2011 at 1:56 UT, from the SHARP data series. The SHARP data series include patches of vector magnetic field data taken by the HMI instrument. These patches encapsulate automatically-detected active regions that are tracked throughout the entirety of their disk passage. The c.query() method takes three arguments:

1. The name of the data series, in this case hmi.sharp_cea_720s. This series name is appended with two prime keys: the HARP number (in this case, 377) and the time range (in this case, 2011.02.14_15:00:00/12h). These two prime keys appear in the first two brackets. The HARP number refers to the active region number (see here for a mapping between HARP numbers and NOAA active region numbers). A prime key, or set of prime keys, is a unique identifier. The third bracket, with the argument [? (QUALITY<65536) ?], filters out data where the value of the QUALITY keyword is greater than or equal to 65536 (see here for the definition of the QUALITY keyword). While this third bracket is not necessary, it can be a powerful tool for filtering data based on keyword values.
2. key, a comma-separated list of keywords, in this case T_REC, USFLUX, and ERRVF.
3. seg, the name of a segment, in this case Br, or the radial component of the photospheric magnetic field.

keys, segments = c.query('hmi.sharp_cea_720s[377][2011.02.14_15:00:00/12h][? (QUALITY<65536) ?]', key='T_REC, USFLUX, ERRVF', seg='Br')
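The QUALITY keyword is a bitmask, so the cut QUALITY < 65536 (i.e. < 0x10000) keeps records in which none of the high-order bits are set. A minimal offline sketch of the same filter (the quality values below are illustrative, not taken from real data):

```python
def is_good(quality):
    # Same cut as [? (QUALITY<65536) ?]: reject any record with
    # a bit at or above 0x10000 set in the QUALITY bitmask.
    return 0 <= quality < 0x10000

qualities = [0x0, 0x400, 0x10000, 0x20400]
good = [q for q in qualities if is_good(q)]
print([hex(q) for q in good])  # ['0x0', '0x400']
```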
To convert the keyword T_REC into a datetime object, we can use the function below.
def parse_tai_string(tstr,datetime=True):
year = int(tstr[:4])
month = int(tstr[5:7])
day = int(tstr[8:10])
hour = int(tstr[11:13])
minute = int(tstr[14:16])
if datetime: return dt_obj(year,month,day,hour,minute)
else: return year,month,day,hour,minute
t_rec = np.array([parse_tai_string(keys.T_REC[i],datetime=True) for i in range(keys.T_REC.size)])
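Equivalently, since T_REC strings follow a fixed pattern ('2011.02.14_15:00:00_TAI'), datetime.strptime can parse them in one call by matching the trailing _TAI literally. A sketch (not part of the original notebook):

```python
from datetime import datetime

def parse_tai(tstr):
    # T_REC strings look like '2011.02.14_15:00:00_TAI';
    # '_TAI' is matched as a literal suffix in the format string.
    return datetime.strptime(tstr, '%Y.%m.%d_%H:%M:%S_TAI')

print(parse_tai('2011.02.14_15:00:00_TAI'))  # 2011-02-14 15:00:00
```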
Now for some plotting! matplotlib.pyplot generates two objects: a figure and axes. The data are ascribed to the axes. The time axis in particular requires some formatting; in order to free it of clutter, we'll plot tick marks every three hours and label them accordingly.
fig, ax = plt.subplots(figsize=(8,7)) # define the size of the figure
orangered = (1.0,0.27,0,1.0) # create an orange-red color
# define some style elements
marker_style = dict(linestyle='', markersize=8, fillstyle='full',color=orangered,markeredgecolor=orangered)
text_style = dict(fontsize=16, fontdict={'family': 'monospace'})
ax.tick_params(labelsize=14)
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))
# ascribe the data to the axes
ax.plot(t_rec, (keys.USFLUX)/(1e22),'o',**marker_style)
ax.errorbar(t_rec, (keys.USFLUX)/(1e22), yerr=(keys.ERRVF)/(1e22), linestyle='',color=orangered)
# format the x-axis with universal time
locator = AutoDateLocator()
locator.intervald[HOURLY] = [3] # only show every 3 hours
formatter = DateFormatter('%H')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
# set yrange
ax.set_ylim([2.4,2.8])
# label the axes and the plot
ax.set_xlabel('time in UT',**text_style)
ax.set_ylabel('maxwells x 1e22',**text_style)
ax.set_title('total unsigned flux starting at '+str(t_rec[0])+' UT',**text_style) # annotate the plot with a start time
Text(0.5, 1.0, 'total unsigned flux starting at 2011-02-14 15:00:00 UT')
The example above shows a simple query for 12 hours of data from one HARPNUM. But we can also perform more complex queries to identify active regions of interest. Here are a few examples.
Example 1. Suppose we want to create a magnetic field model of an active region and we need observations of a strong-field active region near disk center. This query identifies strong-field regions near disk center during a two-year period. We define strong active regions as those with a total unsigned flux (USFLUX) greater than $4\times10^{22}$ Mx, and near disk center as those with a Carrington longitude (CRLN_OBS) less than $1^{\circ}$. The two-year period spans January 2014 to January 2016.
keys = c.query('hmi.sharp_cea_720s[][2014.01.01 - 2016.01.01][? (CRLN_OBS < 1) AND (USFLUX > 4e22) ?]', key='T_REC, HARPNUM, USFLUX, CRLT_OBS, CRLN_OBS, AREA_ACR')
keys
| | T_REC | HARPNUM | USFLUX | CRLT_OBS | CRLN_OBS | AREA_ACR |
|---|---|---|---|---|---|---|
0 | 2014.05.04_14:48:00_TAI | 4071 | 4.007906e+22 | -3.818258 | 0.045204 | 1297.351685 |
1 | 2014.11.11_06:00:00_TAI | 4800 | 4.004017e+22 | 3.297165 | 0.676486 | 1207.938232 |
2 | 2014.11.11_06:12:00_TAI | 4800 | 4.023450e+22 | 3.295724 | 0.567313 | 1194.579590 |
3 | 2014.11.11_06:24:00_TAI | 4800 | 4.050183e+22 | 3.294308 | 0.458156 | 1198.369873 |
4 | 2014.11.11_06:36:00_TAI | 4800 | 4.070347e+22 | 3.292919 | 0.349014 | 1204.132935 |
5 | 2014.11.11_06:48:00_TAI | 4800 | 4.081056e+22 | 3.291558 | 0.239885 | 1208.038818 |
6 | 2014.11.11_07:00:00_TAI | 4800 | 4.094752e+22 | 3.290226 | 0.130767 | 1199.290039 |
7 | 2014.11.11_07:12:00_TAI | 4800 | 4.113331e+22 | 3.288925 | 0.021658 | 1222.337891 |
8 | 2015.02.01_05:00:00_TAI | 5127 | 4.801844e+22 | -6.020030 | 0.985052 | 2273.071289 |
9 | 2015.02.01_05:12:00_TAI | 5127 | 4.817782e+22 | -6.020372 | 0.875986 | 2271.228516 |
10 | 2015.02.01_05:24:00_TAI | 5127 | 4.778004e+22 | -6.020685 | 0.766931 | 2270.977295 |
11 | 2015.02.01_05:36:00_TAI | 5127 | 4.801163e+22 | -6.020968 | 0.657885 | 2278.230469 |
12 | 2015.02.01_05:48:00_TAI | 5127 | 4.780960e+22 | -6.021223 | 0.548846 | 2281.161621 |
13 | 2015.02.01_06:00:00_TAI | 5127 | 4.724961e+22 | -6.021451 | 0.439813 | 2275.408203 |
14 | 2015.02.01_06:12:00_TAI | 5127 | 4.682469e+22 | -6.021652 | 0.330784 | 2282.527344 |
15 | 2015.02.01_06:24:00_TAI | 5127 | 4.597347e+22 | -6.021828 | 0.221756 | 2280.517334 |
16 | 2015.02.01_06:36:00_TAI | 5127 | 4.505237e+22 | -6.021980 | 0.112728 | 2281.434570 |
17 | 2015.02.01_06:48:00_TAI | 5127 | 4.448969e+22 | -6.022110 | 0.003698 | 2298.415039 |
18 | 2015.06.17_13:48:00_TAI | 5692 | 4.593700e+22 | 1.266837 | 0.923733 | 1495.361328 |
19 | 2015.06.17_14:00:00_TAI | 5692 | 4.592959e+22 | 1.267220 | 0.813259 | 1503.894775 |
20 | 2015.06.17_14:12:00_TAI | 5692 | 4.593827e+22 | 1.267613 | 0.702742 | 1489.501465 |
21 | 2015.06.17_14:24:00_TAI | 5692 | 4.584389e+22 | 1.268017 | 0.592184 | 1494.039429 |
22 | 2015.06.17_14:36:00_TAI | 5692 | 4.636860e+22 | 1.268434 | 0.481585 | 1491.178833 |
23 | 2015.06.17_14:48:00_TAI | 5692 | 4.616645e+22 | 1.268866 | 0.370945 | 1495.119507 |
24 | 2015.06.17_15:00:00_TAI | 5692 | 4.629078e+22 | 1.269314 | 0.260266 | 1505.565796 |
25 | 2015.06.17_15:12:00_TAI | 5692 | 4.652655e+22 | 1.269780 | 0.149549 | 1496.576538 |
26 | 2015.06.17_15:24:00_TAI | 5692 | 4.681429e+22 | 1.270264 | 0.038794 | 1479.642456 |
27 | 2015.10.31_19:12:00_TAI | 6063 | 5.331063e+22 | 4.459106 | 0.909237 | 1804.466309 |
28 | 2015.10.31_19:24:00_TAI | 6063 | 5.314220e+22 | 4.458659 | 0.798615 | 1805.412476 |
29 | 2015.10.31_19:36:00_TAI | 6063 | 5.258935e+22 | 4.458181 | 0.687989 | 1811.795410 |
30 | 2015.10.31_19:48:00_TAI | 6063 | 5.197402e+22 | 4.457672 | 0.577359 | 1807.541748 |
31 | 2015.10.31_20:00:00_TAI | 6063 | 5.231150e+22 | 4.457130 | 0.466728 | 1802.004395 |
32 | 2015.10.31_20:12:00_TAI | 6063 | 5.238996e+22 | 4.456556 | 0.356098 | 1813.600952 |
33 | 2015.10.31_20:24:00_TAI | 6063 | 5.218206e+22 | 4.455947 | 0.245472 | 1822.455444 |
34 | 2015.10.31_20:36:00_TAI | 6063 | 5.241477e+22 | 4.455305 | 0.134850 | 1833.126953 |
35 | 2015.10.31_20:48:00_TAI | 6063 | 5.301954e+22 | 4.454628 | 0.024236 | 1834.722900 |
Example 2. Suppose we are doing a study on flux emergence and we want to identify active regions that live for a long time. This query identifies long-lived active regions within a six-year period. We define long-lived active regions as those with a number of observations (N_PATCH1) greater than 1800 (1800 observations at a 12-minute cadence corresponds to a minimum observation time of 15 days). The six-year period spans January 2012 to January 2018. The T_FRST1=T_REC clause selects the first observation time in each sequence.
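The 15-day figure follows directly from the cadence: 1800 observations spaced 12 minutes apart span 21600 minutes. A quick arithmetic check:

```python
# 1800 observations at the 12-minute SHARP cadence
n_obs = 1800
cadence_minutes = 12
total_days = n_obs * cadence_minutes / (60 * 24)
print(total_days)  # 15.0
```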
keys = c.query('hmi.sharp_cea_720s[][2012.01.01 - 2018.01.01][? (N_PATCH1 > 1800) AND (T_FRST1=T_REC) ?]', key='T_REC, HARPNUM, NOAA_ARS, N_PATCH1, AREA_ACR')
keys
| | T_REC | HARPNUM | NOAA_ARS | N_PATCH1 | AREA_ACR |
|---|---|---|---|---|---|
0 | 2012.07.04_03:24:00_TAI | 1834 | 11519,11520,11521 | 1820 | 28.082201 |
1 | 2012.07.22_21:12:00_TAI | 1879 | 11529,11530,11532,11533,11536 | 1883 | 24.621628 |
2 | 2012.07.30_22:24:00_TAI | 1907 | 11538,11539,11540,11541,11544,11545 | 1809 | 18.218309 |
3 | 2012.09.17_23:24:00_TAI | 2040 | 11575,11577,11583 | 1986 | 124.659348 |
4 | 2013.04.26_15:36:00_TAI | 2696 | 11732,11734 | 1846 | 18.187544 |
5 | 2013.06.12_17:24:00_TAI | 2852 | 11769,11770,11771,11772,11774,11775 | 1864 | 3.143462 |
6 | 2013.12.30_21:00:00_TAI | 3563 | 11943,11944 | 1822 | 2.208999 |
7 | 2014.01.17_09:00:00_TAI | 3647 | 11958,11959,11960,11963,11964 | 1813 | 18.883802 |
8 | 2014.02.21_04:36:00_TAI | 3784 | 11987,11989,11993,11994,12001 | 2018 | 30.495930 |
9 | 2014.03.14_21:24:00_TAI | 3856 | 12008,12010,12012,12015,12019,12023 | 1838 | 21.218081 |
10 | 2014.04.10_23:00:00_TAI | 4000 | 12035,12038,12043,12046 | 1878 | 14.411080 |
11 | 2014.07.27_01:00:00_TAI | 4396 | 12127,12128,12130,12131,12132 | 1927 | 18.025908 |
12 | 2014.12.09_05:12:00_TAI | 4920 | 12235,12237,12238,12242 | 1841 | 21.272736 |
Example 3. Suppose we are doing a study on flare prediction. Schrijver (2007) derived a parameter, called $R$, which quantifies the flux near an active region neutral line. The study concluded that an active region will flare if the log of $R$ is greater than 5. Bobra & Couvidat (2015) also identified a large total unsigned helicity as a relevant characteristic of flaring active regions. This query identifies active regions with a log of $R$ (R_VALUE) between 5.5 and 6.0, or a total unsigned helicity (TOTUSJH) greater than or equal to 8900 $\frac{G^{2}}{m}$, during the two-year period between January 2012 and January 2014.
keys = c.query('hmi.sharp_cea_720s[][2012.01.01 - 2014.01.01][? (R_VALUE > 5.5 AND R_VALUE < 6.0) OR (TOTUSJH >= 8900)?]', key='T_REC,HARPNUM,R_VALUE,TOTUSJH')
keys
| | T_REC | HARPNUM | R_VALUE | TOTUSJH |
|---|---|---|---|---|
0 | 2012.07.09_15:12:00_TAI | 1834 | 5.249 | 8920.152 |
1 | 2012.07.09_15:36:00_TAI | 1834 | 5.239 | 8954.800 |
2 | 2012.07.09_15:48:00_TAI | 1834 | 5.253 | 8923.751 |
3 | 2012.07.09_16:00:00_TAI | 1834 | 5.252 | 8902.096 |
4 | 2012.07.09_16:12:00_TAI | 1834 | 5.251 | 8918.384 |
5 | 2012.07.09_16:24:00_TAI | 1834 | 5.256 | 8901.315 |
6 | 2012.07.09_16:36:00_TAI | 1834 | 5.261 | 8928.521 |
7 | 2012.07.09_21:12:00_TAI | 1834 | 5.264 | 8907.350 |
8 | 2012.07.09_21:48:00_TAI | 1834 | 5.271 | 8910.375 |
9 | 2012.07.09_22:12:00_TAI | 1834 | 5.272 | 8931.835 |
10 | 2012.07.10_05:36:00_TAI | 1834 | 5.255 | 8907.319 |
11 | 2012.07.10_05:48:00_TAI | 1834 | 5.285 | 8903.378 |
12 | 2013.11.25_10:00:00_TAI | 3376 | 5.530 | 55.039 |
We can also query for and plot image data. There are two ways to do this.
1. We can download the image data, as an unmerged FITS file, and the header data separately. An unmerged FITS file contains the image data but almost no header metadata (except for a few keywords). This is the quickest and easiest option, as the drms.Client() class can query the header and image data at the same time and store the keyword metadata and URLs to the image data in a pandas DataFrame. This eliminates the need to store FITS files locally. This method is also faster, as there is no need to wait for the exportdata system to generate FITS files. We can then download and open the unmerged FITS files via the astropy package for FITS file handling.
2. We can download the merged FITS file, which combines the header metadata and the image data, and use a SunPy Map to plot the image data. This is the easiest way to put the image data into a coordinate system, as the SunPy Map object will automatically use the WCS keyword data to plot the image data in the correct coordinate system. We can also read merged FITS files via the astropy package for FITS file handling.
We go through each option below using an image of the radial component of the photospheric magnetic field as an example.
Query the image data and header metadata using drms, then download and open the unmerged FITS file with astropy:
hmi_query_string = 'hmi.sharp_cea_720s[377][2011.02.15_02:12:00_TAI]'
keys, segments = c.query(hmi_query_string, key='T_REC, USFLUX, ERRVF', seg='Br')
url = 'http://jsoc.stanford.edu' + segments.Br[0] # prepend the jsoc.stanford.edu prefix to the segment path
photosphere_image = fits.open(url) # download and open the unmerged FITS file via astropy
Plot the image data with matplotlib:
hmimag = plt.get_cmap('hmimag')
plt.imshow(photosphere_image[1].data,cmap=hmimag,origin='lower',vmin=-3000,vmax=3000)
print('The dimensions of this image are',photosphere_image[1].data.shape[0],'by',photosphere_image[1].data.shape[1],'.')
The dimensions of this image are 377 by 744 .
There are only a few keywords associated with the unmerged FITS file:
photosphere_image[1].header
SIMPLE  =                    T / file does conform to FITS standard
BITPIX  =                  -64 / data type of original image
NAXIS   =                    2 / dimension of original image
NAXIS1  =                  744 / length of original image axis
NAXIS2  =                  377 / length of original image axis
PCOUNT  =                    0 / size of special data area
GCOUNT  =                    1 / one data group (required keyword)
XTENSION= 'BINTABLE'           / binary table extension
BLANK   =          -2147483648
CHECKSUM= 'VCJiX9GfVAGfV9Gf'   / HDU checksum updated 2018-05-10T01:45:55
DATASUM = '1982616782'         / data unit checksum updated 2018-05-10T01:45:55
But we can get all the header metadata we like from the drms query:
keys
| | T_REC | USFLUX | ERRVF |
|---|---|---|---|
0 | 2011.02.15_02:12:00_TAI | 2.653720e+22 | 6.506040e+18 |
First we will download the FITS file from JSOC.
n.b. The code below will only work with a valid e-mail address. In order to obtain one, users must register on the JSOC exportdata website.
email = 'your@email.address'
c = drms.Client(email=email, verbose=True)
# Export the magnetogram as a FITS image
r = c.export(hmi_query_string+'{Br}', protocol='fits', email=email)
fits_url_hmi = r.urls['url'][0]
hmi_map = sunpy.map.Map(fits_url_hmi)
Export request pending. [id=JSOC_20210406_1196, status=2] Waiting for 5 seconds...
fig = plt.figure()
hmi_map.plot(cmap=hmimag, vmin=-3000,vmax=3000)
<matplotlib.image.AxesImage at 0x15a6f5f40>
The image is now in the correct coordinate system! We can also inspect the map like this:
hmi_map
<sunpy.map.sources.sdo.HMIMap object at 0x157cecb20>
And look at the header metadata like this:
hmi_map.fits_header['T_REC']
'2011.02.15_02:12:00.000_TAI'
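Note that the merged-file T_REC carries fractional seconds ('.000'), so a strptime pattern for this variant needs %f before the literal _TAI suffix. A sketch:

```python
from datetime import datetime

# The merged FITS file's T_REC includes fractional seconds,
# so the format string needs '%S.%f' before the literal '_TAI'
t = datetime.strptime('2011.02.15_02:12:00.000_TAI', '%Y.%m.%d_%H:%M:%S.%f_TAI')
print(t.isoformat())  # 2011-02-15T02:12:00
```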
If we only want to read the FITS file, we can use the astropy package for FITS file handling.
photosphere_image = fits.open(fits_url_hmi)
All of the header information is in photosphere_image[1].header as an astropy Header object. The image data are in photosphere_image[1].data as a NumPy array.
photosphere_image[1].header['T_REC']
'2011.02.15_02:12:00.000_TAI'
JSOC allows users to export data as FITS files, jpg images and movies in mp4 or mpg format.
We can easily export any image data as a jpg using one of several color tables (see the exportdata website under the jpg protocol for a list of color tables). Here is an example with NOAA Active Region 11158 at the time of the X2.2-class flare.
protocol_args = {'ct': 'HMI_mag.lut', 'min': -3000, 'max': 3000}
r = c.export('hmi.sharp_cea_720s[377][2011.02.14_02:00:00]{Br}', protocol='jpg', protocol_args=protocol_args, email=email)
print(r.request_url)
Export request pending. [id=JSOC_20210406_1210, status=2] Waiting for 5 seconds... Export request pending. [id=JSOC_20210406_1210, status=1] Waiting for 5 seconds... http://jsoc.stanford.edu/SUM2/D1388071382/S00000
image_url = r.urls['url'][0]
print('The image is at this url: ',image_url,'.')
The image is at this url: http://jsoc.stanford.edu/SUM2/D1388071382/S00000/20110214_020000_hmi.sharp_cea_720s_744.jpg .
We can also easily create a movie using any image data! The movie below tracks NOAA Active Region 11158 over the same time interval as the plot above. In this case, the segment name, Br, is specified in the curly brackets within the data series query.
r = c.export('hmi.sharp_cea_720s[377][2011.02.14_15:00:00/12h]{Br}', protocol='mpg', protocol_args=protocol_args, email=email)
print(r.request_url)
Export request pending. [id=JSOC_20210406_1211, status=2] Waiting for 5 seconds... Export request pending. [id=JSOC_20210406_1211, status=1] Waiting for 5 seconds... http://jsoc.stanford.edu/SUM2/D1388071402/S00000
print("You can download the movie from here:", r.urls['url'][0])
You can download the movie from here: http://jsoc.stanford.edu/SUM2/D1388071402/S00000/hmi.sharp_cea_720s.mpg