Lecturer: Dan Perley
Jupyter Notebook Authors: Dan Perley, Kishalay De & Cameron Hummels
This is a Jupyter notebook lesson taken from the GROWTH Summer School 2019. For other lessons and their accompanying lectures, please see: http://growth.caltech.edu/growth-school-2019.html
Process raw images from a visible wavelength telescope and make them ready for photometric analysis
See GROWTH school webpage for detailed instructions on how to install these modules and packages. Nominally, you should be able to install the python modules with pip install <module>
(or pip3 install <module>
). The external astromatic packages are easiest installed using package managers (e.g., rpm
, apt-get
).
## Import necessary modules
from astropy.io import fits
from astropy.wcs import WCS
from astropy.stats import sigma_clipped_stats
import glob
import os
import subprocess
import warnings
import numpy as np
import matplotlib.pyplot as plt
import photutils
import pyregion
# You can ignore any warnings that appear below, but if any modules can't be imported you need to install them
WARNING: AstropyDeprecationWarning: Composition of model classes will be removed in 4.0 (but composition of model instances is not affected) [astropy.modeling.core]
In order to complete the module some external software is required. The following step checks that these are installed properly.
def test_dependency(dep, alternate_name=None):
"""
Test external dependency by trying to run it as a subprocess
"""
try:
subprocess.check_output(dep, stderr=subprocess.PIPE, shell=True)
print("%s is installed properly as %s. OK" % (dep, dep))
return 1
except subprocess.CalledProcessError:
try:
subprocess.check_output(alternate_name, stderr=subprocess.PIPE, shell=True)
print("%s is installed properly as %s. OK" % (dep, alternate_name))
return 1
except subprocess.CalledProcessError:
print("===%s/%s IS NOT YET INSTALLED PROPERLY===" % (dep, alternate_name))
return 0
dependencies = [('sextractor', 'sex'), ('SWarp', 'swarp')]
i = 0
for dep_name1, dep_name2 in dependencies:
i += test_dependency(dep_name1, dep_name2)
print("%i out of %i external dependencies installed properly.\n" % (i, len(dependencies)))
if i != len(dependencies):
print("Please correctly install these programs before continuing by following the instructions in README.md.")
else:
print("You are ready to continue.")
sextractor is installed properly as sextractor. OK SWarp is installed properly as SWarp. OK 2 out of 2 external dependencies installed properly. You are ready to continue.
(SWarp is optional, but SExtractor is needed for astrometric alignment. If you are unable to install it, it is possible to follow most of the module without it.)
The various directories where data live are defined here for use later in this notebook.
## Define data directories
curpath = os.path.abspath('.') # top level directory
dataFolder = os.path.join(curpath, 'data') # data directory
biasFolder = os.path.join(dataFolder, 'bias') # bias frames subdirectory
flatFolder = os.path.join(dataFolder, 'flat') # flat fields subdirectory
sciFolder = os.path.join(dataFolder, 'science') # science data subdirectory
procFolder = os.path.join(curpath, 'processing') # processing directory
if not os.path.isdir(procFolder):
os.mkdir(procFolder)
else:
for f in os.listdir(procFolder):
try:
os.remove(os.path.join(procFolder,f)) # clear the processing folder from previous iterations
except:
print('Could not remove',f)
Next we get lists of all the various file types, including lists and names of files we'll be writing out later.
## Get file lists
os.chdir(sciFolder)
fileList = sorted(glob.glob('*.fits'))
os.chdir(curpath)
biasList = sorted(glob.glob(os.path.join(biasFolder,'*.fits')))
flatList = sorted(glob.glob(os.path.join(flatFolder,'*.fits')))
sciList = [os.path.join(sciFolder, file) for file in fileList]
procList = [os.path.join(procFolder, file).replace('.fits','.proc.fits') for file in fileList]
combinedFile = os.path.join(procFolder, 'AT2018cow_combined.fits')
resampledFile = os.path.join(procFolder, 'AT2018cow_resampled.fits')
print('Found',len(biasList),'bias files; ',len(flatList),'flat files; ',len(sciList),'science files')
Found 5 bias files; 5 flat files; 3 science files
Like nearly all astronomical images, our data are stored in the form of FITS files. FITS files contain a plaintext header storing the observational metadata (e.g. the name of the telescope) plus an array of pixel data. FITS files can may be 'simple' (single header, single pixel image) or may contain extensions (additional headers and/or pixel images).
FITS files are loaded and examined using the astropy library in python. Below we open up a FITS file and take a look at its structure.
## Open an example FITS file
exampleFile = sciList[0] # The first science image
HDUList = fits.open(exampleFile) # Open the file
HDUList.info() # Print some information about the FITS file structure.
Filename: /home/growth/Downloads/working_copies/image_reduction/data/science/acam761.fits No. Name Ver Type Cards Dimensions Format 0 PRIMARY 1 PrimaryHDU 139 () 1 extension1 1 ImageHDU 49 (2148, 2501) int16 (rescales to uint16)
This reveals that the FITS header metadata is in extension 0 (the 'primary' header) and the pixel data is in extension 1. The image dimensions are 2148x2501 pixels.
A good FITS header will contain almost everything we need to know about the observation. Header data is stored as keyword + value pairs, optionally with a comment following the value. Keywords are all-capitals and 8 characters or less; corresponding values can be much longer. There is almost no standardization in the use of keywords at different telescopes, but OBJECT and EXPTIME are commonly used to store the name of the object being observe and the integration time in seconds, respectively.
Below we will load the header and print out a limited set of the header lines in "raw" format (showing how the header metadata is stored in the file), and also look up a few specific bits of information using their keywords.
## Print some header properties
header = HDUList[0].header # Get the header out of the HDU list.
print(repr(header[0:13])) # Print the first 14 lines of the header (in raw format).
print(repr(header[24:34])) # Print some additional lines
print(repr(header[48:52]))
print(repr(header[67:71]))
print(repr(header[129:135]))
print()
print('The object is:', header['OBJECT']) # Print the name of the object being observed
print('The filters are:', header['ACAMFILT']) # Print the names of the optical filters in the light path.
print('The exposure time is:', header['EXPTIME']) # Print the exposure time
SIMPLE = T / BITPIX = 8 / NAXIS = 0 / EXTEND = T / File contains extensions RUN = 2696761 / Run number IRAFNAME= 'r2696761 ' / IRAF should use this name RUNSET = '1:1:2696761 ' / i:n:r => Run i of n runs starting at #r SYSVER = 'S15-281 ' / Version of observing system ORIGIN = 'ING La Palma ' / Name of observatory OBSERVAT= 'LAPALMA ' / Name of observatory (IRAF style) OBSERVER= 'Dan Perley ' / Name of principal investigator PROPOSAL= 'P12 ' / Code for grant of observing time OBJECT = 'AT2018cow 0.0 0.0 ' / Title of observation INSTRUME= 'ACAM ' / Name of instrument configuration ACAMMODE= 'IMAGING ' / ACAM mode : imaging or spectroscopy ACAMWH1 = 'SlnG ' / Name of filter deployed in wheel 1 ACAMWH2 = 'CLEAR ' / Name of filter deployed in wheel 2 ACAMFSLI= 'CLEAR ' / Name of filter deployed in slit mechanism ACAMSLIT= 'CLEAR ' / Element deployed in slit mechanism ACAMMASK= 'CLR ' / Mask deployed in slit mechanism ACAMDISP= 'NONE ' / Name of disperser deployed in wheel 2 ACAMSLI = 'CLR ' / CLR,PIN,Slit width(if MASK deployed) or 4 chars ACAMFILT= 'SlnG+CLEAR ' / Filters in wh1/wh2 TELESCOP= 'WHT ' / 4.2m William Herschel Telescope LATITUDE= 28.760457 / Telescope latitude (deg) LONGITUD= -17.881585 / Telescope longitude (deg) HEIGHT = 2344.000000 / Height above sea level (m) RA = '16:16:00.225 ' / RA being tracked at UTOBS (hh:mm:ss.sss) DEC = '+22:16:04.86 ' / Dec being tracked at UTOBS (sdd:mm:ss.ss) RADECSYS= 'FK5 ' / Mean place new (after the 1976 IAU) system EQUINOX = 2000.000000 / Equinox of coordinates (nnnn.n) DATE-OBS= '2018-07-14 ' / Date at start of integration UT = '23:55:12.971 ' / UT at start of integration UTSTART = '23:55:12.971 ' / UT at start of integration UT-MID = '23:55:57.976 ' / UT at the middle of the exposure EXPOSED = 90.01000000 / [s] Exposure time EXPTIME = 90.01000000 / [s] Exposure time The object is: AT2018cow 0.0 0.0 The filters are: SlnG+CLEAR The exposure time is: 90.01
From this we can infer that this is a Sloan g-band image of AT2018cow and the exposure time was 90 seconds. We can also see that this data was taken with the ACAM instrument on the WHT last July 14th (by your module leader). This is only a select portion of the header; printing out the entire header would show other keywords relating to the detector settings and temperature and other properties of the instrument, telescope, and/or facililty.
It's usually also a good idea to visually inspect the data to get a feel for its appearance and to make sure it doesn't have any obvious problems before we analyze it. We will use the imshow command within pyplot, which requires us to provide some paramters defining the minimum and maximum range of the colorbar scale. For this, we will use sigma-clipped image pixel statistics to come up with some reasonable values.
## Get and display the image data
data = HDUList[1].data # Get the data array (a simple numpy array) from the first extension.
mean, median, std = sigma_clipped_stats(data) # get some image statistics
plt.figure(figsize=(8,8)) # set up the plot panel
plt.imshow(data, vmin = median - 2*std, vmax = median + 3*std, origin='lower')
plt.colorbar()
plt.show()
This shows us that the field of view of the instrument is circular and centered at approximately x=1000, y=1100. A significant fraction of the CCD is not exposed.
While the image already looks pretty nice at a glance, raw data of this type is not suitable for anything other than crude preliminary analysis: the sensitivity may vary across the field, it may contain instrumental signals from the detector electronics, it may be contaminated by cosmic rays, etc. Removing these effects to produce a "pure" image of the night sky is the role of data reduction. There are two generic steps (bias correction and flat-fielding), which can be followed by a variety of possible high-level calibration and artifact-correction steps. We will address these in the next section. For now, we are done inspecting the science image so we will close it.
HDUList.close() # Close the file (good practice, but not strictly necessary)
The detector image comes with an electronic offset introduced by the voltages applied to the the semiconductor-based detector. This can be clearly seen in the science image above: even the unexposed parts of the detector have an apparent signal level of about 1900 counts. We have to remove this offset to get real counts received by the detector.
Different bias-removal methods are employed by different detectors. In most cases, we record a set of images ("bias frames") with zero exposure time, so that the recorded image reflects only the intrinsic offsets in the detector (the detector does not receive any photons if the exposure time is zero). To reduce detector noise and to remove cosmic ray contamination, we usually take many bias frames and average them together using median-combination. (Even though the exposure time is zero, cosmic rays can strike the detector during readout.)
As always, it's a good idea to inspect your images before you process them. As an exercise below, open the first bias image and verify that it doesn't contain any obvious astronomical signals.
## EXERCISE 1: Display the first bias image and determine its clipped standard deviation.
HDUList = fits.open(biasList[0])
data = HDUList[1].data # Get the data array (which is a simple numpy array).
HDUList.close() # File can be closed as soon as the data is copied out of it.
mean, median, std = sigma_clipped_stats(data) # get some image statistics
plt.figure(figsize=(8,8)) # set up the plot panel
plt.imshow(data, vmin = median - 2*std, vmax = median + 3*std, origin='lower')
plt.colorbar()
plt.show()
print('Standard deviation:', std)