Owners: Justin Myles (@jtmyles), Phil Marshall (@drphilmarshall)
Last Verified to Run: 2018-09-21
Verified Stack Release: 16.0
This project addresses issue #63: HSC Re-run
This notebook demonstrates the pipeline described in the LSST Science Pipelines data processing tutorial, from ingesting images (using the obs_subaru package) through image processing, coaddition, source detection, and object measurement, all the way through to measuring forced photometry light curves in a small patch of the HSC sky (in the ci_hsc repository). It does this by calling a bash script, having first identified a minimal data set for demonstration purposes.
After working through and studying this notebook you should understand how to use the DRP pipeline, from image visualization through to a forced photometry light curve.
This notebook is intended to be runnable on lsst-lsp-stable.ncsa.illinois.edu
from a local git clone of https://github.com/LSSTScienceCollaborations/StackClub.
import os
import re
import sys
import glob
import numpy as np
import pandas as pd
import astropy.io.fits as fitsio
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, FormatStrFormatter
%matplotlib inline
import eups.setupcmd
import lsst.daf.persistence as dafPersist
HOME = os.environ['HOME']
DATAREPO = "{}/repositories/ci_hsc/".format(HOME)
DATADIR = "{}/DATA/".format(HOME)
CI_HSC = "/project/shared/data/ci_hsc/"
os.system("mkdir -p {}".format(DATADIR));
The pipeline described in the LSST Science Pipelines data processing tutorial consists of a set of command-line tasks that can be assembled into an end-to-end data reduction script. Let's see what this script looks like.
! cat Re-RunHSC.sh
We'll come back to each step in turn throughout the rest of this notebook.
https://pipelines.lsst.io/getting-started/data-setup.html
Part I runs the following command-line tasks:
source /opt/lsst/software/stack/loadLSST.bash
eups list lsst_distrib
setup lsst_distrib
setup -j -r /project/shared/data/ci_hsc
echo lsst.obs.hsc.HscMapper > $DATADIR/_mapper
ingestImages.py $DATADIR $CI_HSC_DIR/raw/*.fits --mode=link
installTransmissionCurves.py $DATADIR
ln -s $CI_HSC_DIR/CALIB/ $DATADIR/CALIB
mkdir -p $DATADIR/ref_cats
ln -s $CI_HSC_DIR/ps1_pv3_3pi_20170110 $DATADIR/ref_cats/ps1_pv3_3pi_20170110
In summary, these lines establish a directory with read/write permission where all data products made by this tutorial will be stored. In the underlying Python, this repository is represented by an instance of the Butler class.
The first thing we do in our Butler repository is write a _mapper file, which tells the Butler which instrument was used to collect the data in the repository; the Butler needs this file to find and organize data in a format specific to the camera in question. This illustrates the Butler's motivating concept: the LSST DM Stack should be capable of interacting with data collected from a variety of instruments, and the Butler facilitates this by abstracting the data read/write process.
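To make that concrete, here is a minimal sketch (assuming the Re-RunHSC.sh ingestion steps below have already been run): the same get() call returns an Exposure object whichever camera took the data; only the data ID keys are camera-specific.
# A minimal sketch of the Butler abstraction (visit 903334, ccd 16 is one
# of the ci_hsc exposures used later in this notebook).
butler = dafPersist.Butler(inputs=DATADIR)
raw = butler.get('raw', visit=903334, ccd=16)  # an lsst.afw.image Exposure
print(raw.getDimensions())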
Second, we use the ingestImages.py task to organize the data in a format specific to HSC. This task reads the raw FITS files and records their metadata in the Butler repository's registry.
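For reference, ingestImages.py is itself a thin wrapper around a pipe_tasks task; in Stack release 16.0 its Python equivalent is essentially the following sketch:
# Sketch of the ingestImages.py executable: parse command-line-style
# arguments from sys.argv and run the ingest task.
from lsst.pipe.tasks.ingest import IngestTask
IngestTask.parseAndRun()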
Third, we use the installTransmissionCurves.py
task to install transmission curves for the data and link the calibration files associated with the raw data to the Butler.
Last, we link a reference catalog to the Butler for use with astrometry.
Doing some of these steps in Python might look something like this:
"""
#!setup -j -r /home/jmyles/repositories/ci_hsc
setup = eups.setupcmd.EupsSetup(["-j","-r", DATAREPO])
status = setup.run()
print('setup exited with status {}'.format(status))
with open(DATADIR + "_mapper", "w") as f:
f.write("lsst.obs.hsc.HscMapper")
#!installTransmissionCurves.py /home/jmyles/DATA
from lsst.obs.hsc import makeTransmissionCurves, HscMapper
from lsst.daf.persistence import Butler
butler = Butler(outputs={'root': datadir, 'mode': 'rw', 'mapper': HscMapper})
for start, nested in makeTransmissionCurves.getFilterTransmission().items():
for name, curve in nested.items():
if curve is not None:
butler.put(curve, "transmission_filter", filter=name)
for start, nested in makeTransmissionCurves.getSensorTransmission().items():
for ccd, curve in nested.items():
if curve is not None:
butler.put(curve, "transmission_sensor", ccd=ccd)
for start, curve in makeTransmissionCurves.getOpticsTransmission().items():
if curve is not None:
butler.put(curve, "transmission_optics")
for start, curve in makeTransmissionCurves.getAtmosphereTransmission().items():
if curve is not None:
butler.put(curve, "transmission_atmosphere")
# ingest calibration images into Butler repo
os.system("ln -s {} {}".format(datarepo + "CALIB/", datadir + "CALIB"))
# ingest reference catalog into Butler repo
os.system("mkdir -p {}".format(DATADIR + "ref_cats"))
os.system("ln -s {}ps1_pv3_3pi_20170110 {}ref_cats/ps1_pv3_3pi_20170110".format(DATAREPO, DATADIR))
"""
https://pipelines.lsst.io/getting-started/processccd.html
Part II runs the following command-line task:
processCcd.py $DATADIR --rerun processCcdOutputs --id
This applies photometric and astrometric calibrations to the raw images.
The --id flag allows you to select data by data ID: an unspecified --id selects all raw data. Example data ID keys include filter, visit, ccd, and field.
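From Python, the analogous selection can be made with the Butler's queryMetadata method; a sketch, assuming the repository built in Part I:
# List the (visit, ccd) combinations available in each filter -- the same
# selection that --id performs on the command line.
butler_raw = dafPersist.Butler(inputs=DATADIR)
for band in ['HSC-R', 'HSC-I']:
    visits = butler_raw.queryMetadata('raw', ['visit', 'ccd'], dataId={'filter': band})
    print(band, visits)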
All command-line tasks write output datasets to a Butler repository. The --rerun flag here tells the task to write its outputs to the processCcdOutputs rerun.
For information on further unpacking, see Alex Drlica-Wagner's Pipeline Tasks notebook on unpacking command-line tasks.
"""
# Running this in Python might look something like this:
from stackclub import where_is
from lsst.pipe.tasks.processCcd import ProcessCcdTask, ProcessCcdConfig

processCcdConfig = ProcessCcdConfig()
processCcdTaskInstance = ProcessCcdTask(butler=butler)
where_is(processCcdTaskInstance, in_the="source")

ProcessCcdTask.parseAndRun()
"""
https://pipelines.lsst.io/getting-started/display.html
This part of the tutorial is omitted.
Part IV runs the following command-line tasks:
makeDiscreteSkyMap.py $DATADIR --id --rerun processCcdOutputs:coadd --config skyMap.projection="TAN"
makeCoaddTempExp.py $DATADIR --rerun coadd \
--selectId filter=HSC-R \
--id filter=HSC-R tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2 \
--config doApplyUberCal=False doApplySkyCorr=False
makeCoaddTempExp.py $DATADIR --rerun coadd \
--selectId filter=HSC-I \
--id filter=HSC-I tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2 \
--config doApplyUberCal=False doApplySkyCorr=False
assembleCoadd.py $DATADIR --rerun coadd \
--selectId filter=HSC-R \
--id filter=HSC-R tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
assembleCoadd.py $DATADIR --rerun coadd \
--selectId filter=HSC-I \
--id filter=HSC-I tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
Since we want the deepest possible image for source detection in this example, we construct coadded images of the exposures. A sky map is a tiling of the celestial sphere. It is composed of one or more tracts, where a tract is in turn composed of one or more overlapping patches of sky which share a single WCS.
First, we define a sky map with makeDiscreteSkyMap.py so that we can warp all of the exposures onto a single coordinate system.
Second, we warp the images and store them as temporary exposures with makeCoaddTempExp.py.
Finally, once we have warped images, we perform coaddition with assembleCoadd.py. The configuration field specifying the WCS projection can be:
- STG: stereographic projection
- MOL: Mollweide's projection
- TAN: tangent-plane projection
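Once makeDiscreteSkyMap.py has run, the resulting sky map can be inspected through the Butler; a sketch, assuming the coadd rerun exists:
# Retrieve the sky map and look at our single discrete tract.
butler_skymap = dafPersist.Butler(inputs=DATADIR + 'rerun/coadd/')
skyMap = butler_skymap.get('deepCoadd_skyMap')
tract = skyMap[0]             # tract 0, the only tract in this sky map
print(tract.getWcs())         # the shared TAN WCS
print(tract.getNumPatches())  # dimensions of the patch grid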
Part V runs the following command-line tasks:
detectCoaddSources.py $DATADIR --rerun coadd:coaddPhot \
--id filter=HSC-R tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
detectCoaddSources.py $DATADIR --rerun coaddPhot \
--id filter=HSC-I tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
mergeCoaddDetections.py $DATADIR --rerun coaddPhot --id filter=HSC-R^HSC-I
measureCoaddSources.py $DATADIR --rerun coaddPhot --id filter=HSC-R
measureCoaddSources.py $DATADIR --rerun coaddPhot --id filter=HSC-I
mergeCoaddMeasurements.py $DATADIR --rerun coaddPhot --id filter=HSC-R^HSC-I
forcedPhotCoadd.py $DATADIR --rerun coaddPhot:coaddForcedPhot --id filter=HSC-R
forcedPhotCoadd.py $DATADIR --rerun coaddForcedPhot --id filter=HSC-I
forcedPhotCcd.py $DATADIR --rerun coaddPhot:ccdForcedPhot --id filter=HSC-R \
--clobber-config --configfile=/project/shared/data/ci_hsc/forcedPhotCcdConfig.py &> ccd_r.txt
forcedPhotCcd.py $DATADIR --rerun ccdForcedPhot --id filter=HSC-I \
--clobber-config --configfile=/project/shared/data/ci_hsc/forcedPhotCcdConfig.py &> ccd_i.txt
The first pair of commands does source detection on the coadds in each band.
The source catalogs are then merged so that we can measure photometry for a consistent table of sources across filters.
measureCoaddSources.py does deblending with this complete catalog and measures regular photometry.
We run mergeCoaddMeasurements.py to write a table that identifies the reference filter with the best position measurement for each source in the tables created with measureCoaddSources.py. These accurate positions are used for forced photometry with forcedPhotCoadd.py as well as forcedPhotCcd.py.
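Before moving on, it can be reassuring to confirm that these tasks wrote their outputs where we expect; a quick sketch:
# List the top-level contents of each forced-photometry rerun directory.
for rerun in ['coaddForcedPhot', 'ccdForcedPhot']:
    print(rerun, ':', os.listdir(DATADIR + 'rerun/' + rerun))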
https://pipelines.lsst.io/getting-started/multiband-analysis.html
We now turn to making plots of the photometry we've done. To start, we access the sources identified from the coadd images.
butler_coadd = dafPersist.Butler(inputs=DATADIR + 'rerun/coaddForcedPhot/')
Grab their measured forced photometry.
# We pass a datasetRefOrType and a DataId (dict) to the butler
# datasetRefOrType : deepCoadd_forced_src
# see all options at
# /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_subaru/16.0+1/python/lsst/obs/hsc
rSources = butler_coadd.get('deepCoadd_forced_src', {'filter': 'HSC-R', 'tract': 0, 'patch': '1,1'})
iSources = butler_coadd.get('deepCoadd_forced_src', {'filter': 'HSC-I', 'tract': 0, 'patch': '1,1'})
print('{} sources with forced photometry measured from coadds'.format(len(rSources)))
Discard sources with negative fluxes and convert fluxes to magnitudes.
rCoaddCalib = butler_coadd.get('deepCoadd_calexp_calib', {'filter': 'HSC-R', 'tract': 0, 'patch': '1,1'})
iCoaddCalib = butler_coadd.get('deepCoadd_calexp_calib', {'filter': 'HSC-I', 'tract': 0, 'patch': '1,1'})
rCoaddCalib.setThrowOnNegativeFlux(False)
iCoaddCalib.setThrowOnNegativeFlux(False)
rMags_coadd = rCoaddCalib.getMagnitude(rSources['base_PsfFlux_flux'])
iMags_coadd = iCoaddCalib.getMagnitude(iSources['base_PsfFlux_flux'])
Make a selection from the catalog for stars only.
deblended = rSources['deblend_nChild'] == 0  # reject parents of deblended children
refTable = butler_coadd.get('deepCoadd_ref', {'filter': 'HSC-R^HSC-I', 'tract': 0, 'patch': '1,1'})
inInnerRegions = refTable['detect_isPatchInner'] & refTable['detect_isTractInner'] # define inner regions
isSkyObject = refTable['merge_peak_sky'] # reject sky objects
isPrimary = refTable['detect_isPrimary'] # primary = inner region, not a sky object, deblend_nChild == 0
isStellar = iSources['base_ClassificationExtendedness_value'] < 1.
isGoodFlux = ~iSources['base_PsfFlux_flag']
selected = isPrimary & isStellar & isGoodFlux
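As a quick sanity check on the selection (a sketch; the masks are boolean numpy arrays, so we can count directly):
print('{} stars selected out of {} sources'.format(selected.sum(), len(selected)))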
Make a color-magnitude diagram.
plt.style.use('seaborn-notebook')
plt.figure(1, figsize=(4, 4), dpi=140)
plt.title('Coadd Forced Photometry (Stars)')
plt.scatter(rMags_coadd[selected] - iMags_coadd[selected],
iMags_coadd[selected],
edgecolors='None', s=2, c='k')
plt.xlim(-0.5, 3)
plt.ylim(25, 14)
plt.xlabel('$r-i$')
plt.ylabel('$i$')
plt.subplots_adjust(left=0.125, bottom=0.1)
plt.show()
Instantiate a Butler with the forced photometry measured for individual exposures.
butler_ccd = dafPersist.Butler(inputs=DATADIR + 'rerun/ccdForcedPhot/')
In order to associate individual visits with the MJD of the exposure, we go back to the raw images stored in the Butler repository. This may be replaceable with cleaner code that takes advantage of some Butler feature that accomplishes the same goal.
Note: this should be improved
# get visitInfo (for exposure time, obs date, coords, info on observatory)
# also look into the Butler registry: butler.queryMetadata()
# associate each visit ID with an MJD
# store in a lookup hashtable
visit_to_mjd = {}
raw_files = glob.glob(CI_HSC + 'raw/HSCA*.fits')
for infile in raw_files:
    # filenames look like raw/HSCA<visit><ccd>.fits; strip the ccd digits and extension
    visit_id = infile[len(CI_HSC) + len("raw/HSCA"):-7]
    with fitsio.open(infile) as hdulist:
        try:
            mjd = hdulist[1].header['MJD']
        except KeyError:
            mjd = hdulist[0].header['MJD']
    visit_to_mjd[visit_id] = mjd
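One possibly cleaner alternative, sketched here under the assumption that the processCcdOutputs rerun contains calexps for all of these visits, is to read each exposure's VisitInfo instead of re-opening the raw FITS files:
# Hypothetical alternative: pull the MJD from the calexp VisitInfo
# rather than from the raw FITS headers.
import lsst.daf.base as dafBase

butler_calexp = dafPersist.Butler(inputs=DATADIR + 'rerun/processCcdOutputs/')
for visit, ccd in butler_calexp.queryMetadata('calexp', ['visit', 'ccd']):
    visitInfo = butler_calexp.get('calexp', visit=visit, ccd=ccd).getInfo().getVisitInfo()
    visit_to_mjd[str(visit)] = visitInfo.getDate().get(dafBase.DateTime.MJD)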
Doing forced photometry on individual exposures saves the source tables in different files. Here we query the Butler for all the data and store the tables together. This may be replaceable with cleaner code that takes advantage of some Butler feature that accomplishes the same goal.
Note: this should be improved
data_id_fields = [('filter', str), ('pointing', int), ('visit', int),
                  ('ccd', int), ('field', str), ('dateObs', str),
                  ('taiObs', str), ('expTime', float), ('tract', int)]
i_tables = []
r_tables = []
with open(DATADIR + 'data_ids.txt') as id_file:
    for line in id_file:
        fields = line.split(",")
        # parse "key:value" pairs into a typed data ID dictionary
        data_id_dict = {data_id_fields[i][0]: data_id_fields[i][1](fields[i].split(':')[1])
                        for i in range(len(fields))}
        print(data_id_dict)
        sources = butler_ccd.get('forced_src', data_id_dict)
        source_table = sources.asAstropy().to_pandas()
        source_table['visit'] = fields[2].split(':')[1]
        source_table['mjd'] = [visit_to_mjd[key] if key in visit_to_mjd else 56598.2
                               for key in source_table['visit']] # TODO: this is problematic. fix this
        if fields[0] == 'filter:HSC-R':
            r_tables.append(source_table)
        elif fields[0] == 'filter:HSC-I':
            i_tables.append(source_table)
        else:
            print('Failed to read filter')
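A possibly cleaner alternative to parsing data_ids.txt would be to let the Butler enumerate the available datasets itself, e.g. with its subset method; an untested sketch:
# Hypothetical alternative: iterate over all forced_src datasets known to
# the Butler instead of reading data IDs from a text file.
for dataRef in butler_ccd.subset('forced_src', tract=0):
    if not dataRef.datasetExists('forced_src'):
        continue
    table = dataRef.get('forced_src').asAstropy().to_pandas()
    visit = str(dataRef.dataId['visit'])
    table['visit'] = visit
    table['mjd'] = visit_to_mjd.get(visit, 56598.2)  # same fallback as above
    if dataRef.dataId['filter'] == 'HSC-R':
        r_tables.append(table)
    else:
        i_tables.append(table)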
plt.style.use('seaborn-notebook')
fig, axarr = plt.subplots(1, 2, figsize=(8, 4), dpi=140)
fig.suptitle('Forced Photometry')
axarr[0].set_title('Coadd (Stars Only)')
axarr[0].scatter(rMags_coadd[selected] - iMags_coadd[selected],
iMags_coadd[selected],
edgecolors='None', s=2, c='k')
axarr[0].set_xlim(-0.5, 3)
axarr[0].set_ylim(25, 14)
axarr[0].set_xlabel('$r-i$')
axarr[0].set_ylabel('$i$')
# datasetRefOrType : forced_src
# see all options at
# /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_subaru/16.0+1/python/lsst/obs/hsc
iDataId = {'filter': 'HSC-I', 'pointing': 671, 'visit': 903986, 'ccd': 16, 'field': 'STRIPE82L', 'dateObs': '2013-11-02',
'taiObs': '2013-11-02', 'expTime': 30.0, 'tract': 0, 'patch' : '1,1'}
rDataId = {'filter': 'HSC-R', 'pointing': 533, 'visit': 903334, 'ccd': 16, 'field': 'STRIPE82L', 'dateObs': '2013-06-17',
'taiObs': '2013-06-17', 'expTime': 30.0, 'tract': 0, 'patch' : '1,1'}
iSources = butler_ccd.get('forced_src', iDataId)
rSources = butler_ccd.get('forced_src', rDataId)
iCcdCalib = butler_ccd.get('calexp_calib', iDataId)
rCcdCalib = butler_ccd.get('calexp_calib', rDataId)
rMags_ccd = rCcdCalib.getMagnitude(rSources['base_PsfFlux_flux'])
iMags_ccd = iCcdCalib.getMagnitude(iSources['base_PsfFlux_flux'])
axarr[1].set_title('Single Exposure (All Sources)')
axarr[1].scatter(rMags_ccd - iMags_ccd, iMags_ccd, edgecolors='None', s=2, c='k')
axarr[1].set_xlim(-0.5, 3)
axarr[1].set_ylim(25, 14)
axarr[1].set_xlabel('$r-i$')
axarr[1].set_ylabel('$i$')
plt.subplots_adjust(left=0.125, bottom=0.1)
plt.show()
Now we concatenate the measured sources into two tables (one for each filter), then group by object ID to draw a five-epoch light curve.
iSources = pd.concat(i_tables)
rSources = pd.concat(r_tables)
iGrouped = iSources.groupby('objectId')
rGrouped = rSources.groupby('objectId')
print('{} i band objects, {} measurements'.format(len(np.unique(iSources['objectId'])), len(iSources['objectId'])))
print('{} r band objects, {} measurements'.format(len(np.unique(rSources['objectId'])), len(rSources['objectId'])))
objids = [name for name, group in iGrouped if len(group) == 5]
print('{} objects with 5 epochs'.format(len(objids)))
Choose an object for which to draw a light curve.
obj = objids[1]
print('e.g. objectId:', obj, '(showing select rows from table)')
iSources[iSources['objectId'] == obj][['id','coord_ra','coord_dec','objectId','base_PsfFlux_flux','base_PsfFlux_fluxSigma', 'visit','mjd']]
# convert fractional flux error to magnitude error:
# sigma_mag = (2.5 / ln 10) * sigma_flux / flux ~= 1.09 * sigma_flux / flux
# http://slittlefair.staff.shef.ac.uk/teaching/phy217/lectures/stats/L18/index.html
yerr = 1.09 * iSources[iSources['objectId'] == obj]['base_PsfFlux_fluxSigma'].values / iSources[iSources['objectId'] == obj]['base_PsfFlux_flux'].values
yerr
plt.style.use('seaborn-notebook')
plt.figure(3, figsize=(4, 4), dpi=140)
plt.title('objectId: {}'.format(obj))
plt.errorbar(iSources[iSources['objectId'] == obj]['mjd'].values,
iCcdCalib.getMagnitude(iSources[iSources['objectId'] == obj]['base_PsfFlux_flux'].values),
yerr=yerr,
markersize=6, color='k', fmt='.')
plt.ylabel('$i$' + ' mag')
plt.xlabel('MJD')
plt.subplots_adjust(left=0.125, bottom=0.1)
ax = plt.gca()
ax.ticklabel_format(useOffset=56598, style='plain', axis='x', useMathText=True)
ax.invert_yaxis()
plt.show()
As the plots above show, we have completed an end-to-end processing of the ci_hsc test dataset, following the steps in the http://pipelines.lsst.io "getting started" tutorial. We saw how the command-line tasks take care of all the book-keeping, and how they are configured and run.