Owners: Justin Myles (@jtmyles), Phil Marshall (@drphilmarshall)
Last Verified to Run: 2018-09-21
Verified Stack Release: 16.0
This project addresses issue #63: HSC Re-run
This notebook demonstrates the pipeline described in the LSST Science Pipelines data processing tutorial, from ingesting images (using the obs_subaru package) through image processing, coaddition, source detection, and object measurement, all the way through to measuring forced photometry light curves in a small patch of the HSC sky (in the ci_hsc repository). It does this by calling a bash script, having first identified a minimal data set for demonstration purposes.
After working through and studying this notebook you should understand how to use the DRP pipeline, from image visualization through to a forced photometry light curve.
This notebook is intended to be runnable on lsst-lsp-stable.ncsa.illinois.edu
from a local git clone of https://github.com/LSSTScienceCollaborations/StackClub.
import os
import re
import sys
import glob
import numpy as np
import pandas as pd
import astropy.io.fits as fitsio
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, FormatStrFormatter
%matplotlib inline
import eups.setupcmd
import lsst.daf.persistence as dafPersist
HOME = os.environ['HOME']
DATAREPO = "{}/repositories/ci_hsc/".format(HOME)
DATADIR = "{}/DATA/".format(HOME)
CI_HSC = "/project/shared/data/ci_hsc/"
os.system("mkdir -p {}".format(DATADIR));
The pipeline described in the LSST Science Pipelines data processing tutorial consists of a set of command-line tasks that can be assembled into an end-to-end data reduction script. Let's see what this script looks like.
! cat Re-RunHSC.sh
We'll come back to each step in turn throughout the rest of this notebook.
https://pipelines.lsst.io/getting-started/data-setup.html
Part I runs the following command-line tasks:
source /opt/lsst/software/stack/loadLSST.bash
eups list lsst_distrib
setup lsst_distrib
setup -j -r /project/shared/data/ci_hsc
echo lsst.obs.hsc.HscMapper > $DATADIR/_mapper
ingestImages.py $DATADIR $CI_HSC_DIR/raw/*.fits --mode=link
installTransmissionCurves.py $DATADIR
ln -s $CI_HSC_DIR/CALIB/ $DATADIR/CALIB
mkdir -p $DATADIR/ref_cats
ln -s $CI_HSC_DIR/ps1_pv3_3pi_20170110 $DATADIR/ref_cats/ps1_pv3_3pi_20170110
In summary, these lines establish a directory with read/write permission where all data products made by this tutorial will be stored. In the underlying Python, this repository is represented by an instance of the Butler class.
The first thing we do in our Butler repository is write a _mapper file, which tells the Butler which instrument was used to collect the data in the repository; the Butler needs this file to find and organize data in a format specific to the camera in question. This illustrates the Butler's motivating concept: the LSST DM Stack should be capable of interacting with data collected from a variety of instruments, and the Butler facilitates this by abstracting the data read/write process.
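To make that concrete, here is a minimal sketch (assuming the Re-RunHSC.sh ingestion steps below have already been run): the same get() call returns an Exposure object whichever camera took the data; only the data ID keys are camera-specific.
# A minimal sketch of the Butler abstraction (visit 903334, ccd 16 is one
# of the ci_hsc exposures used later in this notebook).
butler = dafPersist.Butler(inputs=DATADIR)
raw = butler.get('raw', visit=903334, ccd=16)  # an lsst.afw.image Exposure
print(raw.getDimensions())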
Second, we use the ingestImages.py task to organize the data in a format specific to HSC. This task reads the raw FITS files and records their metadata in the Butler repository's registry.
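For reference, ingestImages.py is itself a thin wrapper around a pipe_tasks task; in Stack release 16.0 its Python equivalent is essentially the following sketch:
# Sketch of the ingestImages.py executable: parse command-line-style
# arguments from sys.argv and run the ingest task.
from lsst.pipe.tasks.ingest import IngestTask
IngestTask.parseAndRun()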
Third, we use the installTransmissionCurves.py
task to install transmission curves for the data and link the calibration files associated with the raw data to the Butler.
Last, we link a reference catalog to the Butler for use with astrometry.
Doing some of these steps in Python might look something like this:
"""
#!setup -j -r /home/jmyles/repositories/ci_hsc
setup = eups.setupcmd.EupsSetup(["-j","-r", DATAREPO])
status = setup.run()
print('setup exited with status {}'.format(status))
with open(DATADIR + "_mapper", "w") as f:
f.write("lsst.obs.hsc.HscMapper")
#!installTransmissionCurves.py /home/jmyles/DATA
from lsst.obs.hsc import makeTransmissionCurves, HscMapper
from lsst.daf.persistence import Butler
butler = Butler(outputs={'root': datadir, 'mode': 'rw', 'mapper': HscMapper})
for start, nested in makeTransmissionCurves.getFilterTransmission().items():
for name, curve in nested.items():
if curve is not None:
butler.put(curve, "transmission_filter", filter=name)
for start, nested in makeTransmissionCurves.getSensorTransmission().items():
for ccd, curve in nested.items():
if curve is not None:
butler.put(curve, "transmission_sensor", ccd=ccd)
for start, curve in makeTransmissionCurves.getOpticsTransmission().items():
if curve is not None:
butler.put(curve, "transmission_optics")
for start, curve in makeTransmissionCurves.getAtmosphereTransmission().items():
if curve is not None:
butler.put(curve, "transmission_atmosphere")
# ingest calibration images into Butler repo
os.system("ln -s {} {}".format(datarepo + "CALIB/", datadir + "CALIB"))
# ingest reference catalog into Butler repo
os.system("mkdir -p {}".format(DATADIR + "ref_cats"))
os.system("ln -s {}ps1_pv3_3pi_20170110 {}ref_cats/ps1_pv3_3pi_20170110".format(DATAREPO, DATADIR))
"""
https://pipelines.lsst.io/getting-started/processccd.html
Part II runs the following command-line task:
processCcd.py $DATADIR --rerun processCcdOutputs --id
This applies photometric and astrometric calibrations to the raw images.
The --id flag allows you to select data by data ID: an unspecified --id selects all raw data. Example data ID keys include filter, visit, ccd, and field.
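From Python, the analogous selection can be made with the Butler's queryMetadata method; a sketch, assuming the repository built in Part I:
# List the (visit, ccd) combinations available in each filter -- the same
# selection that --id performs on the command line.
butler_raw = dafPersist.Butler(inputs=DATADIR)
for band in ['HSC-R', 'HSC-I']:
    visits = butler_raw.queryMetadata('raw', ['visit', 'ccd'], dataId={'filter': band})
    print(band, visits)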
All command-line tasks write output datasets to a Butler repository. The --rerun flag here tells the task to write its outputs to the processCcdOutputs rerun.
For information on further unpacking, see Alex Drlica-Wagner's Pipeline Tasks notebook on unpacking command-line tasks.
"""
# Running this in Python might look something like this:
from stackclub import where_is
from lsst.pipe.tasks.processCcd import ProcessCcdTask, ProcessCcdConfig

processCcdConfig = ProcessCcdConfig()
processCcdTaskInstance = ProcessCcdTask(butler=butler)
where_is(processCcdTaskInstance, in_the="source")

ProcessCcdTask.parseAndRun()
"""
https://pipelines.lsst.io/getting-started/display.html
This part of the tutorial is omitted.
Part IV runs the following command-line tasks:
makeDiscreteSkyMap.py $DATADIR --id --rerun processCcdOutputs:coadd --config skyMap.projection="TAN"
makeCoaddTempExp.py $DATADIR --rerun coadd \
--selectId filter=HSC-R \
--id filter=HSC-R tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2 \
--config doApplyUberCal=False doApplySkyCorr=False
makeCoaddTempExp.py $DATADIR --rerun coadd \
--selectId filter=HSC-I \
--id filter=HSC-I tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2 \
--config doApplyUberCal=False doApplySkyCorr=False
assembleCoadd.py $DATADIR --rerun coadd \
--selectId filter=HSC-R \
--id filter=HSC-R tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
assembleCoadd.py $DATADIR --rerun coadd \
--selectId filter=HSC-I \
--id filter=HSC-I tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
Since we want the deepest possible image for source detection in this example, we construct coadded images of the exposures. A sky map is a tiling of the celestial sphere. It is composed of one or more tracts, where a tract is in turn composed of one or more overlapping patches of sky which share a single WCS.
First, we define a sky map with makeDiscreteSkyMap.py so that we can warp all of the exposures onto a single coordinate system.
Second, we warp the images and store them as temporary exposures with makeCoaddTempExp.py.
Finally, once we have warped images, we perform coaddition with assembleCoadd.py. The configuration field specifying the WCS projection can be:
- STG: stereographic projection
- MOL: Mollweide's projection
- TAN: tangent-plane projection
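Once makeDiscreteSkyMap.py has run, the resulting sky map can be inspected through the Butler; a sketch, assuming the coadd rerun exists:
# Retrieve the sky map and look at our single discrete tract.
butler_skymap = dafPersist.Butler(inputs=DATADIR + 'rerun/coadd/')
skyMap = butler_skymap.get('deepCoadd_skyMap')
tract = skyMap[0]             # tract 0, the only tract in this sky map
print(tract.getWcs())         # the shared TAN WCS
print(tract.getNumPatches())  # dimensions of the patch grid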
Part V runs the following command-line tasks:
detectCoaddSources.py $DATADIR --rerun coadd:coaddPhot \
--id filter=HSC-R tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
detectCoaddSources.py $DATADIR --rerun coaddPhot \
--id filter=HSC-I tract=0 patch=0,0^0,1^0,2^1,0^1,1^1,2^2,0^2,1^2,2
mergeCoaddDetections.py $DATADIR --rerun coaddPhot --id filter=HSC-R^HSC-I
measureCoaddSources.py $DATADIR --rerun coaddPhot --id filter=HSC-R
measureCoaddSources.py $DATADIR --rerun coaddPhot --id filter=HSC-I
mergeCoaddMeasurements.py $DATADIR --rerun coaddPhot --id filter=HSC-R^HSC-I
forcedPhotCoadd.py $DATADIR --rerun coaddPhot:coaddForcedPhot --id filter=HSC-R
forcedPhotCoadd.py $DATADIR --rerun coaddForcedPhot --id filter=HSC-I
forcedPhotCcd.py $DATADIR --rerun coaddPhot:ccdForcedPhot --id filter=HSC-R \
--clobber-config --configfile=/project/shared/data/ci_hsc/forcedPhotCcdConfig.py &> ccd_r.txt
forcedPhotCcd.py $DATADIR --rerun ccdForcedPhot --id filter=HSC-I \
--clobber-config --configfile=/project/shared/data/ci_hsc/forcedPhotCcdConfig.py &> ccd_i.txt
The first pair of commands does source detection on the coadds in each band.
The source catalogs are then merged so that we can measure photometry for a consistent table of sources across filters.
measureCoaddSources.py does deblending with this complete catalog and measures regular photometry.
We run mergeCoaddMeasurements.py to write a table that identifies the reference filter with the best position measurement for each source in the tables created with measureCoaddSources.py. These accurate positions are used for forced photometry with forcedPhotCoadd.py as well as forcedPhotCcd.py.
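Before moving on, it can be reassuring to confirm that these tasks wrote their outputs where we expect; a quick sketch:
# List the top-level contents of each forced-photometry rerun directory.
for rerun in ['coaddForcedPhot', 'ccdForcedPhot']:
    print(rerun, ':', os.listdir(DATADIR + 'rerun/' + rerun))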
https://pipelines.lsst.io/getting-started/multiband-analysis.html
We now turn to making plots of the photometry we've done. To start, we access the sources identified from the coadd images.
butler_coadd = dafPersist.Butler(inputs=DATADIR + 'rerun/coaddForcedPhot/')
Grab their measured forced photometry.
# We pass a datasetRefOrType and a DataId (dict) to the butler
# datasetRefOrType : deepCoadd_forced_src
# see all options at
# /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_subaru/16.0+1/python/lsst/obs/hsc
rSources = butler_coadd.get('deepCoadd_forced_src', {'filter': 'HSC-R', 'tract': 0, 'patch': '1,1'})
iSources = butler_coadd.get('deepCoadd_forced_src', {'filter': 'HSC-I', 'tract': 0, 'patch': '1,1'})
print('{} sources with forced photometry measured from coadds'.format(len(rSources)))
Discard sources with negative fluxes and convert fluxes to magnitudes.
rCoaddCalib = butler_coadd.get('deepCoadd_calexp_calib', {'filter': 'HSC-R', 'tract': 0, 'patch': '1,1'})
iCoaddCalib = butler_coadd.get('deepCoadd_calexp_calib', {'filter': 'HSC-I', 'tract': 0, 'patch': '1,1'})
rCoaddCalib.setThrowOnNegativeFlux(False)
iCoaddCalib.setThrowOnNegativeFlux(False)
rMags_coadd = rCoaddCalib.getMagnitude(rSources['base_PsfFlux_flux'])
iMags_coadd = iCoaddCalib.getMagnitude(iSources['base_PsfFlux_flux'])
Make a selection from the catalog for stars only.
deblended = rSources['deblend_nChild'] == 0  # reject parents of deblended children
refTable = butler_coadd.get('deepCoadd_ref', {'filter': 'HSC-R^HSC-I', 'tract': 0, 'patch': '1,1'})
inInnerRegions = refTable['detect_isPatchInner'] & refTable['detect_isTractInner'] # define inner regions
isSkyObject = refTable['merge_peak_sky'] # reject sky objects
isPrimary = refTable['detect_isPrimary'] # primary = inner region, not a sky object, deblend_nChild == 0
isStellar = iSources['base_ClassificationExtendedness_value'] < 1.
isGoodFlux = ~iSources['base_PsfFlux_flag']
selected = isPrimary & isStellar & isGoodFlux
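As a quick sanity check on the selection (a sketch; the masks are boolean numpy arrays, so we can count directly):
print('{} stars selected out of {} sources'.format(selected.sum(), len(selected)))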
Make a color-magnitude diagram.
plt.style.use('seaborn-notebook')
plt.figure(1, figsize=(4, 4), dpi=140)
plt.title('Coadd Forced Photometry (Stars)')
plt.scatter(rMags_coadd[selected] - iMags_coadd[selected],
iMags_coadd[selected],
edgecolors='None', s=2, c='k')
plt.xlim(-0.5, 3)
plt.ylim(25, 14)
plt.xlabel('$r-i$')
plt.ylabel('$i$')
plt.subplots_adjust(left=0.125, bottom=0.1)
plt.show()
Instantiate a Butler with the forced photometry measured for individual exposures.
butler_ccd = dafPersist.Butler(inputs=DATADIR + 'rerun/ccdForcedPhot/')
In order to associate individual visits with the MJD of the exposure, we go back to the raw images stored in the Butler repository. This may be replaceable with cleaner code that takes advantage of some Butler feature that accomplishes the same goal.
Note: this should be improved
# get visitInfo (for exposure time, obs date, coords, info on observatory)
# also look into the Butler registry: butler.queryMetadata()
# associate each visit ID with an MJD
# store in a lookup hashtable
visit_to_mjd = {}
raw_files = glob.glob(CI_HSC + 'raw/HSCA*.fits')
for infile in raw_files:
    # filenames look like raw/HSCA<visit><ccd>.fits; strip the ccd digits and extension
    visit_id = infile[len(CI_HSC) + len("raw/HSCA"):-7]
    with fitsio.open(infile) as hdulist:
        try:
            mjd = hdulist[1].header['MJD']
        except KeyError:
            mjd = hdulist[0].header['MJD']
    visit_to_mjd[visit_id] = mjd
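One possibly cleaner alternative, sketched here under the assumption that the processCcdOutputs rerun contains calexps for all of these visits, is to read each exposure's VisitInfo instead of re-opening the raw FITS files:
# Hypothetical alternative: pull the MJD from the calexp VisitInfo
# rather than from the raw FITS headers.
import lsst.daf.base as dafBase

butler_calexp = dafPersist.Butler(inputs=DATADIR + 'rerun/processCcdOutputs/')
for visit, ccd in butler_calexp.queryMetadata('calexp', ['visit', 'ccd']):
    visitInfo = butler_calexp.get('calexp', visit=visit, ccd=ccd).getInfo().getVisitInfo()
    visit_to_mjd[str(visit)] = visitInfo.getDate().get(dafBase.DateTime.MJD)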
Doing forced photometry on individual exposures saves the source tables in different files. Here we query the Butler for all the data and store the tables together. This may be replaceable with cleaner code that takes advantage of some Butler feature that accomplishes the same goal.
Note: this should be improved
data_id_fields = [('filter', str), ('pointing', int), ('visit', int),
                  ('ccd', int), ('field', str), ('dateObs', str),
                  ('taiObs', str), ('expTime', float), ('tract', int)]
i_tables = []
r_tables = []
with open(DATADIR + 'data_ids.txt') as id_file:
    for line in id_file:
        fields = line.split(",")
        # parse "key:value" pairs into a typed data ID dictionary
        data_id_dict = {data_id_fields[i][0]: data_id_fields[i][1](fields[i].split(':')[1])
                        for i in range(len(fields))}
        print(data_id_dict)
        sources = butler_ccd.get('forced_src', data_id_dict)
        source_table = sources.asAstropy().to_pandas()
        source_table['visit'] = fields[2].split(':')[1]
        source_table['mjd'] = [visit_to_mjd[key] if key in visit_to_mjd else 56598.2
                               for key in source_table['visit']] # TODO: this is problematic. fix this
        if fields[0] == 'filter:HSC-R':
            r_tables.append(source_table)
        elif fields[0] == 'filter:HSC-I':
            i_tables.append(source_table)
        else:
            print('Failed to read filter')
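A possibly cleaner alternative to parsing data_ids.txt would be to let the Butler enumerate the available datasets itself, e.g. with its subset method; an untested sketch:
# Hypothetical alternative: iterate over all forced_src datasets known to
# the Butler instead of reading data IDs from a text file.
for dataRef in butler_ccd.subset('forced_src', tract=0):
    if not dataRef.datasetExists('forced_src'):
        continue
    table = dataRef.get('forced_src').asAstropy().to_pandas()
    visit = str(dataRef.dataId['visit'])
    table['visit'] = visit
    table['mjd'] = visit_to_mjd.get(visit, 56598.2)  # same fallback as above
    if dataRef.dataId['filter'] == 'HSC-R':
        r_tables.append(table)
    else:
        i_tables.append(table)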
plt.style.use('seaborn-notebook')
fig, axarr = plt.subplots(1, 2, figsize=(8, 4), dpi=140)
fig.suptitle('Forced Photometry')
axarr[0].set_title('Coadd (Stars Only)')
axarr[0].scatter(rMags_coadd[selected] - iMags_coadd[selected],
iMags_coadd[selected],
edgecolors='None', s=2, c='k')
axarr[0].set_xlim(-0.5, 3)
axarr[0].set_ylim(25, 14)
axarr[0].set_xlabel('$r-i$')
axarr[0].set_ylabel('$i$')
# datasetRefOrType : forced_src
# see all options at
# /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_subaru/16.0+1/python/lsst/obs/hsc
iDataId = {'filter': 'HSC-I', 'pointing': 671, 'visit': 903986, 'ccd': 16, 'field': 'STRIPE82L', 'dateObs': '2013-11-02',
'taiObs': '2013-11-02', 'expTime': 30.0, 'tract': 0, 'patch' : '1,1'}
rDataId = {'filter': 'HSC-R', 'pointing': 533, 'visit': 903334, 'ccd': 16, 'field': 'STRIPE82L', 'dateObs': '2013-06-17',
'taiObs': '2013-06-17', 'expTime': 30.0, 'tract': 0, 'patch' : '1,1'}
iSources = butler_ccd.get('forced_src', iDataId)
rSources = butler_ccd.get('forced_src', rDataId)
iCcdCalib = butler_ccd.get('calexp_calib', iDataId)
rCcdCalib = butler_ccd.get('calexp_calib', rDataId)
rMags_ccd = rCcdCalib.getMagnitude(rSources['base_PsfFlux_flux'])
iMags_ccd = iCcdCalib.getMagnitude(iSources['base_PsfFlux_flux'])
axarr[1].set_title('Single Exposure (All Sources)')
axarr[1].scatter(rMags_ccd - iMags_ccd, iMags_ccd, edgecolors='None', s=2, c='k')
axarr[1].set_xlim(-0.5, 3)
axarr[1].set_ylim(25, 14)
axarr[1].set_xlabel('$r-i$')
axarr[1].set_ylabel('$i$')
plt.subplots_adjust(left=0.125, bottom=0.1)
plt.show()
Now we concatenate the measured sources into two tables (one for each filter), then group by object ID to draw a five-epoch light curve.
iSources = pd.concat(i_tables)
rSources = pd.concat(r_tables)
iGrouped = iSources.groupby('objectId')
rGrouped = rSources.groupby('objectId')
print('{} i band objects, {} measurements'.format(len(np.unique(iSources['objectId'])), len(iSources['objectId'])))
print('{} r band objects, {} measurements'.format(len(np.unique(rSources['objectId'])), len(rSources['objectId'])))
objids = [name for name, group in iGrouped if len(group) == 5]
print('{} objects with 5 epochs'.format(len(objids)))
Choose an object for which to draw a light curve.
obj = objids[1]
print('e.g. objectId:', obj, '(showing select rows from table)')
iSources[iSources['objectId'] == obj][['id','coord_ra','coord_dec','objectId','base_PsfFlux_flux','base_PsfFlux_fluxSigma', 'visit','mjd']]
# convert fractional flux error to magnitude error:
# sigma_mag = (2.5 / ln 10) * sigma_flux / flux ~= 1.09 * sigma_flux / flux
# http://slittlefair.staff.shef.ac.uk/teaching/phy217/lectures/stats/L18/index.html
yerr = 1.09 * iSources[iSources['objectId'] == obj]['base_PsfFlux_fluxSigma'].values / iSources[iSources['objectId'] == obj]['base_PsfFlux_flux'].values
yerr
plt.style.use('seaborn-notebook')
plt.figure(3, figsize=(4, 4), dpi=140)
plt.title('objectId: {}'.format(obj))
plt.errorbar(iSources[iSources['objectId'] == obj]['mjd'].values,
iCcdCalib.getMagnitude(iSources[iSources['objectId'] == obj]['base_PsfFlux_flux'].values),
yerr=yerr,
markersize=6, color='k', fmt='.')
plt.ylabel('$i$' + ' mag')
plt.xlabel('MJD')
plt.subplots_adjust(left=0.125, bottom=0.1)
ax = plt.gca()
ax.ticklabel_format(useOffset=56598, style='plain', axis='x', useMathText=True)
ax.invert_yaxis()
plt.show()
As the plots above show, we have completed an end-to-end processing of the ci_hsc test dataset, following the steps in the http://pipelines.lsst.io "getting started" tutorial. We saw how the command-line tasks take care of all the book-keeping, and how they are configured and run.