This tutorial walks you through the processing pipeline on JupyterLab. It builds on the first two hands-on tutorials in the LSST "Getting started" tutorial series and is intended for anyone getting started with the LSST Science Pipelines for data processing.
The goal of this tutorial is to set up a Butler for a simulated LSST data set and to run the ProcessEimageTask pipeline task to produce reduced images.
Sample data for this tutorial comes from the Twinkles LSST simulation and is available in a shared directory on JupyterLab. We will make a copy of the input data in our current directory:
!if [ ! -d DATA ]; then cp -r /project/shared/data/Twinkles_subset/input_data_v2 DATA; fi
Inside the DATA directory you'll see a directory structure that looks like this:
!ls -lh DATA/
The Butler uses a mapper to find and organize data in a format specific to each camera. Here we're using the lsst.obs.lsstSim.LsstSimMapper for the Twinkles simulated data:
!cat DATA/_mapper
All of the relevant images and calibrations have already been ingested into the Butler repository for this data set.
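For reference, ingestion into a fresh Gen2 repository would look roughly like the sketch below. The raw-file path is hypothetical, and the generic ingestImages.py script is an assumption (obs packages sometimes ship their own ingest scripts); you don't need to run this here.
# Hedged sketch of Gen2 ingestion (not needed here; DATA is already populated).
!mkdir -p NEW_DATA
!echo "lsst.obs.lsstSim.LsstSimMapper" > NEW_DATA/_mapper
!ingestImages.py NEW_DATA path/to/raw/*.fits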
We'll now process individual raw LSST simulated images in the Butler DATA repository into calibrated exposures. For raw images this is the job of the processCcd.py command-line task, which removes instrumental signatures with dark, bias, and flat field calibration images, and uses a reference catalog to establish a preliminary WCS and photometric zeropoint solution.
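For reference, a processCcd.py run looks something like the cell below. This is a sketch only (the rerun name is arbitrary), and we won't use processCcd.py in this tutorial because the Twinkles repository holds e-images rather than raw images.
# Hedged sketch of a processCcd.py invocation (dry run via --show data).
# Not used here: the Twinkles data are e-images, handled by ProcessEimageTask below.
!processCcd.py DATA --rerun process-ccd --id filter=r --show data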
First we'll examine the set of exposures available in the Twinkles data set. We'll do this using the ProcessEimageTask from the LSST pipeline, the e-image counterpart of processCcd.py. There is a bit of ugliness here: the processEimage.py command-line script is only Python 2 compatible, so we need to pass the arguments through the Python API instead. The argument parser also has the nasty habit of trying to exit after handling the arguments.
from lsst.obs.lsstSim.processEimage import ProcessEimageTask

# Select all r-band e-images and show which data IDs would be processed.
args = 'DATA --rerun process-eimage --id filter=r --show data'
ProcessEimageTask.parseAndRun(args=args.split())
# BUG: the command above exits early because the argument parser calls sys.exit(0) whenever --show is used without "run":
# /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/pipe_base/15.0/python/lsst/pipe/base/argumentParser.py in parse_args(self, config, args, log, override)
# 628
# 629 if namespace.show and "run" not in namespace.show:
# --> 630 sys.exit(0)
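One way to work around the early exit, sketched below, is to catch the SystemExit that the argument parser raises. This is a plain-Python workaround, not an official pipeline API:
# Workaround sketch: sys.exit(0) raises SystemExit, which we can catch
# to keep the notebook kernel alive after --show has printed its output.
try:
    ProcessEimageTask.parseAndRun(args=args.split())
except SystemExit:
    pass  # expected when --show is used without "run"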
The important arguments here are --id and --show data.
The --id argument allows you to select datasets to process by their data IDs. Data IDs describe individual datasets in the Butler repository. Datasets also have types, and each command-line task will only process data of certain types. In this case, processEimage.py processes raw simulated e-images: idealized exposures whose pixel values record the electrons collected in each pixel, without the readout electronics effects present in fully simulated raw images.
In the above command, the --id filter=r argument selects data taken with the r filter. Specifying --id without any arguments acts as a wildcard that selects all raw-type data in the repository.
The --show data argument puts processEimage.py into a dry-run mode that prints a list of the data IDs that would be processed according to the --id argument, rather than actually processing the data.
Notice the keys that describe each data ID, such as the visit (exposure identifier), raft (identifies a specific LSST camera raft), sensor (identifies an individual CCD on a raft), and filter, among others. With these keys you can select exactly the data you want to process.
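For example, several keys can be combined to narrow the selection to a single exposure. A sketch (the visit number here is hypothetical; substitute one from the listing above):
# Hedged example: select one exposure by visit, raft, and sensor.
# Replace 840 with a real visit number from the --show data output above.
args = 'DATA --rerun process-eimage --id visit=840 raft=2,2 sensor=1,1 --show data'
try:
    ProcessEimageTask.parseAndRun(args=args.split())
except SystemExit:
    pass  # --show still triggers the early exit described above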
Next we perform the same query directly with the Butler:
import lsst.daf.persistence as dafPersist

# Open the Butler on the DATA repository and list the data IDs of all
# r-band e-images as (visit, raft, sensor, filter) tuples.
butler = dafPersist.Butler(inputs='DATA')
butler.queryMetadata('eimage', ['visit', 'raft', 'sensor', 'filter'], dataId={'filter': 'r'})
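As a sketch of how you might go one step further, assuming the repository serves eimage datasets through butler.get, you can retrieve one of these exposures directly:
# Hedged sketch: fetch the first r-band e-image returned by the query.
ids = butler.queryMetadata('eimage', ['visit', 'raft', 'sensor', 'filter'], dataId={'filter': 'r'})
visit, raft, sensor, filt = ids[0]
eimage = butler.get('eimage', visit=visit, raft=raft, sensor=sensor, filter=filt)
print(eimage.getDimensions())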
Now we'll move on to actually process some of the Twinkles data. To do this, we'll remove the --show data argument.
# With --show data removed, parseAndRun proceeds to process the selected
# e-images and writes the results into the process-eimage rerun.
args = 'DATA --rerun process-eimage --id filter=r'
ProcessEimageTask.parseAndRun(args=args.split())
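Once the run completes, the reduced images land in the rerun repository. A quick sanity check (a sketch, assuming ProcessEimageTask writes calexp datasets into the rerun as processCcd.py does):
# Hedged check: open a Butler on the rerun and list the calibrated exposures.
butler_out = dafPersist.Butler(inputs='DATA/rerun/process-eimage')
butler_out.queryMetadata('calexp', ['visit', 'raft', 'sensor', 'filter'], dataId={'filter': 'r'})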