This tutorial walks you through the processing pipeline on JupyterLab. It builds on the first two hands-on tutorials in the LSST "Getting started" tutorial series and is intended for anyone getting started with the LSST Science Pipelines for data processing.
The goal of this tutorial is to set up a Butler for a simulated LSST data set and to run the ProcessEimageTask pipeline task to produce reduced images.
Sample data for this tutorial comes from the Twinkles LSST simulation and is available in a shared directory on JupyterLab. We will make a copy of the input data in our current directory:
!if [ ! -d DATA ]; then cp -r /project/shared/data/Twinkles_subset/input_data_v2 DATA; fi
Inside the DATA directory you'll see a directory structure that looks like this:
!ls -lh DATA/
The Butler uses a mapper to find and organize data in a format specific to each camera. Here we're using the lsst.obs.lsstSim.LsstSimMapper for the Twinkles simulated data:
!cat DATA/_mapper
All of the relevant images and calibrations have already been ingested into the Butler repository for this data set.
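For reference, ingestion into a fresh Gen2 repository would look roughly like the sketch below. The raw-file path is hypothetical, and the generic ingestImages.py script is an assumption (obs packages sometimes ship their own ingest scripts); you don't need to run this here.
# Hedged sketch of Gen2 ingestion (not needed here; DATA is already populated).
!mkdir -p NEW_DATA
!echo "lsst.obs.lsstSim.LsstSimMapper" > NEW_DATA/_mapper
!ingestImages.py NEW_DATA path/to/raw/*.fits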
We'll now process individual raw LSST simulated images in the Butler DATA repository into calibrated exposures. For raw images this is the job of the processCcd.py command-line task, which removes instrumental signatures with dark, bias, and flat field calibration images, and uses a reference catalog to establish a preliminary WCS and photometric zeropoint solution.
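For reference, a processCcd.py run looks something like the cell below. This is a sketch only (the rerun name is arbitrary), and we won't use processCcd.py in this tutorial because the Twinkles repository holds e-images rather than raw images.
# Hedged sketch of a processCcd.py invocation (dry run via --show data).
# Not used here: the Twinkles data are e-images, handled by ProcessEimageTask below.
!processCcd.py DATA --rerun process-ccd --id filter=r --show data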
First we'll examine the set of exposures available in the Twinkles data set. We'll do this using the ProcessEimageTask from the LSST pipeline, the e-image counterpart of processCcd.py. There is a bit of ugliness here: the processEimage.py command-line script is only Python 2 compatible, so we need to pass the arguments through the Python API instead. The argument parser also has the nasty habit of trying to exit after handling the arguments.
from lsst.obs.lsstSim.processEimage import ProcessEimageTask

# Select all r-band e-images and show which data IDs would be processed.
args = 'DATA --rerun process-eimage --id filter=r --show data'
ProcessEimageTask.parseAndRun(args=args.split())
# BUG: the command above exits early because the argument parser calls sys.exit(0) whenever --show is used without "run":
# /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/pipe_base/15.0/python/lsst/pipe/base/argumentParser.py in parse_args(self, config, args, log, override)
# 628
# 629 if namespace.show and "run" not in namespace.show:
# --> 630 sys.exit(0)
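One way to work around the early exit, sketched below, is to catch the SystemExit that the argument parser raises. This is a plain-Python workaround, not an official pipeline API:
# Workaround sketch: sys.exit(0) raises SystemExit, which we can catch
# to keep the notebook kernel alive after --show has printed its output.
try:
    ProcessEimageTask.parseAndRun(args=args.split())
except SystemExit:
    pass  # expected when --show is used without "run"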
The important arguments here are --id and --show data.
The --id argument allows you to select datasets to process by their data IDs. Data IDs describe individual datasets in the Butler repository. Datasets also have types, and each command-line task will only process data of certain types. In this case, processEimage.py processes raw simulated e-images: idealized exposures whose pixel values record the electrons collected in each pixel, without the readout electronics effects present in fully simulated raw images.
In the above command, the --id filter=r argument selects data taken with the r filter. Specifying --id without any arguments acts as a wildcard that selects all raw-type data in the repository.
The --show data argument puts processEimage.py into a dry-run mode that prints a list of the data IDs that would be processed according to the --id argument, rather than actually processing the data.
Notice the keys that describe each data ID, such as the visit (exposure identifier), raft (identifies a specific LSST camera raft), sensor (identifies an individual CCD on a raft), and filter, among others. With these keys you can select exactly the data you want to process.
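For example, several keys can be combined to narrow the selection to a single exposure. A sketch (the visit number here is hypothetical; substitute one from the listing above):
# Hedged example: select one exposure by visit, raft, and sensor.
# Replace 840 with a real visit number from the --show data output above.
args = 'DATA --rerun process-eimage --id visit=840 raft=2,2 sensor=1,1 --show data'
try:
    ProcessEimageTask.parseAndRun(args=args.split())
except SystemExit:
    pass  # --show still triggers the early exit described above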
Next we perform the same query directly with the Butler:
import lsst.daf.persistence as dafPersist

# Open the Butler on the DATA repository and list the data IDs of all
# r-band e-images as (visit, raft, sensor, filter) tuples.
butler = dafPersist.Butler(inputs='DATA')
butler.queryMetadata('eimage', ['visit', 'raft', 'sensor', 'filter'], dataId={'filter': 'r'})
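As a sketch of how you might go one step further, assuming the repository serves eimage datasets through butler.get, you can retrieve one of these exposures directly:
# Hedged sketch: fetch the first r-band e-image returned by the query.
ids = butler.queryMetadata('eimage', ['visit', 'raft', 'sensor', 'filter'], dataId={'filter': 'r'})
visit, raft, sensor, filt = ids[0]
eimage = butler.get('eimage', visit=visit, raft=raft, sensor=sensor, filter=filt)
print(eimage.getDimensions())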
Now we'll move on to actually process some of the Twinkles data. To do this, we'll remove the --show data argument.
# With --show data removed, parseAndRun proceeds to process the selected
# e-images and writes the results into the process-eimage rerun.
args = 'DATA --rerun process-eimage --id filter=r'
ProcessEimageTask.parseAndRun(args=args.split())
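Once the run completes, the reduced images land in the rerun repository. A quick sanity check (a sketch, assuming ProcessEimageTask writes calexp datasets into the rerun as processCcd.py does):
# Hedged check: open a Butler on the rerun and list the calibrated exposures.
butler_out = dafPersist.Butler(inputs='DATA/rerun/process-eimage')
butler_out.queryMetadata('calexp', ['visit', 'raft', 'sensor', 'filter'], dataId={'filter': 'r'})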