#!/usr/bin/env python # coding: utf-8 # # Using the LSST Stack Multiband Deblender #
Authors(s): **Fred Moolekamp** ([@fred3m](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@fred3m)) #
Maintainer(s): **Alex Drlica-Wagner** ([@kadrlica](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@kadrlica)) #
Level: **Intermediate** #
Last Verified to Run: **2021-09-03** #
Verified Stack Release: **w_2021_33** # # This tutorial is designed to illustrate how to execute the multiband deblender (*scarlet*) in the LSST stack. This includes a brief introduction to LSST stack objects including: # # 1. Geometry classes from `lsst.geom`, such as points and boxes. # 2. Higher-level astronomical primitives from `lsst.afw`, such as the `Image`, `Exposure`, and `Psf` classes. # 3. Our core algorithmic `Task` classes, including those for source detection, deblending, and measurement. # # This tutorial is based on Jim Bosch's globular cluster tutorial. However, in it's present state *scarlet* is unable to process crowded fields (most likely) due to poor initial conditions for the sources in these fields. So instead we use a less dense region of the image. # # ### Learning Objectives: # # After working through this tutorial you should be able to: # 1. Configure and run the LSST multiband deblender on a test list of objects; # 2. Understand its task context in the Data Release Production (DRP) pipeline. # # ### Logistics # This notebook is intended to be run on a LARGE instance at `lsst-lsp-stable.ncsa.illinois.edu` or `data.lsst.cloud` from a local git clone of the [StackClub](https://github.com/LSSTScienceCollaborations/StackClub) repo. # # # ### Set-up # In[ ]: # Site, host, and stack version get_ipython().system(' echo $EXTERNAL_INSTANCE_URL') get_ipython().system(' echo $HOSTNAME') get_ipython().system(' eups list -s | grep lsst_distrib') # ## Imports # # We'll start with some standard imports of both LSST and third-party packages. # In[ ]: import os import numpy as np # Familiar stack packages from lsst.geom import Box2I, Box2D, Point2I, Point2D, Extent2I, Extent2D from lsst.afw.image import Exposure, Image, PARENT # These may be less familiar objects dealing with multi-band data products from lsst.afw.image import MultibandExposure, MultibandImage from lsst.afw.detection import MultibandFootprint from lsst.afw.image import MultibandExposure # ## Data Set # # This tutorial can be run with two different data sets: # # 1. Coadded images made from Subaru Hyper Suprime-Cam (HSC) data in the COSMOS field. We've taken a LSST reprocessing of the HSC-SSP UltraDeep COSMOS field (see [this page](https://confluence.lsstcorp.org/display/DM/S18+HSC+PDR1+reprocessing) for information on that reprocessing, and [this page](https://hsc-release.mtk.nao.ac.jp/doc/) for the data), and added simulated stars from a scaled [SDSS catalog](http://www.sdss.org/dr14/data_access/value-added-catalogs/?vac_id=photometry-of-crowded-fields-in-sdss-for-galactic-globular-and-open-clusters). The result is a very deep image (deeper than the 10-year LSST Deep-Wide-Fast survey, though not as deep as LSST Deep Drilling fields will be) with both a large number of galaxies and region full of stars. # 2. Coadded images from the LSST DESC DC2 simulations. We've taken a fairly characeristic region of the Wide-Fast-Deep survey from LSST DESC DC2 and chosen a region around a galaxy cluster to see how the deblending performs. # # We'll be retrieving data using the `Butler` tool, which manages where various datasets are stored on the filesystem (and can in principle manage datasets that aren't even stored as files, though all of these are). You can find more information on the `Butler` in a set of tutorials [Basics/ButlerTutorial.ipynb](../Basics/ButlerTutorial.ipynb) and [Basics/Gen3ButlerTutorial.ipynb](../Basics/Gen3ButlerTutorial.ipynb). # # We start by creating a `Butler` instance, pointing it at a *Data Repository* (which here is just a root directory). Datasets managed by a butler are identified by a dictionary *Data ID* (specifying things like the visit number or sky patch) and a string *DatasetType* (such as a particular image or catalog). Different DatasetTypes have different keys, while different instances of the same Dataset Type have different values. All of the datasets we use in this tutorial will correspond to the same patch of sky, so they'll have at least the keys in the dictionary in the next cell (they will also have `filter`, but with different values). # # We can now use those to load a set of *grizy* coadds, which we'll put directly in a dictionary. The result of each `Butler.get` call is in this case an `lsst.afw.image.Exposure` object, an image that actually contains three "planes" (the main image, a bit mask, and a variance image) as well as many other objects that describe the image, such as its PSF and WCS. Note that we (confusingly) use `Exposures` to hold coadd images as well as true single-exposure images, but combine them into a `MultibandExposure`, which contains an exposure in each band. # # The DatasetType here (`deepCoadd_calexp` or `deepCoadd`) is a coadd image on which we've already done some additional processing, such as subtracting the background and setting some mask values, and the extra `filter` argument gets appended to the Data ID. # In[ ]: # Location of the DC2 Gen3 repository on this site URL = os.getenv('EXTERNAL_INSTANCE_URL') if URL.endswith('data.lsst.cloud'): # IDF repo = "s3://butler-us-central1-dp01" elif URL.endswith('ncsa.illinois.edu'): # NCSA repo = "/repo/dc2" else: raise Exception(f"Unrecognized URL: {URL}") collection='2.2i/runs/DP0.1' # Data set specification dataset_type = "deepCoadd" dataId = {"tract": 3828, "patch": 7*(4)+4} # Filters filters = 'grizy' filter_keys = filters # Specify the location of a cutout box x0,y0 = 18380, 17750 # Blended source index for deeper investigation idx = 2 # In[ ]: from lsst.daf.butler import Butler butler = Butler(repo,collections=collection) # Retrieve the data using the `butler` looping over the filters coadds = [butler.get(dataset_type, dataId=dataId, band=f) for f in filter_keys] coadds = MultibandExposure.fromExposures(filters, coadds) # ## Making and displaying color composite images # # We'll start by just looking at the images, as 3-color composites. We'll use astropy to build those as a nice way to demonstrate how to get NumPy arrays from the `MultibandImage` objects (the images in `coadds`). (LSST also has code to make 3-color composites using the same algorithm, and in fact the Astropy implementation is based on ours.) We'll use matplotlib to display the images themselves. # In[ ]: from astropy.visualization import make_lupton_rgb import matplotlib.pyplot as plt get_ipython().run_line_magic('matplotlib', 'inline') # We'll use the following function a few times to display color images. It's worth reading through the implementation carefully to see what's going on. # In[ ]: def showRGB(image, bgr="gri", ax=None, fp=None, figsize=(8,8), stretch=1, Q=10): """Display an RGB color composite image with matplotlib. Parameters ---------- image : `MultibandImage` `MultibandImage` to display. bgr : sequence A 3-element sequence of filter names (i.e. keys of the exps dict) indicating what band to use for each channel. If `image` only has three filters then this parameter is ignored and the filters in the image are used. ax : `matplotlib.axes.Axes` Axis in a `matplotlib.Figure` to display the image. If `axis` is `None` then a new figure is created. fp: `lsst.afw.detection.Footprint` Footprint that contains the peak catalog for peaks in the image. If `fp` is `None` then no peak positions are plotted. figsize: tuple Size of the `matplotlib.Figure` created. If `ax` is not `None` then this parameter is ignored. stretch: int The linear stretch of the image. Q: int The Asinh softening parameter. """ # If the image only has 3 bands, reverse the order of the bands to produce the RGB image if len(image) == 3: bgr = image.filters # Extract the primary image component of each Exposure with the .image property, and use .array to get a NumPy array view. rgb = make_lupton_rgb(image_r=image[bgr[2]].array, # numpy array for the r channel image_g=image[bgr[1]].array, # numpy array for the g channel image_b=image[bgr[0]].array, # numpy array for the b channel stretch=stretch, Q=Q) # parameters used to stretch and scale the pixel values if ax is None: fig = plt.figure(figsize=figsize) ax = fig.add_subplot(1,1,1) # Exposure.getBBox() returns a Box2I, a box with integer pixel coordinates that correspond to the centers of pixels. # Matplotlib's `extent` argument expects to receive the coordinates of the edges of pixels, which is what # this Box2D (a box with floating-point coordinates) represents. integerPixelBBox = image[bgr[0]].getBBox() bbox = Box2D(integerPixelBBox) ax.imshow(rgb, interpolation='nearest', origin='lower', extent=(bbox.getMinX(), bbox.getMaxX(), bbox.getMinY(), bbox.getMaxY())) if fp is not None: for peak in fp.getPeaks(): ax.plot(peak.getIx(), peak.getIy(), "bx", mew=2) # Notice that we can slice `MultibandImage` objects (as well a `MultibandExposure` objects) along the filter dimension using the filter names as indices. Like `Exposure` objects, `MultibandExposure` objects have `image`, `mask`, and `variance` properties that contain the image, mask plane, and variance of the `Exposure` respectively. For now we will only worry about the `image` property, although internal deblending and measurement algorithms make use of all three objects (when available). # In[ ]: showRGB(coadds[:"z"].image, figsize=(10, 10)) # In[ ]: showRGB(coadds["i":].image, figsize=(10, 10)) # Those images are a full "patch", which is our usual unit of processing for coadds - it's about the same size as a single LSST sensor. That's a bit unweildy (just because waiting for processing to happen isn't fun in a tutorial setting), so we'll reload our dict with sub-images centered on the region of interest. # # Note that we can load the sub-images directly with the `butler`, by appending `_sub` to the DatasetType and passing a `bbox` argument. Use `sampleBBox` to select a sub-region of the image (note that we add a small frame around each blend to include more background regions, which are important for the detection algorithm). # In[ ]: frame = 50 sampleBBox = Box2I(Point2I(18380-frame, 17750-frame), Extent2I(63+2*frame, 87+2*frame)) subset = coadds[:, sampleBBox] # Due to a bug in the code the PSF isn't copied properly. # The code below copies the PSF into the `MultibandExposure`, # but will be unecessary in the future for f in subset.filters: subset[f].setPsf(coadds[f].getPsf()) # In[ ]: showRGB(subset.image) # ## Basic Processing # # Now we'll try the regular LSST processing tasks, with a simpler configuration than we usually use to process coadds, just to avoid being distracted by complexity. This includes # # 1. Detection (`SourceDetectionTask`): given an `Exposure`, find above-threshold regions and peaks within them (`Footprints`), and create a *parent* source for each `Footprint`. # 2. Deblending (`ScarletDeblendTask`): given a `MultibandExposure` and a catalog of parent sources, create a *child* source for each peak in every `Footprint` that contains more than one peak. Each child source is given a `HeavyFootprint`, which contains both the pixel region that source covers and the fractional pixel values associated with that source. A `SourceDeblendTask` is also available using the single band SDSS-HSC deblender that takes a single band `Exposure`). # 3. Measurment (`SingleFrameMeasurementTask`): given an `Exposure` and a catalog of sources, run a set of "measurement plugins" on each source, using deblended pixel values if it is a child. Notice that measurement is still performed on single band catalogs, since none of the measurement algorithms work for multiband data. # # We'll start by importing these, along with the `SourceCatalog` class we'll use to hold the outputs. # In[ ]: from lsst.meas.algorithms import SourceDetectionTask from lsst.meas.extensions.scarlet import ScarletDeblendTask from lsst.meas.base import SingleFrameMeasurementTask from lsst.afw.table import SourceCatalog # We'll now construct all of these `Tasks` before actually running any of them. That's because `SourceDeblendTask` and `SingleFrameMeasurementTask` are constructed with a `Schema` object that records what fields they'll produce, and they modify that schema when they're constructed by adding columns to it. When we run the tasks later, they'll need to be given a catalog that includes all of those columns, **but we can't add columns to a catalog that already exists**. # # To recap, the sequence looks like this: # # 1. Make a (mostly) empty schema. # 2. Construct all of the `Task`s (in the order you plan to run them), which adds columns to the schema. # 3. Make a `SourceCatalog` object from the *complete* schema. # 4. Pass the same `SourceCatalog` object to each `Task` when you run it. # In[ ]: schema = SourceCatalog.Table.makeMinimalSchema() detectionTask = SourceDetectionTask(schema=schema) config = ScarletDeblendTask.ConfigClass() config.maxIter = 100 deblendTask = ScarletDeblendTask(schema=schema, config=config) # We'll customize the configuration of measurement to just run a few plugins. # The default list of plugins is much longer (and hence slower). measureConfig = SingleFrameMeasurementTask.ConfigClass() measureConfig.plugins.names = ["base_SdssCentroid", "base_PsfFlux", "base_SkyCoord"] # "Slots" are aliases that provide easy access to certain plugins. # Because we're not running the plugin these slots refer to by default, # we need to disable them in the configuration. measureConfig.slots.apFlux = None #measureConfig.slots.instFlux = None measureConfig.slots.shape = None measureConfig.slots.modelFlux = None measureConfig.slots.calibFlux = None measureConfig.slots.gaussianFlux = None measureTask = SingleFrameMeasurementTask(config=measureConfig, schema=schema) # The first step we'll run is detection, which actually returns a new `SourceCatalog` object rather than working on an existing one. # # Instead, it takes a `Table` object, which is sort of like a factory for records. We won't use it directly after this, and it isn't actually necessary to make a new `Table` every time you run `MultibandDetectionTask` (but you can only create one after you're done adding columns to the schema). # # `Task`s that return anything do so via a `lsst.pipe.base.Struct` object, which is just a simple collection of named attributes. The only return values we're interested is `sources`. That's our new `SourceCatalog`. # In[ ]: table = SourceCatalog.Table.make(schema) detectionResult = detectionTask.run(table, subset["r"]) catalog = detectionResult.sources # Let's take a quick look at what's in that catalog. First off, we can look at its schema: # In[ ]: catalog.schema # Note that this includes a lot of columns that were actually added by the deblend or measurement steps; those will all still be blank (`0` for integers or flags, `NaN` for floating-point columns). # # In fact, the only columns filled by `SourceDetectionTask` are the IDs. But it also attaches `Footprint` objects, which don't appear in the schema. You can retrieve the `Footprint` by calling `getFootprint()` on a row: # In[ ]: footprint = catalog[0].getFootprint() # `Footprints` have two components: # - a `SpanSet`, which represents an irregular region on an image via a list of (y, x0, x1) `Spans`; # - a `PeakCatalog`, a slightly different kind of catalog whose rows represent peaks within that `Footprint`. # # You can find a Stack Club notebook devoted to `Footprints` at [SourceDetection/Footprints.ipynb](../SourceDetection/Footprints.ipynb). # In[ ]: print(footprint.getSpans()) # In[ ]: print(footprint.getPeaks()) # If we actually look at the footprints in the catalog we see that some have only a single peak, while others have multiple peaks that need to be deblended. # # To display only the pixels contained in the footprint (and not other pixels in the bounding box) we create a `MultibandFootprint`, which is a `HeavyFootprint` that contains a `SpanSet`, `PeakCatalog`, and `flux` values for all of the pixels in the `SpanSet`. In this case the `flux` is the total measured flux in the image, since no deblending has taken place yet. # In[ ]: for i,src in enumerate(catalog): fp = src.getFootprint() img = coadds[:,fp.getBBox()].image mfp = MultibandFootprint.fromImages(coadds.filters, image=img, footprint=fp) showRGB(mfp.getImage().image, fp=fp, figsize=(3,3)) plt.annotate(str(i), (0.9,0.9), xycoords='axes fraction', color='w', weight='bold') # It's worth noting that while the peaks *can* have both an integer-valued position and a floating-point position, they're the same right now; `SourceDetectionTask` currently just finds the pixels that are local minima and doesn't try to find their sub-pixel locations. That's left to the centroider, which is part of the measurement stage. # # Before we can get to that point, we need to run the deblender: # In[ ]: help(deblendTask.run) # In[ ]: templateCatalog = deblendTask.run(coadds, catalog) # `ScarletDeblendTask` returns a `templateCatalog` that contains the model outputs including heavy footprints that are created. # # The deblender itself sets the `parent` column for each source, which is `0` for objects with no parent, and all of the columns that begin with `deblend_` and also adds new rows to the catalog for each child. It does *not* remove the parent rows it created those child rows from, and this is intentional, because we want to measure both "interpretations" of the blend family: one in which there is only one object (the parent version) and one in which there are several (the children). Before doing any science with the outputs of an LSST catalog, it's important to remove one of those interpretations (typically the parent one). That can be done by looking at the `deblend_nChild` and `parent` fields: # # 1. `parent` is the ID of the source from which this was deblended, or `0` if the source is itself a parent. # 2. `deblend_nChild` is the number of child sources this source has (so it's `0` for sources that are themselves children or were never blended). # # Together, these define two particularly useful filters: # # 1. `deblend_nChild == 0`: never-blended object or de-blended child # 2. `deblend_nChild == 0 and parent == 0`: never-blended object # # The first is what you'll usually want to use; the second is what to use if you're willing to throw away some objects (possibly many) because you don't trust the deblender. # # The last processing step for our purposes is running measurement, which must be done on each catalog, in each band (if we want measurements for all of them): # In[ ]: measureTask.run(templateCatalog["r"], coadds['r']) measureTask.run(templateCatalog["i"], coadds['i']) # Due to a feature in the deblender task, the resulting catalogs are not contiguous, and we need to copy them into new objects to use them appropriately. This step can be avoided in the near future. # In[ ]: import lsst.afw.table as afwTable for f in filters: _catalog = afwTable.SourceCatalog(templateCatalog[f].table.clone()) _catalog.extend(templateCatalog[f], deep=True) templateCatalog[f] = _catalog # Since we care about deblending (for the sake of this tutorial) lets look at the results from one of the blends displayed above (chosen by hand and specified by `idx`). We use the `HeavyFootprint`s from the catalog sources that have the same parent to build a model for the entire scene, and to compare the results of the flux conserved and *scarlet* models. In the process we look at both the *scarlet* and flux conserved models. # In[ ]: from lsst.afw.detection import MultibandFootprint from lsst.afw.image import MultibandImage # Note: this is not the parent ID, but the index of the source in the catalog # This index was chosen by hand, and was configured earlier in the notebook parentIdx = idx # Create empty multiband images to model the entire scene templateModel = MultibandImage.fromImages(coadds.filters, [Image(templateCatalog["r"][parentIdx].getFootprint().getBBox(), dtype=np.float32) for b in range(len(filters))]) # Only use the subset catalogs with the same parent parentId = templateCatalog["r"][parentIdx].get("id") templateChildren = {b: templateCatalog[b][templateCatalog[b].get("parent")==parentId] for b in filters} for n in range(len(templateChildren["r"])): # Add the source model to the model of the entire scene fp = MultibandFootprint(coadds.filters, [templateChildren[b][n].getFootprint() for b in filters]) templateModel[:, fp.getBBox()].array += fp.getImage(fill=0).image.array # Show the model showRGB(fp.getImage().image) plt.show() templateResidual = MultibandImage(coadds.filters, coadds[:, templateModel.getBBox()].image.array - templateModel.array) # Finally we look at the full models and the residuals. As expected, there are no residuals for the flux conserved model since all of the flux in the image (that is within the footprint) is added to one of the sources. In this particular case that works fine, but in instances where one or more sources were not detected this can cause one source to have its flux contaminated with its neighbor. # In[ ]: fig = plt.figure(figsize=(15,10)) ax = [fig.add_subplot(1,2,n+1) for n in range(2)] showRGB(coadds[:,fp.getBBox()].image, ax=ax[0]) showRGB(templateModel, ax=ax[1]) # # Exercises # # 1. Select some other`sampleBBox` regions and run through the code again, from source detection through measurment and blending displays. Don't foget to change the parent index to view only the children of the correct blend. # 2. Look at the configuration options on https://github.com/lsst/meas_extensions_scarlet/blob/master/python/lsst/meas/extensions/scarlet/deblend.py and modify some of them to see the changes in results. For example, set `config.symmetric=False` to turn off symmetry to see the effect of deblending without symmetry. # In[ ]: