DC2 Object Catalog Run2.2i GCR tutorial -- Part II: Lensing Cuts

Owners: Francois Lanusse @EiffL, Javier Sanchez @fjaviersanchez
Last Verified to Run: 2020-07-02 (by @jrbogart)

This notebook is the second part of the tutorial (Part I here). Here, we present more advanced features of the DPDD-like object catalog, which contains the detected objects at the coadd level, through the Generic Catalog Reader GCR and build a lensing sample which we compare to the published HSC Y1 sample. The final goal is to show how the different filtering mechanisms, features of GCR, and the samples presented here can be useful for your science.

Learning objectives:

After going through this notebook, you should be able to:

  1. Select a clean sample of galaxies for weak lensing.
  2. Compute a depth map for a given SNR threshold.
  3. Load both the DC2 Run2.2i catalog and a selection from the HSC Public Data Release XMM field.
  4. Directly access the coadd images to investigate suspicious objects.

Logistics: This notebook is intended to be run through the JupyterHub NERSC interface available here: https://jupyter-dev.nersc.gov. To setup your NERSC environment, please follow the instructions available here: https://confluence.slac.stanford.edu/display/LSSTDESC/Using+Jupyter-dev+at+NERSC

Other notes: This notebook uses the LSST DM stack (see on the top right corner that we are using the desc-stack kernel)

Selecting a useful sample of galaxies: lensing cuts from HSC DR1

In this notebook, we will build step by step a sample of galaxies from the DC2 run2.2i object catalog, and compare it to an equivalent sample built from the HSC DR1 catalog. See more info on Aihara et al. 2018

Sample selection

We will start from a set of basic sanity cuts that will select extended objects and reject problematic sources, including those for which shape measurement has failed.

One subtlety is that shape measurement is only run for the reference band, which is most of the time the i-band, but not always, we will further restrict the sample to objects for which we have i-band shapes using the merge_measurement_i flag.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In [2]:
import GCRCatalogs
from GCR import GCRQuery
# Load the object catalog
run = 'run2.2i_dr3'
tract_s = '3081'
catalog = GCRCatalogs.load_catalog(f'dc2_object_{run}')
In [3]:
basic_cuts = [
    GCRQuery('extendedness > 0'),     # Extended objects
    GCRQuery((np.isfinite, 'mag_i')), # Select objects that have i-band magnitudes
    GCRQuery('clean'), # The source has no flagged pixels (interpolated, saturated, edge, clipped...) 
                       # and was not skipped by the deblender
    GCRQuery('xy_flag == 0'),                                      # Flag for bad centroid measurement
    GCRQuery('ext_shapeHSM_HsmShapeRegauss_flag == 0'),            # Error code returned by shape measurement code
    GCRQuery((np.isfinite, 'ext_shapeHSM_HsmShapeRegauss_sigma')), # Shape measurement uncertainty should not be NaN
]

In addition to these basic cuts, we will want to apply a set of cuts based on object properties, to ensure we are selecting well resolved and well measured galaxies. One of these properties is the measured total distortion, which is not directly defined in the schema, but can be derived from the measured $e1$, $e2$ distortion components according to $|e| = \sqrt{e_1^2 + e_2^2 }$

The GCR provides a convenience function, add_quantity_modifier, to add this quantity to the schema on the fly, so that we can use it afterwards to build our cuts:

In [4]:
# Adds the new derived column 
catalog.add_quantity_modifier('shape_hsm_regauss_etot', 
                              (np.hypot, 'ext_shapeHSM_HsmShapeRegauss_e1', 'ext_shapeHSM_HsmShapeRegauss_e2'), 
                              overwrite=True)
In [5]:
# Define lensing cuts on galaxy properties 
properties_cuts = [
    GCRQuery('snr_i_cModel > 10'),                              # SNR > 10
    GCRQuery('mag_i_cModel < 24.5'),                            # cModel imag brighter than 24.5
    GCRQuery('ext_shapeHSM_HsmShapeRegauss_resolution >= 0.3'), # Sufficiently resolved galaxies compared to PSF
    GCRQuery('shape_hsm_regauss_etot < 2'),                     # Total distortion in reasonable range
    GCRQuery('ext_shapeHSM_HsmShapeRegauss_sigma <= 0.4'),      # Shape measurement errors reasonable
]

# We can now extract our lensing sample 
quantities = ['mag_i_cModel', 'snr_i_cModel', 'shape_hsm_regauss_etot', 'ext_shapeHSM_HsmShapeRegauss_resolution']
data_basic = catalog.get_quantities(quantities, 
                                    filters=basic_cuts+properties_cuts, 
                                    native_filters=[f'tract == {tract_s}'])
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: invalid value encountered in log10
  return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: divide by zero encountered in log10
  return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
In [6]:
plt.figure(figsize=(10,10))
plt.subplot(221)
plt.hist(data_basic['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True);
plt.xlabel('i-band resolution')
plt.xlim()
plt.subplot(222)
plt.hist(data_basic['snr_i_cModel'], 100, range=[0,100], density=True)
plt.xlabel('i-band cmodel S/N')
plt.subplot(223)
plt.hist(data_basic['mag_i_cModel'], 100, range=[20,25], density=True);
plt.xlabel('i-band cmodel mag')
plt.subplot(224)
plt.hist(data_basic['shape_hsm_regauss_etot'],100, density=True);
plt.xlabel('Ellipticity magniture |e|');

You can compare this plot to Fig 12. in Mandelbaum et al. 2018:

A quick visual comparison will highlight two things:

  • We are missing a lot of galaxies between about 24.2 and 24.5 mag
  • We have a bump near resolution of 1

We are going to investigate to understand these differences below.

Looking at the depth of the survey

One obvious reason why we would be missing some faint galaxies is if DC2 run2.2i is shallower than HSC. We will test this here by measuring the depths of both Run2.2i and the HSC DR1 XMM field. More generally, it is also a concern for most science analysis to have spatially uniform sampled data, which can be checked by looking at the depth of the sample.

There are several ways to do this, in this case, we are going to check what's the magnitude which has a median SNR closest to 10, the SNR cut of our lensing sample.

Code to do this calculation for the HSC DR1 XMM field is commented out below because that catalog is not currently available, but we know the answer from previous runs. That is printed out for comparison with the calculated number for DC2 run2.2i

In [7]:
# Import an efficient alternative to binned_statistic_2d, defined in utils/cic.py
from utils.cic import binned_statistic
    
import healpy as hp
In [8]:
def depth_map_snr (ra, dec, mags, snr,snr_threshold=10,nside=2048):
    """
    Constructs a depth map on a healpix grid for a given SNR threshold.
    
    Parameters
    ----------
    ra, dec: Array of coordinates on the sky (in deg.)
    mags, snr : measured magnitude and snr for the sample
    snr_threshold: SNR
    """
    # Remove potentially problematic entries
    good = np.logical_or(np.logical_not(np.isnan(ra)),np.logical_not(np.isnan(dec)))
    # Create array of healpix pixel indices corresponding to coordinates 
    pix_nums = hp.ang2pix(nside,np.pi/2.-dec[good]*np.pi/180,ra[good]*np.pi/180)
    
    # Create output map
    map_out = np.zeros(12*nside**2)
    
    # Bins in magnitudes
    bin_centers = np.linspace(22+6/30.,28-6/30.,30.)
    
    # For each healpix pixel
    for px in np.unique(pix_nums):
        # Select all objects within this pixel
        mask = px==pix_nums
        if np.count_nonzero(mask)>0:
            # Compute median snr in bins of magnitude
            median_snr = binned_statistic(mags[mask],snr[mask],np.nanmedian,nbins=30,range=(22,28))
            mask2 = np.isnan(median_snr)==False
            # Find magnitude corresponding to snr threshold
            if np.count_nonzero(mask2)>0:
                depth = bin_centers[mask2][np.argmin(np.fabs(median_snr[mask2] - snr_threshold))]
                map_out[px]=depth
            else:
                map_out[px]=0
        else:
            map_out[px]=0.
    return map_out
In [9]:
quantities = ['ra', 'dec', 'mag_i_cModel', 'snr_i_cModel', 'ext_shapeHSM_HsmShapeRegauss_resolution']

# Data from DC2 run2.2i
data_dc2 = catalog.get_quantities(quantities, 
                                  filters=basic_cuts, # Note the only apply the basic_cuts
                                  native_filters=[f'tract == {tract_s}'])

# Data from HSC DR1 XMM field 
# cat_hsc = GCRCatalogs.load_catalog('hsc-pdr1-xmm')
# data_hsc = cat_hsc.get_quantities(quantities,
#                                  filters=basic_cuts)
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: invalid value encountered in log10
  return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: divide by zero encountered in log10
  return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky

Note: Runtime warning will arise due to objects with incorrect fluxes but the results are unaffected by them.

In [10]:
# Compute i-band depth maps for Run2.2i, print median
m10map_dc2 = depth_map_snr(data_dc2['ra'], data_dc2['dec'], data_dc2['mag_i_cModel'], data_dc2['snr_i_cModel'])
print(f"{run} median i-band 10-sigma depth ", np.median(m10map_dc2[m10map_dc2 > 0.0]))

# Print the corresponding value for the HSC data, computed previously
print("HSC XMM median i-band 10-sigma depth ", 25.2896551722413795)
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/ipykernel/__main__.py:20: DeprecationWarning: object of type <class 'float'> cannot be safely interpreted as an integer.
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1076: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis, out=out, keepdims=keepdims)
/global/u2/y/yymao/.beavis/DC2-analysis/tutorials/utils/cic.py:11: RuntimeWarning: invalid value encountered in greater
  mask = (x > r0) &  (x < r1)
/global/u2/y/yymao/.beavis/DC2-analysis/tutorials/utils/cic.py:11: RuntimeWarning: invalid value encountered in less
  mask = (x > r0) &  (x < r1)
run2.2i_dr3 median i-band 10-sigma depth  24.710344827586205
HSC XMM median i-band 10-sigma depth  25.28965517224138
In [11]:
import pickle
import gzip
def save_map(outfile, depth_map):
    f = gzip.open(outfile, 'wb')
    pickle.dump(depth_map, f)
    f.close()
In [12]:
# Uncomment the following to save the depth map.  It's used in the next tutorial
# There is no need to save unless you've changed run or tract
# map_dir = 'your_choice_goes_here'
# save_map(f'{map_dir}/m10map_{run}_{tract_s}.pklz', m10map_dc2)

And now we see the difference between the two samples, HSC is deeper. (In the case of Run 1.1p, this is partly due to the fact that it was interrupted mid-run so it doesn't reach the full depth of DC2. For Run2.2i_dr3, about 500 visits in the center of the field were omitted. If we had picked a tract in this deficit area the plots and the depth measurement would look considerably worse.)

Impact of blendedness

The second discrepancy between our sample and the Mandelbaum et al. plot is an excess of galaxies appearing very well resolved compared to the PSF (resolution > 0.9). To understand this difference, we are going to select a few of these objects and extract postage stamps from the DM stack for visual inspection.

We begin by selecting galaxies in our lensing sample which are near perfectly resolved by adding a cut on resolution:

In [13]:
sample_cut = basic_cuts + properties_cuts + ['ext_shapeHSM_HsmShapeRegauss_resolution >= 0.98']

data = catalog.get_quantities(['ra', 'dec', 'mag_i_cModel', 'ext_shapeHSM_HsmShapeRegauss_resolution'], 
                              filters=sample_cut,
                              native_filters=['tract == 3081'])

Now we will extract a few postage stamps at these coordinates, to do so we will reuse some of the code from the DC2 Postage Stamps tutorial. Please have a look at that tutorial to fully understand the function we will be using here, but in a nustshell we are going to query the DM data Butler to retrieve cutouts of the Deep Coadd exposures of these objects in the i-band.

In [14]:
from desc_dc2_dm_data import REPOS

import lsst.daf.persistence as dafPersist
import lsst.geom as afwGeom
import lsst.afw.coord as afwCoord
import lsst.afw.image as afwImage
import lsst.afw.display as afwDisplay

from astropy.table import Table
from astropy.visualization import ZScaleInterval

# Please check the DC2 Postage Stamps tutorial for all the details of how this works
def cutout_coadd_ra_dec(butler, ra, dec, filt='i', datasetType='deepCoadd', 
                        skymap=None, cutoutSideLength=50, **kwargs):
    """Produce a cutout from a coadd at the given ra,dec coordinates
    

    Parameters
    --
    butler - lsst.daf.persistence.Butler of the data repository
    ra, dec - coordinates of the center of the cutout (in degrees).
    filter - Filter of the image to load
    datasetType - 'deepCoadd'  Which type of coadd to load.  Doesn't support 'calexp'
    
    skymap - [optional] Pass in to avoid the Butler read.  Useful if you have lots of them.
    cutoutSideLength - [optional] Side of the cutout region in pixels.
    
    Returns
    --
    MaskedImage
    """
    # Create a lsst.afw.geom.SpherePoint coordinates object
    radec = afwGeom.SpherePoint(ra, dec, afwGeom.degrees)
    cutoutSize = afwGeom.ExtentI(cutoutSideLength, cutoutSideLength)

    if skymap is None:
        skymap = butler.get("deepCoadd_skyMap")
    
    # Retrieves the tract, patch info for these coordinates from the skymap
    tractInfo = skymap.findTract(radec)
    patchInfo = tractInfo.findPatch(radec)
    
    # Get pixel coordinates on the tract
    xy = afwGeom.PointI(tractInfo.getWcs().skyToPixel(radec))
    bbox = afwGeom.BoxI(xy - cutoutSize//2, cutoutSize)

    coaddId = {'tract': tractInfo.getId(), 'patch': "%d,%d" % patchInfo.getIndex(), 'filter': filt}
    
    cutout_image = butler.get(datasetType+'_sub', bbox=bbox, immediate=True, dataId=coaddId)
    
    return cutout_image

With this tool to extract cutouts in hand, let's have a look at a few examples in our sample:

In [15]:
# Create an instance of the data butler for our run data repository
repo = REPOS[f'{run}'[3:]]
butler = dafPersist.Butler(repo)
In [16]:
fig, ax = plt.subplots(4, 4, figsize=(15,15))

for ra, dec, ax_this in zip(data['ra'], data['dec'], ax.flat):
    # Extract the cutout using the data butler
    cutout_image = cutout_coadd_ra_dec(butler, ra, dec, filt='i')
    
    # Plot the postage stamp on the same scales, with some arcsinh range compression 
    ax_this.imshow(np.arcsinh(cutout_image.image.array), vmax=4, cmap='binary');
    
    # Let's add a crosshair to guide the eye
    ax_this.axhline(25, color='k', alpha=0.5);
    ax_this.axvline(25, color='k', alpha=0.5);
So, one thing may jump to eye, these objects are supposed to be large, extremely well resolved, yet they often look fairly small... but they have a lot of sometimes very bright neighbours. This is very suspicious and may indicate a problem with the DM deblender.

Looking at the SCHEMA.md one may notice a field named blendedness. This is a metric produced by the DM stack defined for deblended objects and "measures the fraction of the total flux in the neighborhood of a source that belongs to its neighbors" (Bosch et. 2018).

Let's have a look at the blendedness values of the examples in our sample:

In [17]:
data.update(catalog.get_quantities(['blendedness'], 
                                   filters=sample_cut,
                                   native_filters=['tract == 3081']))

data['blendedness'][:16]
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: invalid value encountered in log10
  return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: divide by zero encountered in log10
  return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
Out[17]:
array([0.77196875, 0.67613115, 0.92295569, 0.89298037, 0.92297811,
       0.92413947, 0.99069868, 0.74097552, 0.88860161, 0.8866249 ,
       0.98923703, 0.99385025, 0.86873228, 0.84240061, 0.94371736,
       0.32133813])

As one would expect, they mostly have large values, close to 1, indicating that these objects belong to very blended neighborhoods. Because the deblender seems to be failing for these objects, we can use this metric to try to exclude them from our sample.

As a matter of fact, this is what was done in the (Mandelbaum et al., 2018) paper where an additional cut on blendedness was introduced to guard against deblender failures.

Let's try to rebuild our sample following the same approach:

In [18]:
properties_cuts = [
    GCRQuery('snr_i_cModel > 10'),                              # SNR > 10
    GCRQuery('mag_i_cModel < 24.5'),                            # cModel imag brighter than 24.5
    GCRQuery('ext_shapeHSM_HsmShapeRegauss_resolution >= 0.3'), # Sufficiently resolved galaxies compared to PSF
    GCRQuery('shape_hsm_regauss_etot < 2'),                     # Total distortion in reasonable range
    GCRQuery('ext_shapeHSM_HsmShapeRegauss_sigma <= 0.4'),      # Shape measurement errors reasonable
    # New cut on blendedness:
    GCRQuery('blendedness < 10**(-0.375)')                      # Avoid spurious detections and those contaminated by blends
]

quantities = ['mag_i_cModel', 'snr_i_cModel', 'shape_hsm_regauss_etot', 'ext_shapeHSM_HsmShapeRegauss_resolution']
data = catalog.get_quantities(quantities, 
                              filters=basic_cuts + properties_cuts, 
                              native_filters=['tract == 3081'])
In [19]:
plt.figure(figsize=(10,10))
plt.subplot(221)
plt.hist(data['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True, histtype='step', label='New cuts');
plt.xlabel('i-band resolution')
plt.xlim()
plt.subplot(222)
plt.hist(data['snr_i_cModel'], 100, range=[0,100], density=True, histtype='step', label='New cuts')
plt.xlabel('i-band cmodel S/N')
plt.subplot(223)
plt.hist(data['mag_i_cModel'], 100, range=[20,25], density=True, histtype='step', label='New cuts');
plt.xlabel('i-band cmodel mag')
plt.subplot(224)
plt.hist(data['shape_hsm_regauss_etot'],100, density=True, histtype='step', label='New cuts');
plt.xlabel('Ellipticity magniture |e|');
plt.subplot(221)
plt.hist(data_basic['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True, histtype='step', label='Original cuts');
plt.xlabel('i-band resolution')
plt.legend(loc='best');
plt.xlim()
plt.subplot(222)
plt.hist(data_basic['snr_i_cModel'], 100, range=[0,100], density=True, histtype='step', label='Original cuts')
plt.xlabel('i-band cmodel S/N')
plt.subplot(223)
plt.hist(data_basic['mag_i_cModel'], 100, range=[20,25], density=True, histtype='step', label='Original cuts');
plt.xlabel('i-band cmodel mag')
plt.subplot(224)
plt.hist(data_basic['shape_hsm_regauss_etot'],100, density=True, histtype='step', label='Original cuts');
plt.xlabel('Ellipticity magniture |e|');
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/matplotlib/figure.py:98: MatplotlibDeprecationWarning: 
Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
  "Adding an axes using the same arguments as a previous axes "

\o/ The excess of high resolution objects has disappeared, mystery solved!

If you want, you can check out Part III: Challenges!