# DC2 Object Catalog Run2.2i GCR tutorial -- Part II: Lensing Cuts¶

Owners: Francois Lanusse @EiffL, Javier Sanchez @fjaviersanchez
Last Verified to Run: 2020-07-02 (by @jrbogart)

This notebook is the second part of the tutorial (Part I here). Here, we present more advanced features of the DPDD-like object catalog, which contains the detected objects at the coadd level, through the Generic Catalog Reader GCR and build a lensing sample which we compare to the published HSC Y1 sample. The final goal is to show how the different filtering mechanisms, features of GCR, and the samples presented here can be useful for your science.

Learning objectives:

After going through this notebook, you should be able to:

1. Select a clean sample of galaxies for weak lensing.
2. Compute a depth map for a given SNR threshold.
3. Load both the DC2 Run2.2i catalog and a selection from the HSC Public Data Release XMM field.
4. Directly access the coadd images to investigate suspicious objects.

Logistics: This notebook is intended to be run through the JupyterHub NERSC interface available here: https://jupyter-dev.nersc.gov. To setup your NERSC environment, please follow the instructions available here: https://confluence.slac.stanford.edu/display/LSSTDESC/Using+Jupyter-dev+at+NERSC

Other notes: This notebook uses the LSST DM stack (see on the top right corner that we are using the desc-stack kernel)

## Selecting a useful sample of galaxies: lensing cuts from HSC DR1¶

In this notebook, we will build step by step a sample of galaxies from the DC2 run2.2i object catalog, and compare it to an equivalent sample built from the HSC DR1 catalog. See more info on Aihara et al. 2018

### Sample selection¶

We will start from a set of basic sanity cuts that will select extended objects and reject problematic sources, including those for which shape measurement has failed.

One subtlety is that shape measurement is only run for the reference band, which is most of the time the i-band, but not always, we will further restrict the sample to objects for which we have i-band shapes using the merge_measurement_i flag.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

In [2]:
import GCRCatalogs
from GCR import GCRQuery
run = 'run2.2i_dr3'
tract_s = '3081'

In [3]:
basic_cuts = [
GCRQuery('extendedness > 0'),     # Extended objects
GCRQuery((np.isfinite, 'mag_i')), # Select objects that have i-band magnitudes
GCRQuery('clean'), # The source has no flagged pixels (interpolated, saturated, edge, clipped...)
# and was not skipped by the deblender
GCRQuery('xy_flag == 0'),                                      # Flag for bad centroid measurement
GCRQuery('ext_shapeHSM_HsmShapeRegauss_flag == 0'),            # Error code returned by shape measurement code
GCRQuery((np.isfinite, 'ext_shapeHSM_HsmShapeRegauss_sigma')), # Shape measurement uncertainty should not be NaN
]


In addition to these basic cuts, we will want to apply a set of cuts based on object properties, to ensure we are selecting well resolved and well measured galaxies. One of these properties is the measured total distortion, which is not directly defined in the schema, but can be derived from the measured $e1$, $e2$ distortion components according to $|e| = \sqrt{e_1^2 + e_2^2 }$

The GCR provides a convenience function, add_quantity_modifier, to add this quantity to the schema on the fly, so that we can use it afterwards to build our cuts:

In [4]:
# Adds the new derived column
(np.hypot, 'ext_shapeHSM_HsmShapeRegauss_e1', 'ext_shapeHSM_HsmShapeRegauss_e2'),
overwrite=True)

In [5]:
# Define lensing cuts on galaxy properties
properties_cuts = [
GCRQuery('snr_i_cModel > 10'),                              # SNR > 10
GCRQuery('mag_i_cModel < 24.5'),                            # cModel imag brighter than 24.5
GCRQuery('ext_shapeHSM_HsmShapeRegauss_resolution >= 0.3'), # Sufficiently resolved galaxies compared to PSF
GCRQuery('shape_hsm_regauss_etot < 2'),                     # Total distortion in reasonable range
GCRQuery('ext_shapeHSM_HsmShapeRegauss_sigma <= 0.4'),      # Shape measurement errors reasonable
]

# We can now extract our lensing sample
quantities = ['mag_i_cModel', 'snr_i_cModel', 'shape_hsm_regauss_etot', 'ext_shapeHSM_HsmShapeRegauss_resolution']
data_basic = catalog.get_quantities(quantities,
filters=basic_cuts+properties_cuts,
native_filters=[f'tract == {tract_s}'])

/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: invalid value encountered in log10
return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: divide by zero encountered in log10
return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky

In [6]:
plt.figure(figsize=(10,10))
plt.subplot(221)
plt.hist(data_basic['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True);
plt.xlabel('i-band resolution')
plt.xlim()
plt.subplot(222)
plt.hist(data_basic['snr_i_cModel'], 100, range=[0,100], density=True)
plt.xlabel('i-band cmodel S/N')
plt.subplot(223)
plt.hist(data_basic['mag_i_cModel'], 100, range=[20,25], density=True);
plt.xlabel('i-band cmodel mag')
plt.subplot(224)
plt.hist(data_basic['shape_hsm_regauss_etot'],100, density=True);
plt.xlabel('Ellipticity magniture |e|');


You can compare this plot to Fig 12. in Mandelbaum et al. 2018:

A quick visual comparison will highlight two things:

• We are missing a lot of galaxies between about 24.2 and 24.5 mag
• We have a bump near resolution of 1

We are going to investigate to understand these differences below.

### Looking at the depth of the survey¶

One obvious reason why we would be missing some faint galaxies is if DC2 run2.2i is shallower than HSC. We will test this here by measuring the depths of both Run2.2i and the HSC DR1 XMM field. More generally, it is also a concern for most science analysis to have spatially uniform sampled data, which can be checked by looking at the depth of the sample.

There are several ways to do this, in this case, we are going to check what's the magnitude which has a median SNR closest to 10, the SNR cut of our lensing sample.

Code to do this calculation for the HSC DR1 XMM field is commented out below because that catalog is not currently available, but we know the answer from previous runs. That is printed out for comparison with the calculated number for DC2 run2.2i

In [7]:
# Import an efficient alternative to binned_statistic_2d, defined in utils/cic.py
from utils.cic import binned_statistic

import healpy as hp

In [8]:
def depth_map_snr (ra, dec, mags, snr,snr_threshold=10,nside=2048):
"""
Constructs a depth map on a healpix grid for a given SNR threshold.

Parameters
----------
ra, dec: Array of coordinates on the sky (in deg.)
mags, snr : measured magnitude and snr for the sample
snr_threshold: SNR
"""
# Remove potentially problematic entries
good = np.logical_or(np.logical_not(np.isnan(ra)),np.logical_not(np.isnan(dec)))
# Create array of healpix pixel indices corresponding to coordinates
pix_nums = hp.ang2pix(nside,np.pi/2.-dec[good]*np.pi/180,ra[good]*np.pi/180)

# Create output map
map_out = np.zeros(12*nside**2)

# Bins in magnitudes
bin_centers = np.linspace(22+6/30.,28-6/30.,30.)

# For each healpix pixel
for px in np.unique(pix_nums):
# Select all objects within this pixel
# Compute median snr in bins of magnitude
# Find magnitude corresponding to snr threshold
map_out[px]=depth
else:
map_out[px]=0
else:
map_out[px]=0.
return map_out

In [9]:
quantities = ['ra', 'dec', 'mag_i_cModel', 'snr_i_cModel', 'ext_shapeHSM_HsmShapeRegauss_resolution']

# Data from DC2 run2.2i
data_dc2 = catalog.get_quantities(quantities,
filters=basic_cuts, # Note the only apply the basic_cuts
native_filters=[f'tract == {tract_s}'])

# Data from HSC DR1 XMM field
# data_hsc = cat_hsc.get_quantities(quantities,
#                                  filters=basic_cuts)

/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: invalid value encountered in log10
return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: divide by zero encountered in log10
return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky


Note: Runtime warning will arise due to objects with incorrect fluxes but the results are unaffected by them.

In [10]:
# Compute i-band depth maps for Run2.2i, print median
m10map_dc2 = depth_map_snr(data_dc2['ra'], data_dc2['dec'], data_dc2['mag_i_cModel'], data_dc2['snr_i_cModel'])
print(f"{run} median i-band 10-sigma depth ", np.median(m10map_dc2[m10map_dc2 > 0.0]))

# Print the corresponding value for the HSC data, computed previously
print("HSC XMM median i-band 10-sigma depth ", 25.2896551722413795)

/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/ipykernel/__main__.py:20: DeprecationWarning: object of type <class 'float'> cannot be safely interpreted as an integer.
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1076: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis, out=out, keepdims=keepdims)
/global/u2/y/yymao/.beavis/DC2-analysis/tutorials/utils/cic.py:11: RuntimeWarning: invalid value encountered in greater
mask = (x > r0) &  (x < r1)
/global/u2/y/yymao/.beavis/DC2-analysis/tutorials/utils/cic.py:11: RuntimeWarning: invalid value encountered in less
mask = (x > r0) &  (x < r1)

run2.2i_dr3 median i-band 10-sigma depth  24.710344827586205
HSC XMM median i-band 10-sigma depth  25.28965517224138

In [11]:
import pickle
import gzip
def save_map(outfile, depth_map):
f = gzip.open(outfile, 'wb')
pickle.dump(depth_map, f)
f.close()

In [12]:
# Uncomment the following to save the depth map.  It's used in the next tutorial
# There is no need to save unless you've changed run or tract
# map_dir = 'your_choice_goes_here'
# save_map(f'{map_dir}/m10map_{run}_{tract_s}.pklz', m10map_dc2)


And now we see the difference between the two samples, HSC is deeper. (In the case of Run 1.1p, this is partly due to the fact that it was interrupted mid-run so it doesn't reach the full depth of DC2. For Run2.2i_dr3, about 500 visits in the center of the field were omitted. If we had picked a tract in this deficit area the plots and the depth measurement would look considerably worse.)

### Impact of blendedness¶

The second discrepancy between our sample and the Mandelbaum et al. plot is an excess of galaxies appearing very well resolved compared to the PSF (resolution > 0.9). To understand this difference, we are going to select a few of these objects and extract postage stamps from the DM stack for visual inspection.

We begin by selecting galaxies in our lensing sample which are near perfectly resolved by adding a cut on resolution:

In [13]:
sample_cut = basic_cuts + properties_cuts + ['ext_shapeHSM_HsmShapeRegauss_resolution >= 0.98']

data = catalog.get_quantities(['ra', 'dec', 'mag_i_cModel', 'ext_shapeHSM_HsmShapeRegauss_resolution'],
filters=sample_cut,
native_filters=['tract == 3081'])


Now we will extract a few postage stamps at these coordinates, to do so we will reuse some of the code from the DC2 Postage Stamps tutorial. Please have a look at that tutorial to fully understand the function we will be using here, but in a nustshell we are going to query the DM data Butler to retrieve cutouts of the Deep Coadd exposures of these objects in the i-band.

In [14]:
from desc_dc2_dm_data import REPOS

import lsst.daf.persistence as dafPersist
import lsst.geom as afwGeom
import lsst.afw.coord as afwCoord
import lsst.afw.image as afwImage
import lsst.afw.display as afwDisplay

from astropy.table import Table
from astropy.visualization import ZScaleInterval

# Please check the DC2 Postage Stamps tutorial for all the details of how this works
skymap=None, cutoutSideLength=50, **kwargs):
"""Produce a cutout from a coadd at the given ra,dec coordinates

Parameters
--
butler - lsst.daf.persistence.Butler of the data repository
ra, dec - coordinates of the center of the cutout (in degrees).
filter - Filter of the image to load

skymap - [optional] Pass in to avoid the Butler read.  Useful if you have lots of them.
cutoutSideLength - [optional] Side of the cutout region in pixels.

Returns
--
"""
# Create a lsst.afw.geom.SpherePoint coordinates object
cutoutSize = afwGeom.ExtentI(cutoutSideLength, cutoutSideLength)

if skymap is None:

# Retrieves the tract, patch info for these coordinates from the skymap

# Get pixel coordinates on the tract
bbox = afwGeom.BoxI(xy - cutoutSize//2, cutoutSize)

coaddId = {'tract': tractInfo.getId(), 'patch': "%d,%d" % patchInfo.getIndex(), 'filter': filt}

cutout_image = butler.get(datasetType+'_sub', bbox=bbox, immediate=True, dataId=coaddId)

return cutout_image


With this tool to extract cutouts in hand, let's have a look at a few examples in our sample:

In [15]:
# Create an instance of the data butler for our run data repository
repo = REPOS[f'{run}'[3:]]
butler = dafPersist.Butler(repo)

In [16]:
fig, ax = plt.subplots(4, 4, figsize=(15,15))

for ra, dec, ax_this in zip(data['ra'], data['dec'], ax.flat):
# Extract the cutout using the data butler
cutout_image = cutout_coadd_ra_dec(butler, ra, dec, filt='i')

# Plot the postage stamp on the same scales, with some arcsinh range compression
ax_this.imshow(np.arcsinh(cutout_image.image.array), vmax=4, cmap='binary');

# Let's add a crosshair to guide the eye
ax_this.axhline(25, color='k', alpha=0.5);
ax_this.axvline(25, color='k', alpha=0.5);

##### So, one thing may jump to eye, these objects are supposed to be large, extremely well resolved, yet they often look fairly small... but they have a lot of sometimes very bright neighbours. This is very suspicious and may indicate a problem with the DM deblender.¶

Looking at the SCHEMA.md one may notice a field named blendedness. This is a metric produced by the DM stack defined for deblended objects and "measures the fraction of the total flux in the neighborhood of a source that belongs to its neighbors" (Bosch et. 2018).

Let's have a look at the blendedness values of the examples in our sample:

In [17]:
data.update(catalog.get_quantities(['blendedness'],
filters=sample_cut,
native_filters=['tract == 3081']))

data['blendedness'][:16]

/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: invalid value encountered in log10
return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky
/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/GCRCatalogs/dc2_dm_catalog.py:43: RuntimeWarning: divide by zero encountered in log10
return -2.5 * np.log10(flux) + AB_mag_zp_wrt_nanoJansky

Out[17]:
array([0.77196875, 0.67613115, 0.92295569, 0.89298037, 0.92297811,
0.92413947, 0.99069868, 0.74097552, 0.88860161, 0.8866249 ,
0.98923703, 0.99385025, 0.86873228, 0.84240061, 0.94371736,
0.32133813])

As one would expect, they mostly have large values, close to 1, indicating that these objects belong to very blended neighborhoods. Because the deblender seems to be failing for these objects, we can use this metric to try to exclude them from our sample.

As a matter of fact, this is what was done in the (Mandelbaum et al., 2018) paper where an additional cut on blendedness was introduced to guard against deblender failures.

Let's try to rebuild our sample following the same approach:

In [18]:
properties_cuts = [
GCRQuery('snr_i_cModel > 10'),                              # SNR > 10
GCRQuery('mag_i_cModel < 24.5'),                            # cModel imag brighter than 24.5
GCRQuery('ext_shapeHSM_HsmShapeRegauss_resolution >= 0.3'), # Sufficiently resolved galaxies compared to PSF
GCRQuery('shape_hsm_regauss_etot < 2'),                     # Total distortion in reasonable range
GCRQuery('ext_shapeHSM_HsmShapeRegauss_sigma <= 0.4'),      # Shape measurement errors reasonable
# New cut on blendedness:
GCRQuery('blendedness < 10**(-0.375)')                      # Avoid spurious detections and those contaminated by blends
]

quantities = ['mag_i_cModel', 'snr_i_cModel', 'shape_hsm_regauss_etot', 'ext_shapeHSM_HsmShapeRegauss_resolution']
data = catalog.get_quantities(quantities,
filters=basic_cuts + properties_cuts,
native_filters=['tract == 3081'])

In [19]:
plt.figure(figsize=(10,10))
plt.subplot(221)
plt.hist(data['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True, histtype='step', label='New cuts');
plt.xlabel('i-band resolution')
plt.xlim()
plt.subplot(222)
plt.hist(data['snr_i_cModel'], 100, range=[0,100], density=True, histtype='step', label='New cuts')
plt.xlabel('i-band cmodel S/N')
plt.subplot(223)
plt.hist(data['mag_i_cModel'], 100, range=[20,25], density=True, histtype='step', label='New cuts');
plt.xlabel('i-band cmodel mag')
plt.subplot(224)
plt.hist(data['shape_hsm_regauss_etot'],100, density=True, histtype='step', label='New cuts');
plt.xlabel('Ellipticity magniture |e|');
plt.subplot(221)
plt.hist(data_basic['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True, histtype='step', label='Original cuts');
plt.xlabel('i-band resolution')
plt.legend(loc='best');
plt.xlim()
plt.subplot(222)
plt.hist(data_basic['snr_i_cModel'], 100, range=[0,100], density=True, histtype='step', label='Original cuts')
plt.xlabel('i-band cmodel S/N')
plt.subplot(223)
plt.hist(data_basic['mag_i_cModel'], 100, range=[20,25], density=True, histtype='step', label='Original cuts');
plt.xlabel('i-band cmodel mag')
plt.subplot(224)
plt.hist(data_basic['shape_hsm_regauss_etot'],100, density=True, histtype='step', label='Original cuts');
plt.xlabel('Ellipticity magniture |e|');

/opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/lib/python3.7/site-packages/matplotlib/figure.py:98: MatplotlibDeprecationWarning:
Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
"Adding an axes using the same arguments as a previous axes "


\o/ The excess of high resolution objects has disappeared, mystery solved!

If you want, you can check out Part III: Challenges!