Owner: Jim Chiang (@jchiang87)
Last Verified to Run: 2018-10-26 (by @yymao)
This notebook shows how to read in the coadd catalog data for an entire tract using the LSST Data Management (DM) data butler, and how to construct a pandas DataFrame with the columns needed to apply the weak lensing cuts used in the recent analysis of Hyper Suprime-Cam data by Mandelbaum et al (2018), with cut values as provided by Francois on Slack.
We have a notebook that performs a similar analysis using GCR. See also the more general script for extracting the coadd data from a tract into an HDF5 file.
After completing and studying this Notebook, you should be able to read coadd catalog data for a tract with the data butler, assemble the columns you need into a pandas DataFrame, apply weak lensing quality cuts, and plot the resulting galaxy property distributions.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import lsst.daf.persistence as dp
from desc_dc2_dm_data import REPOS
Here's a function to do this: pass it a butler and some information about what you want, and get a pandas DataFrame back. In this example, we'll get the forced photometry as well as the model measurements; these are stored in different tables.
def read_tract(butler, tract, filter_='i', num_patches=None):
    """
    Read in the coadd catalog data from the specified tract, looping over
    patches and concatenating into a pandas DataFrame.

    Notes
    -----
    The `merged` coadd catalog is supplemented with the CModel fluxes
    from the `forced_src` catalog for the desired filter.

    Parameters
    ----------
    butler: lsst.daf.persistence.Butler
        Data butler that points to the data repository of interest,
        e.g., the output repository for the Run1.1p data.
    tract: lsst.skymap.tractInfo.TractInfo
        The object containing the tract information as obtained from a
        skyMap object, e.g., via `tract = skymap[tractId]` where
        `tractId` is the tract number.
    filter_: str ['i']
        Filter to use for extracting the `forced_src` fluxes.
    num_patches: int [None]
        Number of patches to consider. If None, then use all of the
        available patches in the tract.

    Returns
    -------
    pandas.DataFrame: This will contain the merged coadd data with the
        band-specific columns.
    """
    if num_patches is None:
        num_patches = len(tract)
    df_list = []
    i = 0
    print("Loop Patch Nobj")
    print("=================")
    for patch in tract:
        patchId = '%d,%d' % patch.getIndex()
        dataId = dict(filter=filter_, tract=tract.getId(), patch=patchId)
        # Patches for which there is no data will raise a `NoResults`
        # exception when requested from the butler.  We catch those
        # exceptions, note the missing data, and continue looping over
        # the patches.
        try:
            forced = butler.get('deepCoadd_forced_src', dataId)
            calib = butler.get('deepCoadd_calexp_calib', dataId)
            calib.setThrowOnNegativeFlux(False)
            merged = butler.get('deepCoadd_ref', dataId)
        except dp.NoResults as eobj:
            print(eobj)
            continue
        # Convert the merged coadd catalog into a pandas DataFrame and add
        # the band-specific quantities from the forced_src coadd catalog
        # as new columns.
        data = merged.asAstropy().to_pandas()
        data[filter_ + '_modelfit_CModel_instFlux'] = forced['modelfit_CModel_instFlux']
        data[filter_ + '_modelfit_CModel_instFluxErr'] = forced['modelfit_CModel_instFluxErr']
        # The calib object applies the zero point to get calibrated magnitudes.
        data[filter_ + '_mag_CModel'] = calib.getMagnitude(forced['modelfit_CModel_instFlux'])
        # First-order magnitude error.  (Applying getMagnitude directly to the
        # flux error would convert the error value itself to a magnitude rather
        # than propagating the uncertainty.)
        data[filter_ + '_mag_err_CModel'] \
            = 2.5/np.log(10)*(forced['modelfit_CModel_instFluxErr']
                              /forced['modelfit_CModel_instFlux'])
        data[filter_ + '_modelfit_CModel_SNR'] \
            = forced['modelfit_CModel_instFlux']/forced['modelfit_CModel_instFluxErr']
        data['ext_shapeHSM_HsmShapeRegauss_abs_e'] \
            = np.hypot(data['ext_shapeHSM_HsmShapeRegauss_e1'],
                       data['ext_shapeHSM_HsmShapeRegauss_e2'])
        df_list.append(data)
        # Print the current patch and number of objects added to the final
        # DataFrame.
        print(" {} {} {}".format(i, patchId, len(data)))
        i += 1
        if i == num_patches:
            break
    # sort=False avoids pandas sorting the columns when patch frames differ.
    return pd.concat(df_list, sort=False)
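The calibration step in `read_tract` converts instrumental fluxes to magnitudes via the photometric zero point, and the S/N and |e| columns are simple ratios and quadrature sums. A minimal numpy sketch of that arithmetic (the zero point and flux values here are illustrative, not taken from the repository):

```python
import numpy as np

# Illustrative zero point; in the notebook this comes from the calexp calibration.
zero_point = 27.0
flux = np.array([1000.0, 5000.0])     # instrumental fluxes
flux_err = np.array([100.0, 250.0])   # flux uncertainties

# Calibrated magnitude: mag = zp - 2.5*log10(flux)
mag = zero_point - 2.5 * np.log10(flux)

# First-order magnitude error: (2.5/ln 10) * fluxErr/flux ~= 1.0857 * fluxErr/flux
mag_err = 2.5 / np.log(10) * flux_err / flux

# Signal-to-noise ratio, as in the i_modelfit_CModel_SNR column
snr = flux / flux_err

# Distortion magnitude |e| = sqrt(e1^2 + e2^2), as in the abs_e column
e1, e2 = 0.3, 0.4
abs_e = np.hypot(e1, e2)
```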
In order to access the DM Science Pipeline Processed outputs, including images and catalog data, we summon a data butler and provide the path to the data repository where the DM Science Pipeline tasks have written their outputs.
butler = dp.Butler(REPOS['1.2i'])
The skyMap object stores the information defining how we decided to divide up the sky into tracts and patches when we did the coaddition. Here we pick a tract that is near the center of the protoDC2 simulation region.
skymap = butler.get('deepCoadd_skyMap')
tractId = 4851
filter_ = 'i'
num_patches = None
data = read_tract(butler, skymap[tractId], filter_=filter_, num_patches=num_patches)
print("Number of objects in the coadd catalog:", len(data))
Loop Patch Nobj
=================
 0 0,0 3198
 1 1,0 5391
 2 2,0 5661
 3 3,0 5705
 4 4,0 4982
 5 5,0 5883
 6 6,0 4743
 7 0,1 3987
 8 1,1 5722
 9 2,1 5616
 10 3,1 5098
 11 4,1 5289
 12 5,1 4735
 13 6,1 5216
 14 0,2 3149
 15 1,2 5754
 16 2,2 5236
 17 3,2 6042
 18 4,2 5138
 19 5,2 5751
 20 6,2 4748
 21 0,3 2910
 22 1,3 6241
 23 2,3 6176
 24 3,3 5832
 25 4,3 5301
 26 5,3 5484
 27 6,3 4675
 28 0,4 3777
 29 1,4 6581
 30 2,4 6376
 31 3,4 5951
 32 4,4 6239
 33 5,4 6254
 34 6,4 5508
 35 0,5 3855
 36 1,5 6316
 37 2,5 6953
 38 3,5 6004
 39 4,5 5902
 40 5,5 6279
 41 6,5 5505
 42 0,6 2745
 43 1,6 4891
 44 2,6 6016
 45 3,6 5851
 46 4,6 6383
 47 5,6 6938
 48 6,6 5361
Number of objects in the coadd catalog: 263348
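When the per-patch DataFrames do not share an identical column order, `pd.concat` sorts the non-concatenation axis by default (newer pandas versions warn about this); passing `sort=False` preserves the order of first appearance. A toy illustration with made-up column names:

```python
import pandas as pd

# Two "patch" frames with the same columns in different orders
df1 = pd.DataFrame({'b': [1, 2], 'a': [3, 4]})
df2 = pd.DataFrame({'a': [6], 'b': [5]})

# sort=False keeps the column order of the first frame
combined = pd.concat([df1, df2], sort=False)
```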
First off, we need to apply a baseline set of cuts to remove NaNs.
base_mask = ~(np.isnan(data['i_modelfit_CModel_instFlux'])
| np.isnan(data['ext_shapeHSM_HsmShapeRegauss_resolution'])
| np.isnan(data['ext_shapeHSM_HsmShapeRegauss_e1']))
data = data[base_mask]
print("Number of objects after baseline cuts:", len(data))
Number of objects after baseline cuts: 137296
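The baseline-mask pattern above (reject a row if any required column is NaN) can be sketched on a toy DataFrame; the shortened column names here are for illustration only:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'flux': [1.0, np.nan, 3.0, 4.0],
    'res':  [0.5, 0.6, np.nan, 0.8],
    'e1':   [0.1, 0.2, 0.3, 0.4],
})

# Keep rows where none of the required columns is NaN
base_mask = ~(np.isnan(toy['flux'])
              | np.isnan(toy['res'])
              | np.isnan(toy['e1']))
clean = toy[base_mask]
```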
Now we get to the weak lensing cuts. The detect_isPrimary flag identifies the primary object in the overlap regions between patches, i.e., applying it resolves duplicates from the different analyses of overlapping patches.
mask = data['detect_isPrimary']
# Reject objects with any of these flags set
mask &= ~data['deblend_skipped']
mask &= ~data['base_PixelFlags_flag_edge']
mask &= ~data['base_PixelFlags_flag_interpolatedCenter']
mask &= ~data['base_PixelFlags_flag_saturatedCenter']
mask &= ~data['base_PixelFlags_flag_crCenter']
mask &= ~data['base_PixelFlags_flag_bad']
mask &= ~data['base_PixelFlags_flag_suspectCenter']
mask &= ~data['base_PixelFlags_flag_clipped']
mask &= ~data['ext_shapeHSM_HsmShapeRegauss_flag']
# Cut on measured object properties
mask &= data['i_modelfit_CModel_SNR'] >= 10
mask &= data['ext_shapeHSM_HsmShapeRegauss_resolution'] >= 0.3
mask &= data['ext_shapeHSM_HsmShapeRegauss_abs_e'] < 2
mask &= data['ext_shapeHSM_HsmShapeRegauss_sigma'] <= 0.4
mask &= data['i_mag_CModel'] < 24.5  # NB: no extinction correction applied
mask &= data['base_Blendedness_abs_instFlux'] < 10**(-0.375)
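Each `&=` above intersects the running mask with one more condition, so an object must pass every cut to survive. A toy sketch of the same pattern (the flag and column names here are invented):

```python
import pandas as pd

toy = pd.DataFrame({
    'is_primary': [True, True, False, True],
    'bad_flag':   [False, True, False, False],
    'snr':        [15.0, 20.0, 30.0, 5.0],
})

mask = toy['is_primary']
mask &= ~toy['bad_flag']   # same selection as `== False`, but idiomatic
mask &= toy['snr'] >= 10

# Only rows passing every condition remain
selected = toy[mask]
```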
Let's make some plots of galaxy properties with the above cuts applied. The figure to compare with is Fig. 22 of Mandelbaum et al (2018).
fig = plt.figure(figsize=(10, 10))
hist_kwds = dict(bins=100, histtype='step', density=True, linewidth=2.0, color="black")
p1 = fig.add_subplot(2, 2, 1)
plt.hist(data['ext_shapeHSM_HsmShapeRegauss_resolution'][mask], **hist_kwds);
plt.xlabel('{}-band HSM resolution'.format(filter_))
plt.xlim([0.2,1])
p2 = fig.add_subplot(2, 2, 2)
plt.hist(data[filter_ + '_modelfit_CModel_SNR'][mask], range=(0, 100), **hist_kwds);
plt.xlabel('{}-band CModel S/N'.format(filter_))
plt.xlim([5,100])
p3 = fig.add_subplot(2, 2, 3)
plt.hist(data[filter_ + '_mag_CModel'][mask], **hist_kwds);
plt.xlabel('{}-band CModel mag'.format(filter_))
plt.xlim([20,25])
p4 = fig.add_subplot(2, 2, 4)
plt.hist(data['ext_shapeHSM_HsmShapeRegauss_abs_e'][mask], **hist_kwds);
plt.xlabel('HSM distortion magnitude |e|')
plt.xlim([0,2]);
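The normalization requested in `hist_kwds` (`normed` in older matplotlib, `density` in current versions) scales each histogram so that it integrates to one, making the panels comparable to the published normalized distributions. A quick numpy check of that normalization:

```python
import numpy as np

rng = np.random.RandomState(42)
samples = rng.normal(size=10000)

# density=True returns bin heights normalized so the histogram integrates to 1
counts, edges = np.histogram(samples, bins=100, density=True)
widths = np.diff(edges)
integral = np.sum(counts * widths)
```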
Here's the published figure, for comparison: