#!/usr/bin/env python
# coding: utf-8

# # Single Exposure Processing
#
# This notebook walks you through the single-exposure processing pipeline on JupyterLab. It builds on the first two hands-on tutorials in the LSST ["Getting started" tutorial series](https://pipelines.lsst.io/getting-started/index.html#getting-started-tutorial) and is intended for anyone getting started with the LSST Science Pipelines for data processing.
#
# The goal of this tutorial is to set up a Butler for a simulated LSST data set and to run the `processCcd.py` pipeline task to produce reduced images.

# ## Setting up the data repository
#
# Sample data for this tutorial comes from the `twinkles` LSST simulation and is available in a shared directory on JupyterLab. We will make a copy of the input data in our current directory:

# In[ ]:

get_ipython().system('if [ ! -d DATA ]; then cp -r /project/shared/data/Twinkles_subset/input_data_v2 DATA; fi')

# Inside the data directory you'll see a directory structure that looks like this:

# In[ ]:

get_ipython().system('ls -lh DATA/')

# The Butler uses a mapper to find and organize data in a format specific to each camera. Here we're using the `lsst.obs.lsstSim.LsstSimMapper` mapper for the Twinkles simulated data:

# In[ ]:

get_ipython().system('cat DATA/_mapper')

# All of the relevant images and calibrations have already been ingested into the Butler for this data set.

# ## Reviewing what data will be processed
#
# We'll now process individual raw LSST simulated images in the Butler `DATA` repository into calibrated exposures. We'll use the `processCcd.py` command-line task to remove instrumental signatures with dark, bias and flat field calibration images. `processCcd.py` will also use the reference catalog to establish a preliminary WCS and photometric zeropoint solution.
#
# First we'll examine the set of exposures available in the Twinkles data set using the Butler.

# Now we'll do a similar thing using `ProcessEimageTask` from the LSST pipeline.
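# The mapper named in the `_mapper` file above is what translates a data ID into camera-specific file locations inside the repository. The toy sketch below illustrates the idea with a hypothetical path template; the real `lsst.obs.lsstSim.LsstSimMapper` uses its own policy-defined layout.

```python
# Toy illustration of what a camera mapper does: render a repository-relative
# path from a data ID. EIMAGE_TEMPLATE is hypothetical, purely for illustration.
EIMAGE_TEMPLATE = 'eimage/v{visit}-f{filter}/R{raft}/S{sensor}.fits.gz'

def map_eimage(data_id):
    """Render a repository-relative path for an e-image data ID."""
    return EIMAGE_TEMPLATE.format(**data_id)

# Example data ID values (hypothetical):
print(map_eimage({'visit': 840, 'filter': 'r', 'raft': '2,2', 'sensor': '1,1'}))
# eimage/v840-fr/R2,2/S1,1.fits.gz
```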
# **There is a bit of ugliness here: the `processEimage.py` command-line script is only Python 2 compatible, so we need to pass the arguments through the Python API instead. This has the nasty habit of calling `sys.exit` after parsing the arguments.**

# In[ ]:

from lsst.obs.lsstSim.processEimage import ProcessEimageTask

# In[ ]:

args = 'DATA --rerun process-eimage --id filter=r --show data'
ProcessEimageTask.parseAndRun(args=args.split())

# BUG: the command above exits early due to a namespace problem:
#
#     /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/pipe_base/15.0/python/lsst/pipe/base/argumentParser.py in parse_args(self, config, args, log, override)
#         628
#         629         if namespace.show and "run" not in namespace.show:
#     --> 630             sys.exit(0)

# The important arguments here are `--id` and `--show data`.
#
# The `--id` argument allows you to select datasets to process by their data IDs. Data IDs describe individual datasets in the Butler repository. Datasets also have types, and each command-line task will only process data of certain types. In this case, `processEimage.py` processes raw simulated e-images **(need more description of e-images)**.
#
# In the above command, the `--id filter=r` argument selects data from the r filter. Specifying `--id` without any arguments acts as a wildcard that selects all raw-type data in the repository.
#
# The `--show data` argument puts `processEimage.py` into a dry-run mode that prints to standard output the list of data IDs that would be processed according to the `--id` argument, rather than actually processing the data.
#
# Notice the keys that describe each data ID, such as visit (the exposure identifier), raft (a specific LSST camera raft), sensor (an individual CCD on a raft) and filter, among others. With these keys you can select exactly what data you want to process.
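# The `--id key=value` selector syntax described above can be illustrated with a toy parser. This is a pure-Python sketch for intuition only; the real parsing (including ranges and multiple values) is done by `lsst.pipe.base.argumentParser`.

```python
def parse_id_args(tokens):
    """Toy parser: turn ['filter=r', 'visit=230'] into a data ID dict.
    An empty token list acts as a wildcard matching every dataset."""
    data_id = {}
    for token in tokens:
        key, _, value = token.partition('=')
        data_id[key] = value
    return data_id

print(parse_id_args(['filter=r']))               # {'filter': 'r'}
print(parse_id_args(['filter=r', 'visit=230']))  # {'filter': 'r', 'visit': '230'}
print(parse_id_args([]))                         # {} -> wildcard: select everything
```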
# Next we perform the same query directly with the Butler:

# In[ ]:

import lsst.daf.persistence as dafPersist

# In[ ]:

butler = dafPersist.Butler(inputs='DATA')
butler.queryMetadata('eimage', ['visit', 'raft', 'sensor', 'filter'], dataId={'filter': 'r'})

# ## Processing data
#
# Now we'll move on to actually processing some of the Twinkles data. To do this, we remove the `--show data` argument:

# In[ ]:

args = 'DATA --rerun process-eimage --id filter=r'
ProcessEimageTask.parseAndRun(args=args.split())
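# The selection logic behind the `queryMetadata` call above can be mimicked with plain dictionaries: keep every dataset whose entries match a partial data ID, and return the requested keys. This is an illustrative stand-in, not the Butler API, and the data ID values below are made up.

```python
def query_metadata(datasets, keys, data_id=None):
    """Toy stand-in for Butler.queryMetadata: return tuples of the requested
    keys for every dataset matching the partial data ID."""
    data_id = data_id or {}
    return [tuple(d[k] for k in keys)
            for d in datasets
            if all(d.get(k) == v for k, v in data_id.items())]

# Hypothetical entries shaped like the Twinkles data IDs:
datasets = [
    {'visit': 230, 'raft': '2,2', 'sensor': '1,1', 'filter': 'r'},
    {'visit': 231, 'raft': '2,2', 'sensor': '1,1', 'filter': 'g'},
]
print(query_metadata(datasets, ['visit', 'filter'], {'filter': 'r'}))
# [(230, 'r')]
```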