Author(s): Phil Marshall (@drphilmarshall)
Maintainer(s): Alex Drlica-Wagner (@kadrlica)
Level: Introductory
Last Verified to Run: 2021-03-31
Verified Stack Release: 21.0.0
In this notebook we will look at a few different ways to find the documentation on a given DM Stack function or class. After working through this tutorial you should be able to:

1. Look up the docstrings of Stack classes and functions using the built-in `help()` function and the IPython `?` magic;
2. Use the `where_is` Stack Club utility to locate DM Stack source code and web documentation.

This notebook is intended to be runnable on `lsst-lsp-stable.ncsa.illinois.edu` from a local git clone of https://github.com/LSSTScienceCollaborations/StackClub.
# What version of the Stack are we using?
! echo $HOSTNAME
! eups list -s | grep lsst_distrib
nb-kadrlica-r21-0-0 lsst_distrib 21.0.0+973e4c9e85 current v21_0_0 setup
We'll need the `stackclub` package to be installed. If you are not developing this package, and you have permission to write to your base python site-packages, you can install it using `pip`, like this:
pip install git+git://github.com/LSSTScienceCollaborations/StackClub.git#egg=stackclub
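If the `git://` protocol is blocked on your network (an assumption about your environment, not something this tutorial requires), the equivalent install over https should also work:

pip install git+https://github.com/LSSTScienceCollaborations/StackClub.git#egg=stackclub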
If you are developing the `stackclub` package (e.g. by adding modules to it to support the Stack Club tutorial that you are writing), you'll need to make a local, editable installation, like this:
! cd .. && python setup.py -q develop --user && cd -
/home/kadrlica/notebooks/.beavis/StackClub/GettingStarted
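Either way, a quick sanity check is to import the package and print where it was loaded from. `__file__` is a standard Python module attribute, so this is just a hedged convenience, not part of the official installation instructions:

import stackclub
print(stackclub.__file__)   # for an editable install, this should point into your local clone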
When editing the `stackclub` package files, we want the latest version to be imported when we re-run the import command. To enable this, we need the `%autoreload` magic command.
%load_ext autoreload
%autoreload 2
Command line tasks have usage information - try running them with no arguments, or with `--help`.
! imageDifference.py --help
usage: imageDifference.py input [options]

positional arguments:
  input                 path to input data repository, relative to $PIPE_INPUT_ROOT

optional arguments:
  -h, --help            show this help message and exit
  --calib RAWCALIB      path to input calibration repository, relative to $PIPE_CALIB_ROOT
  --output RAWOUTPUT    path to output data repository (need not exist), relative to $PIPE_OUTPUT_ROOT
  --rerun [INPUT:]OUTPUT
                        rerun name: sets OUTPUT to ROOT/rerun/OUTPUT; optionally sets ROOT to ROOT/rerun/INPUT
  -c [NAME=VALUE [NAME=VALUE ...]], --config [NAME=VALUE [NAME=VALUE ...]]
                        config override(s), e.g. -c foo=newfoo bar.baz=3
  -C [CONFIGFILE [CONFIGFILE ...]], --configfile [CONFIGFILE [CONFIGFILE ...]]
                        config override file(s)
  -L [LEVEL|COMPONENT=LEVEL [LEVEL|COMPONENT=LEVEL ...]], --loglevel [LEVEL|COMPONENT=LEVEL [LEVEL|COMPONENT=LEVEL ...]]
                        logging level; supported levels are [trace|debug|info|warn|error|fatal]
  --longlog             use a more verbose format for the logging
  --debug               enable debugging output?
  --doraise             raise an exception on error (else log a message and continue)?
  --noExit              Do not exit even upon failure (i.e. return a struct to the calling script)
  --profile PROFILE     Dump cProfile statistics to filename
  --show SHOW [SHOW ...]
                        display the specified information to stdout and quit (unless run is specified); information is (config[=PATTERN]|history=PATTERN|tasks|data|run)
  -j PROCESSES, --processes PROCESSES
                        Number of processes to use
  -t TIMEOUT, --timeout TIMEOUT
                        Timeout for multiprocessing; maximum wall time (sec)
  --clobber-output      remove and re-create the output directory if it already exists (safe with -j, but not all other forms of parallel execution)
  --clobber-config      backup and then overwrite existing config files instead of checking them (safe with -j, but not all other forms of parallel execution)
  --no-backup-config    Don't copy config to file~N backup.
  --clobber-versions    backup and then overwrite existing package versions instead of checking them (safe with -j, but not all other forms of parallel execution)
  --no-versions         don't check package versions; useful for development
  --id [KEY=VALUE1[^VALUE2[^VALUE3...] [KEY=VALUE1[^VALUE2[^VALUE3...] ...]]
                        data ID, e.g. --id visit=12345 ccd=1,2
  --templateId [KEY=VALUE1[^VALUE2[^VALUE3...] [KEY=VALUE1[^VALUE2[^VALUE3...] ...]]
                        Template data ID in case of calexp template, e.g. --templateId visit=6789

Notes:
  * --config, --configfile, --id, --loglevel and @file may appear multiple times; all values are used, in order left to right
  * @file reads command-line options from the specified file:
    * data may be distributed among multiple lines (e.g. one option per line)
    * data after # is treated as a comment and ignored
    * blank lines and lines starting with # are ignored
  * To specify multiple values for an option, do not use = after the option name:
    * right: --configfile foo bar
    * wrong: --configfile=foo bar
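The usage text above also advertises a `--show` option, which lets you inspect a task's configuration (or the data it would process) without actually running it. Here is a hedged sketch: `DATA_REPO` is a placeholder for a real Gen2 data repository, and the data ID is copied from the example in the help text, so adapt both to your own data:

! imageDifference.py DATA_REPO --id visit=12345 ccd=1,2 --show config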
The pipeline task python code also contains useful docstrings, accessible in various ways:
from lsst.pipe.tasks.imageDifference import ImageDifferenceTask
help(ImageDifferenceTask)
Help on class ImageDifferenceTask in module lsst.pipe.tasks.imageDifference: class ImageDifferenceTask(lsst.pipe.base.cmdLineTask.CmdLineTask, lsst.pipe.base.pipelineTask.PipelineTask) | ImageDifferenceTask(butler=None, **kwargs) | | Subtract an image from a template and measure the result | | Method resolution order: | ImageDifferenceTask | lsst.pipe.base.cmdLineTask.CmdLineTask | lsst.pipe.base.pipelineTask.PipelineTask | lsst.pipe.base.task.Task | builtins.object | | Methods defined here: | | __init__(self, butler=None, **kwargs) | !Construct an ImageDifference Task | | @param[in] butler Butler object to use in constructing reference object loaders | | fitAstrometry(self, templateSources, templateExposure, selectSources) | Fit the relative astrometry between templateSources and selectSources | | Todo | ---- | | Remove this method. It originally fit a new WCS to the template before calling register.run | because our TAN-SIP fitter behaved badly for points far from CRPIX, but that's been fixed. | It remains because a subtask overrides it. | | getSchemaCatalogs(self) | Return a dict of empty catalogs for each catalog dataset produced by this task. | | run(self, exposure=None, selectSources=None, templateExposure=None, templateSources=None, idFactory=None, calexpBackgroundExposure=None, subtractedExposure=None) | PSF matches, subtract two images and perform detection on the difference image. | | Parameters | ---------- | exposure : `lsst.afw.image.ExposureF`, optional | The science exposure, the minuend in the image subtraction. | Can be None only if ``config.doSubtract==False``. | selectSources : `lsst.afw.table.SourceCatalog`, optional | Identified sources on the science exposure. This catalog is used to | select sources in order to perform the AL PSF matching on stamp images | around them. The selection steps depend on config options and whether | ``templateSources`` and ``matchingSources`` specified. | templateExposure : `lsst.afw.image.ExposureF`, optional | The template to be subtracted from ``exposure`` in the image subtraction. | The template exposure should cover the same sky area as the science exposure. | It is either a stich of patches of a coadd skymap image or a calexp | of the same pointing as the science exposure. Can be None only | if ``config.doSubtract==False`` and ``subtractedExposure`` is not None. | templateSources : `lsst.afw.table.SourceCatalog`, optional | Identified sources on the template exposure. | idFactory : `lsst.afw.table.IdFactory` | Generator object to assign ids to detected sources in the difference image. | calexpBackgroundExposure : `lsst.afw.image.ExposureF`, optional | Background exposure to be added back to the science exposure | if ``config.doAddCalexpBackground==True`` | subtractedExposure : `lsst.afw.image.ExposureF`, optional | If ``config.doSubtract==False`` and ``config.doDetection==True``, | performs the post subtraction source detection only on this exposure. | Otherwise should be None. | | Returns | ------- | results : `lsst.pipe.base.Struct` | ``subtractedExposure`` : `lsst.afw.image.ExposureF` | Difference image. | ``matchedExposure`` : `lsst.afw.image.ExposureF` | The matched PSF exposure. | ``subtractRes`` : `lsst.pipe.base.Struct` | The returned result structure of the ImagePsfMatchTask subtask. | ``diaSources`` : `lsst.afw.table.SourceCatalog` | The catalog of detected sources. | ``selectSources`` : `lsst.afw.table.SourceCatalog` | The input source catalog with optionally added Qa information. 
| | Notes | ----- | The following major steps are included: | | - warp template coadd to match WCS of image | - PSF match image to warped template | - subtract image from PSF-matched, warped template | - detect sources | - measure sources | | For details about the image subtraction configuration modes | see `lsst.ip.diffim`. | | runDataRef(self, sensorRef, templateIdList=None) | Subtract an image from a template coadd and measure the result. | | Data I/O wrapper around `run` using the butler in Gen2. | | Parameters | ---------- | sensorRef : `lsst.daf.persistence.ButlerDataRef` | Sensor-level butler data reference, used for the following data products: | | Input only: | - calexp | - psf | - ccdExposureId | - ccdExposureId_bits | - self.config.coaddName + "Coadd_skyMap" | - self.config.coaddName + "Coadd" | Input or output, depending on config: | - self.config.coaddName + "Diff_subtractedExp" | Output, depending on config: | - self.config.coaddName + "Diff_matchedExp" | - self.config.coaddName + "Diff_src" | | Returns | ------- | results : `lsst.pipe.base.Struct` | Returns the Struct by `run`. | | runDebug(self, exposure, subtractRes, selectSources, kernelSources, diaSources) | Make debug plots and displays. | | Todo | ---- | Test and update for current debug display and slot names | | runQuantum(self, butlerQC: lsst.pipe.base.butlerQuantumContext.ButlerQuantumContext, inputRefs: lsst.pipe.base.connections.InputQuantizedConnection, outputRefs: lsst.pipe.base.connections.OutputQuantizedConnection) | Method to do butler IO and or transforms to provide in memory | objects for tasks run method | | Parameters | ---------- | butlerQC : `ButlerQuantumContext` | A butler which is specialized to operate in the context of a | `lsst.daf.butler.Quantum`. | inputRefs : `InputQuantizedConnection` | Datastructure whose attribute names are the names that identify | connections defined in corresponding `PipelineTaskConnections` | class. The values of these attributes are the | `lsst.daf.butler.DatasetRef` objects associated with the defined | input/prerequisite connections. | outputRefs : `OutputQuantizedConnection` | Datastructure whose attribute names are the names that identify | connections defined in corresponding `PipelineTaskConnections` | class. The values of these attributes are the | `lsst.daf.butler.DatasetRef` objects associated with the defined | output connections. | | ---------------------------------------------------------------------- | Static methods defined here: | | makeIdFactory(expId, expBits) | Create IdFactory instance for unique 64 bit diaSource id-s. | | Parameters | ---------- | expId : `int` | Exposure id. | | expBits: `int` | Number of used bits in ``expId``. | | Note | ---- | The diasource id-s consists of the ``expId`` stored fixed in the highest value | ``expBits`` of the 64-bit integer plus (bitwise or) a generated sequence number in the | low value end of the integer. | | Returns | ------- | idFactory: `lsst.afw.table.IdFactory` | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | ConfigClass = <class 'lsst.pipe.tasks.imageDifference.ImageDifferenceC... | Config for ImageDifferenceTask | | RunnerClass = <class 'lsst.pipe.tasks.imageDifference.ImageDifferenceT... | A `TaskRunner` for `CmdLineTask`\ s that require a ``butler`` keyword | argument to be passed to their constructor. 
| | ---------------------------------------------------------------------- | Methods inherited from lsst.pipe.base.cmdLineTask.CmdLineTask: | | writeConfig(self, butler, clobber=False, doBackup=True) | Write the configuration used for processing the data, or check that | an existing one is equal to the new one if present. | | Parameters | ---------- | butler : `lsst.daf.persistence.Butler` | Data butler used to write the config. The config is written to | dataset type `CmdLineTask._getConfigName`. | clobber : `bool`, optional | A boolean flag that controls what happens if a config already has | been saved: | | - `True`: overwrite or rename the existing config, depending on | ``doBackup``. | - `False`: raise `TaskError` if this config does not match the | existing config. | doBackup : `bool`, optional | Set to `True` to backup the config files if clobbering. | | writeMetadata(self, dataRef) | Write the metadata produced from processing the data. | | Parameters | ---------- | dataRef | Butler data reference used to write the metadata. | The metadata is written to dataset type | `CmdLineTask._getMetadataName`. | | writePackageVersions(self, butler, clobber=False, doBackup=True, dataset='packages') | Compare and write package versions. | | Parameters | ---------- | butler : `lsst.daf.persistence.Butler` | Data butler used to read/write the package versions. | clobber : `bool`, optional | A boolean flag that controls what happens if versions already have | been saved: | | - `True`: overwrite or rename the existing version info, depending | on ``doBackup``. | - `False`: raise `TaskError` if this version info does not match | the existing. | doBackup : `bool`, optional | If `True` and clobbering, old package version files are backed up. | dataset : `str`, optional | Name of dataset to read/write. | | Raises | ------ | TaskError | Raised if there is a version mismatch with current and persisted | lists of package versions. | | Notes | ----- | Note that this operation is subject to a race condition. | | writeSchemas(self, butler, clobber=False, doBackup=True) | Write the schemas returned by | `lsst.pipe.base.Task.getAllSchemaCatalogs`. | | Parameters | ---------- | butler : `lsst.daf.persistence.Butler` | Data butler used to write the schema. Each schema is written to the | dataset type specified as the key in the dict returned by | `~lsst.pipe.base.Task.getAllSchemaCatalogs`. | clobber : `bool`, optional | A boolean flag that controls what happens if a schema already has | been saved: | | - `True`: overwrite or rename the existing schema, depending on | ``doBackup``. | - `False`: raise `TaskError` if this schema does not match the | existing schema. | doBackup : `bool`, optional | Set to `True` to backup the schema files if clobbering. | | Notes | ----- | If ``clobber`` is `False` and an existing schema does not match a | current schema, then some schemas may have been saved successfully | and others may not, and there is no easy way to tell which is which. | | ---------------------------------------------------------------------- | Class methods inherited from lsst.pipe.base.cmdLineTask.CmdLineTask: | | applyOverrides(config) from builtins.type | A hook to allow a task to change the values of its config *after* | the camera-specific overrides are loaded but before any command-line | overrides are applied. | | Parameters | ---------- | config : instance of task's ``ConfigClass`` | Task configuration. 
| | Notes | ----- | This is necessary in some cases because the camera-specific overrides | may retarget subtasks, wiping out changes made in | ConfigClass.setDefaults. See LSST Trac ticket #2282 for more | discussion. | | .. warning:: | | This is called by CmdLineTask.parseAndRun; other ways of | constructing a config will not apply these overrides. | | parseAndRun(args=None, config=None, log=None, doReturnResults=False) from builtins.type | Parse an argument list and run the command. | | Parameters | ---------- | args : `list`, optional | List of command-line arguments; if `None` use `sys.argv`. | config : `lsst.pex.config.Config`-type, optional | Config for task. If `None` use `Task.ConfigClass`. | log : `lsst.log.Log`-type, optional | Log. If `None` use the default log. | doReturnResults : `bool`, optional | If `True`, return the results of this task. Default is `False`. | This is only intended for unit tests and similar use. It can | easily exhaust memory (if the task returns enough data and you | call it enough times) and it will fail when using multiprocessing | if the returned data cannot be pickled. | | Returns | ------- | struct : `lsst.pipe.base.Struct` | Fields are: | | ``argumentParser`` | the argument parser (`lsst.pipe.base.ArgumentParser`). | ``parsedCmd`` | the parsed command returned by the argument parser's | `~lsst.pipe.base.ArgumentParser.parse_args` method | (`argparse.Namespace`). | ``taskRunner`` | the task runner used to run the task (an instance of | `Task.RunnerClass`). | ``resultList`` | results returned by the task runner's ``run`` method, one entry | per invocation (`list`). This will typically be a list of | `Struct`, each containing at least an ``exitStatus`` integer | (0 or 1); see `Task.RunnerClass` (`TaskRunner` by default) for | more details. | | Notes | ----- | Calling this method with no arguments specified is the standard way to | run a command-line task from the command-line. For an example see | ``pipe_tasks`` ``bin/makeSkyMap.py`` or almost any other file in that | directory. | | If one or more of the dataIds fails then this routine will exit (with | a status giving the number of failed dataIds) rather than returning | this struct; this behaviour can be overridden by specifying the | ``--noExit`` command-line option. | | ---------------------------------------------------------------------- | Data and other attributes inherited from lsst.pipe.base.cmdLineTask.CmdLineTask: | | canMultiprocess = True | | ---------------------------------------------------------------------- | Methods inherited from lsst.pipe.base.pipelineTask.PipelineTask: | | getResourceConfig(self) | Return resource configuration for this task. | | Returns | ------- | Object of type `~config.ResourceConfig` or ``None`` if resource | configuration is not defined for this task. | | ---------------------------------------------------------------------- | Methods inherited from lsst.pipe.base.task.Task: | | __reduce__(self) | Pickler. | | emptyMetadata(self) | Empty (clear) the metadata for this Task and all sub-Tasks. | | getAllSchemaCatalogs(self) | Get schema catalogs for all tasks in the hierarchy, combining the | results into a single dict. | | Returns | ------- | schemacatalogs : `dict` | Keys are butler dataset type, values are a empty catalog (an | instance of the appropriate `lsst.afw.table` Catalog type) for all | tasks in the hierarchy, from the top-level task down | through all subtasks. 
| | Notes | ----- | This method may be called on any task in the hierarchy; it will return | the same answer, regardless. | | The default implementation should always suffice. If your subtask uses | schemas the override `Task.getSchemaCatalogs`, not this method. | | getFullMetadata(self) | Get metadata for all tasks. | | Returns | ------- | metadata : `lsst.daf.base.PropertySet` | The `~lsst.daf.base.PropertySet` keys are the full task name. | Values are metadata for the top-level task and all subtasks, | sub-subtasks, etc. | | Notes | ----- | The returned metadata includes timing information (if | ``@timer.timeMethod`` is used) and any metadata set by the task. The | name of each item consists of the full task name with ``.`` replaced | by ``:``, followed by ``.`` and the name of the item, e.g.:: | | topLevelTaskName:subtaskName:subsubtaskName.itemName | | using ``:`` in the full task name disambiguates the rare situation | that a task has a subtask and a metadata item with the same name. | | getFullName(self) | Get the task name as a hierarchical name including parent task | names. | | Returns | ------- | fullName : `str` | The full name consists of the name of the parent task and each | subtask separated by periods. For example: | | - The full name of top-level task "top" is simply "top". | - The full name of subtask "sub" of top-level task "top" is | "top.sub". | - The full name of subtask "sub2" of subtask "sub" of top-level | task "top" is "top.sub.sub2". | | getName(self) | Get the name of the task. | | Returns | ------- | taskName : `str` | Name of the task. | | See also | -------- | getFullName | | getTaskDict(self) | Get a dictionary of all tasks as a shallow copy. | | Returns | ------- | taskDict : `dict` | Dictionary containing full task name: task object for the top-level | task and all subtasks, sub-subtasks, etc. | | makeSubtask(self, name, **keyArgs) | Create a subtask as a new instance as the ``name`` attribute of this | task. | | Parameters | ---------- | name : `str` | Brief name of the subtask. | keyArgs | Extra keyword arguments used to construct the task. The following | arguments are automatically provided and cannot be overridden: | | - "config". | - "parentTask". | | Notes | ----- | The subtask must be defined by ``Task.config.name``, an instance of | `~lsst.pex.config.ConfigurableField` or | `~lsst.pex.config.RegistryField`. | | timer(self, name, logLevel=10000) | Context manager to log performance data for an arbitrary block of | code. | | Parameters | ---------- | name : `str` | Name of code being timed; data will be logged using item name: | ``Start`` and ``End``. | logLevel | A `lsst.log` level constant. | | Examples | -------- | Creating a timer context: | | .. code-block:: python | | with self.timer("someCodeToTime"): | pass # code to time | | See also | -------- | timer.logInfo | | ---------------------------------------------------------------------- | Class methods inherited from lsst.pipe.base.task.Task: | | makeField(doc) from builtins.type | Make a `lsst.pex.config.ConfigurableField` for this task. | | Parameters | ---------- | doc : `str` | Help text for the field. | | Returns | ------- | configurableField : `lsst.pex.config.ConfigurableField` | A `~ConfigurableField` for this task. | | Examples | -------- | Provides a convenient way to specify this task is a subtask of another | task. | | Here is an example of use: | | .. 
code-block:: python | | class OtherTaskConfig(lsst.pex.config.Config): | aSubtask = ATaskClass.makeField("brief description of task") | | ---------------------------------------------------------------------- | Data descriptors inherited from lsst.pipe.base.task.Task: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined)
You can follow up on the methods and attributes listed in the `help()` output, with further `help()` commands:
help(ImageDifferenceTask.getName)
Help on function getName in module lsst.pipe.base.task:

getName(self)
    Get the name of the task.

    Returns
    -------
    taskName : `str`
        Name of the task.

    See also
    --------
    getFullName
The `help()` function mostly prints out the `__doc__` attribute:
print(ImageDifferenceTask.getName.__doc__)
Get the name of the task.

Returns
-------
taskName : `str`
    Name of the task.

See also
--------
getFullName
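For a slightly cleaner rendering, the standard library's `inspect.getdoc` returns the same docstring with the indentation stripped. This is plain Python, not anything Stack-specific:

import inspect
print(inspect.getdoc(ImageDifferenceTask.getName))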
The Jupyter/IPython `?` magic command gives a different, condensed view that may sometimes be helpful:
? ImageDifferenceTask
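IPython's double question mark goes one step further: where the source is available, `??` displays the implementation itself as well as the docstring (again, standard IPython behavior rather than a Stack feature):

ImageDifferenceTask.getName??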
All the DM code is housed in GitHub repositories in the `lsst` organization. It's nice to provide hyperlinks to the code you are demonstrating, so people can quickly go read the source. We can construct the GitHub URL from the module name, using the `stackclub.where_is` utility.
from stackclub import where_is
from lsst.pipe.tasks.imageDifference import ImageDifferenceTask
where_is(ImageDifferenceTask)
[`lsst.pipe.tasks.imageDifference`](https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/imageDifference.py)
By default, `where_is` looks for the named object in the source code on GitHub. You can specify this behavior explicitly with the `in_the` kwarg:
from lsst.daf.persistence import Butler
where_is(Butler.get, in_the='source')
[`lsst.daf.persistence.butler`](https://github.com/lsst/daf_persistence/blob/master/python/lsst/daf/persistence/butler.py)
In case you're interested in what the `where_is` function is doing, paste the following into a python cell:
%load ../stackclub/where_is
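For a feel for what such a lookup involves, here is a minimal sketch of the idea behind `in_the='source'`, under the assumption that a Stack module `lsst.<pkg>.<module>` lives in the GitHub repo `lsst/<pkg>` at `python/lsst/<pkg>/<module>.py` (true for `pipe_tasks`, though not guaranteed for every package). This is an illustration, not the actual `where_is` implementation:

def guess_github_url(obj):
    # e.g. ImageDifferenceTask.__module__ == 'lsst.pipe.tasks.imageDifference'
    parts = obj.__module__.split('.')       # ['lsst', 'pipe', 'tasks', 'imageDifference']
    repo = '_'.join(parts[1:-1])            # 'pipe_tasks'
    path = '/'.join(parts) + '.py'          # 'lsst/pipe/tasks/imageDifference.py'
    return f"https://github.com/lsst/{repo}/blob/master/python/{path}"

guess_github_url(ImageDifferenceTask)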
GitHub search is pretty powerful. Here's an example, using the search string `user:lsst ImageDifferenceTask` and selecting "Code" results (in python):
https://github.com/search?l=Python&q=user%3Alsst+ImageDifferenceTask&type=Code
You can also generate search strings like this one with `where_is`:
where_is(Butler, in_the='repo')
[searching for `Butler` in the `lsst` repo](https://github.com/search?l=Python&q=org%3Alsst+Butler&type=Code)
Finally, here's how to generate a search within the LSST DM technotes:
where_is(ImageDifferenceTask, in_the='technotes')
[searching for `ImageDifferenceTask` in the `lsst-dm` technotes](https://github.com/search?l=reStructuredText&q=org%3Alsst-dm+ImageDifferenceTask&type=Code)
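If you ever need to build one of these search links by hand (say, for a different organization or language), the URL pattern is easy to reproduce. A hedged sketch, assembling the same query string with the standard library rather than calling `where_is`:

from urllib.parse import urlencode

def github_code_search(term, user='lsst', language='Python'):
    # mirrors the hand-written search URL above: user:lsst <term>, "Code" results, python
    query = urlencode({'l': language, 'q': f'user:{user} {term}', 'type': 'Code'})
    return 'https://github.com/search?' + query

github_code_search('ImageDifferenceTask')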
In this tutorial we have explored two general ways to read more about the DM Stack code objects: the built-in notebook `help` and magic `?` commands, and the `stackclub.where_is` utility for locating the relevant part of the Stack source code.
Both of the above methods focus on the python code, which for many purposes will be sufficient. However, to understand the Stack's C++ primitives, we'll need to dig deeper into the DM Stack's Doxygen documentation, as linked from https://pipelines.lsst.io.