How to Write a Task

This document describes how to write a data processing task. For an introduction to data processing tasks, please read pipe_base introduction. It also helps to have a basic understanding of data repositories and how to use the butler to read and write data (to be written; for now read existing tasks to see how it is done).

After reading this document you may wish to read How to Write a Command-Line Task for additional information needed to write a command-line task.

Introduction

Tasks are subclasses of lsst.pipe.base.Task or, for command-line tasks, lsst.pipe.base.CmdLineTask.

Tasks are constructed by calling __init__ with the task configuration. Occasionally additional arguments are required (see the task's documentation for details). lsst.pipe.base.Task.__init__ has a few other arguments that are usually only specified when a task is created as a subtask of another task; you will probably never have to specify them yourself. (Command-line tasks are often constructed and run by calling parseAndRun, as described in How to Write a Command-Line Task.)

Tasks typically have a run method that executes the task's main function. See Methods for more information.

Configuration

Every task requires a configuration: a task-specific set of configuration parameters. The configuration is read-only; once you construct a task, the same configuration will be used to process all data with that task. This makes the data processing more predictable: it does not depend on the order in which items of data are processed.
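The benefit of a read-only configuration can be illustrated with a small stand-alone sketch. It does not use pex_config: SketchConfig and SketchTask are hypothetical stand-ins, and a frozen dataclass is assumed here only as a convenient way to imitate a configuration that cannot change once the task has been constructed.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a task configuration (not pex_config)."""
    threshold: float = 3.0


class SketchTask:
    """Hypothetical stand-in for a task that holds a fixed configuration."""

    def __init__(self, config):
        self.config = config

    def run(self, values):
        # The result depends only on the input and the fixed configuration
        return [v for v in values if v <= self.config.threshold]


task = SketchTask(SketchConfig(threshold=2.0))
first = task.run([1.0, 2.5])
second = task.run([0.5, 4.0])

# Processing the same items in the opposite order gives the same results
assert task.run([0.5, 4.0]) == second
assert task.run([1.0, 2.5]) == first

# Attempting to modify the configuration after construction fails
try:
    task.config.threshold = 10.0
    config_was_modified = True
except FrozenInstanceError:
    config_was_modified = False
```

Because nothing can alter the configuration between calls, each call to run is determined entirely by its arguments.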

The task's configuration is specified using the pex_config package, as a task-specific subclass of lsst.pex.config.Config. The task class specifies its configuration class using class variable ConfigClass. If the task has no configuration parameters then it may use lsst.pex.config.Config as its configuration class.

Some important details of configurations:

ExampleSigmaClippedStatsTask uses configuration class ExampleSigmaClippedStatsConfig:

class ExampleSigmaClippedStatsConfig(pexConfig.Config):
    """!Configuration for ExampleSigmaClippedStatsTask
    """
    badMaskPlanes = pexConfig.ListField(
        dtype = str,
        doc = "Mask planes that, if set, the associated pixel should not be included in the coaddTempExp.",
        default = ("EDGE",),
    )
    numSigmaClip = pexConfig.Field(
        doc = "number of sigmas at which to clip data",
        dtype = float,
        default = 3.0,
    )
    numIter = pexConfig.Field(
        doc = "number of iterations of sigma clipping",
        dtype = int,
        default = 2,
    )

The configuration class is specified as ExampleSigmaClippedStatsTask class variable ConfigClass, as described in Class Variables.

Class Variables

Tasks require several class variables to function:

Here are the class variables for exampleCmdLineTask.ExampleCmdLineTask:

ConfigClass = ExampleCmdLineConfig
_DefaultName = "exampleTask"

Methods

Tasks have the following important methods:

These methods are described in more depth below:

The __init__ Method

Use the __init__ method (task constructor) to do the following:

Here is exampleCmdLineTask.ExampleCmdLineTask.__init__:

def __init__(self, *args, **kwargs):
    """Construct an ExampleCmdLineTask

    Call the parent class constructor and make the "stats" subtask from the config field of the same name.
    """
    pipeBase.CmdLineTask.__init__(self, *args, **kwargs)
    self.makeSubtask("stats")

That task creates a subtask named stats to compute image statistics. Here is the __init__ method for the default version of the stats subtask: exampleTask.ExampleSigmaClippedStatsTask, which is slightly more interesting:

def __init__(self, *args, **kwargs):
    """!Construct an ExampleSigmaClippedStatsTask

    The init method may compute anything that does not require data.
    In this case we create a statistics control object using the config
    (which cannot change once the task is created).
    """
    pipeBase.Task.__init__(self, *args, **kwargs)
    self._badPixelMask = MaskU.getPlaneBitMask(self.config.badMaskPlanes)
    self._statsControl = afwMath.StatisticsControl()
    self._statsControl.setNumSigmaClip(self.config.numSigmaClip)
    self._statsControl.setNumIter(self.config.numIter)
    self._statsControl.setAndMask(self._badPixelMask)

This creates a binary mask identifying bad pixels in the mask plane and an lsst.afw.math.StatisticsControl, specifying how statistics are computed. Both of these are constants, and thus are the same for each invocation of the run method; this is strongly recommended, as explained in the next section.

The run Method

Most tasks have a run method which performs the task's data processing operation. This is required for command-line tasks and strongly recommended for most other tasks. One exception is if your task needs different methods to handle different data types (C++ handles this using overloaded functions, but the standard technique in Python is to provide different methods for different call signatures).

If your task's processing can be divided into logical units, then we recommend that you provide methods for each unit. run can then call each method to do its work. This allows your task to be more easily adapted: a subclass can override just a few methods.
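The following stand-alone sketch illustrates this layout. The class and method names (SketchStatsTask, selectData, computeMean, PositiveStatsTask) are hypothetical and the classes do not derive from lsst.pipe.base.Task; the point is only that run delegates to one method per logical unit, so a subclass can replace a single step.

```python
class SketchStatsTask:
    """Hypothetical task whose run method delegates to one method per logical unit."""

    def run(self, values):
        # Data is passed between steps by calling and returning, not via instance variables
        selected = self.selectData(values)
        return self.computeMean(selected)

    def selectData(self, values):
        # Default selection: keep every value
        return list(values)

    def computeMean(self, values):
        return sum(values) / len(values)


class PositiveStatsTask(SketchStatsTask):
    """Variant that overrides only the selection step."""

    def selectData(self, values):
        return [v for v in values if v > 0]


data = [-2.0, 1.0, 3.0]
defaultMean = SketchStatsTask().run(data)
positiveMean = PositiveStatsTask().run(data)
```

PositiveStatsTask inherits run and computeMean unchanged; only the selection differs.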

We strongly recommend that you make your task stateless, by not using instance variables as part of your data processing. Pass data between methods by calling and returning it. This makes the task much easier to reason about, since processing one item of data cannot affect future items of data.

The run method should always return its results in an lsst.pipe.base.struct.Struct object, with a named field for each item of data. This is safer than returning a tuple of items, and allows adding fields without affecting existing code. Other methods should also return Structs if they return more than one or two items.
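As a rough illustration of why named fields are safer than a tuple, the sketch below uses types.SimpleNamespace from the Python standard library as a stand-in for lsst.pipe.base.Struct (which is not imported here); the function names are hypothetical.

```python
from types import SimpleNamespace


def runReturningTuple(values):
    """Return results as a tuple: callers depend on the position of each item."""
    return (min(values), max(values))


def runReturningStruct(values):
    """Return results as named fields (SimpleNamespace stands in for Struct)."""
    return SimpleNamespace(
        minimum=min(values),
        maximum=max(values),
        count=len(values),  # a field added later; existing callers are unaffected
    )


result = runReturningStruct([3, 1, 2])
# Callers read only the fields they need, by name
valueRange = result.maximum - result.minimum
```

A caller that unpacks the tuple as "lo, hi = runReturningTuple(...)" would break if a third item were added, whereas code that reads result.minimum and result.maximum keeps working when count is introduced.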

Any method that is likely to take significant time or memory should be preceded by this Python decorator: @lsst.pipe.base.timeMethod. This automatically records the execution time and memory usage of the method in the task's metadata attribute.
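To give a feel for what a timing decorator of this kind does, here is a simplified stand-alone sketch. It is not the implementation of lsst.pipe.base.timeMethod: sketchTimeMethod is a hypothetical name, a plain dict stands in for the task's metadata, the key name is invented for illustration, and only elapsed time (not memory) is recorded.

```python
import functools
import time


def sketchTimeMethod(func):
    """Hypothetical decorator: record the elapsed time of a method in self.metadata."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return func(self, *args, **kwargs)
        finally:
            # Store the duration under a key derived from the method name,
            # even if the method raised an exception
            self.metadata[func.__name__ + "Seconds"] = time.perf_counter() - start
    return wrapper


class SketchTask:
    def __init__(self):
        self.metadata = {}  # plain dict standing in for the task's metadata

    @sketchTimeMethod
    def run(self, values):
        return sum(values)


task = SketchTask()
total = task.run(range(1000))
```

After the call, task.metadata contains a "runSeconds" entry, and the decorated method still returns its normal result.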

The example exampleCmdLineTask.ExampleCmdLineTask is so simple that it needs no other methods; run does everything:

@pipeBase.timeMethod
def run(self, dataRef):
    """!Compute a few statistics on the image plane of an exposure

    @param dataRef: data reference for a calibrated science exposure ("calexp")
    @return a pipeBase Struct containing:
    - mean: mean of image plane
    - meanErr: uncertainty in mean
    - stdDev: standard deviation of image plane
    - stdDevErr: uncertainty in standard deviation
    """
    self.log.info("Processing data ID %s" % (dataRef.dataId,))
    if self.config.doFail:
        raise pipeBase.TaskError("Raising TaskError by request (config.doFail=True)")

    # Unpersist the raw exposure pointed to by the data reference
    rawExp = dataRef.get("raw")
    maskedImage = rawExp.getMaskedImage()

    # Support extra debug output: display the exposure only if the
    # "display" debug variable is set.
    # mtv is assumed to be imported at module level (it is not defined in this excerpt).
    import lsstDebug
    display = lsstDebug.Info(__name__).display
    if display:
        frame = 1
        mtv(rawExp, frame=frame, title="exposure")

    # return the pipe_base Struct that is returned by self.stats.run
    return self.stats.run(maskedImage)

The statistics are actually computed by the "stats" subtask. Here is the run method for the default version of that task: exampleTask.ExampleSigmaClippedStatsTask.run:

@pipeBase.timeMethod
def run(self, maskedImage):
    """!Compute and return statistics for a masked image

    @param[in] maskedImage: masked image (an lsst::afw::MaskedImage)
    @return a pipeBase Struct containing:
    - mean: mean of image plane
    - meanErr: uncertainty in mean
    - stdDev: standard deviation of image plane
    - stdDevErr: uncertainty in standard deviation
    """
    statObj = afwMath.makeStatistics(maskedImage, afwMath.MEANCLIP | afwMath.STDEVCLIP | afwMath.ERRORS,
        self._statsControl)
    mean, meanErr = statObj.getResult(afwMath.MEANCLIP)
    stdDev, stdDevErr = statObj.getResult(afwMath.STDEVCLIP)
    self.log.info("clipped mean=%0.2f; meanErr=%0.2f; stdDev=%0.2f; stdDevErr=%0.2f" %
        (mean, meanErr, stdDev, stdDevErr))
    return pipeBase.Struct(
        mean = mean,
        meanErr = meanErr,
        stdDev = stdDev,
        stdDevErr = stdDevErr,
    )

Debug Variables

Debug variables are variables the user may set while running your task, to enable additional debug output. To have your task support debug variables, have it import lsstDebug and call lsstDebug.Info(__name__).varname to get the debug variable varname specific to your task. If you look for a variable the user has not specified, it will have a value of False. For example, to look for a debug variable named "display":

import lsstDebug
display = lsstDebug.Info(__name__).display
if display:
    ...

See Using lsstDebug to control debugging output for more information about debug variables, including how to specify them while running a command-line task.
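The rule that an unspecified debug variable reads as False can be imitated with a few lines of plain Python. The sketch below is a stand-in for illustration only: SketchDebugInfo and process are hypothetical names, and nothing here is taken from the lsstDebug implementation.

```python
class SketchDebugInfo:
    """Hypothetical stand-in for a debug-info object: unset variables read as False."""

    def __init__(self, **variables):
        # Store the variables the user explicitly set
        self.__dict__.update(variables)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # i.e. when the variable was never set
        return False


def process(values, debugInfo):
    """Sum the values, collecting extra debug output only if 'display' is set."""
    debugOutput = []
    if debugInfo.display:
        debugOutput.append("processing %d values" % len(values))
    return sum(values), debugOutput


quietTotal, quietOutput = process([1, 2, 3], SketchDebugInfo())
debugTotal, debugOutput = process([1, 2, 3], SketchDebugInfo(display=True))
```

Because missing variables evaluate to False, the task code can test any debug variable without first checking whether the user defined it.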

Documentation

For others to use your task, it must be clearly documented. pipe/tasks/exampleStatsTasks.py and pipe/tasks/exampleCmdLineTask.py provide useful examples and documentation templates.

Content should include:

Use """! instead of """ to start doc strings (i.e. include an exclamation mark). This causes Doxygen to parse Doxygen commands in the doc string, which is almost always what you want.

Include a section such as the following in your task's documentation. This will make it appear on the page Task Documentation, even if your task is not in the pipe_tasks package.

## \addtogroup LSST_task_documentation
## \{
## \page exampleCmdLineTask
## \ref ExampleCmdLineTask "ExampleCmdLineTask"
##      An example intended to show how to write a command-line task.
## \}