This document describes how to write a command-line task, which is the LSST version of a complete data processing pipeline. To create a command-line task you will benefit from some background:
A command-line task is an enhanced version of a regular task (see How to Write a Task). Regular tasks are only intended to be used as relatively self-contained stages in data processing pipelines, whereas command-line tasks can also be used as complete pipelines. As such, command-line tasks include a run script that runs them as pipelines.
Command-line tasks have the following key attributes, in addition to the attributes for regular tasks:
- A run method which performs the full pipeline data processing.
- The run method takes exactly one argument: a data reference for the item of data to be processed. Variations are possible, but require that you provide a Custom Argument Parser and often a Custom Task Runner.
- A RunnerClass class variable that specifies the task runner. The standard task runner works for any task whose run method accepts a single data reference, such as ExampleCmdLineTask. If your task's run method needs something else then you will have to provide a custom task runner.
- A class variable canMultiprocess, which defaults to True. If your task runner cannot run your task with multiprocessing then set it to False. Note: multiprocessing only affects how the task runner calls the top-level task; thus it is ignored when a task is used as a subtask.
A command-line task can be run as a pipeline via a run script. This is usually a trivial script which merely calls the task's parseAndRun method. parseAndRun parses the command line and then calls the task's run method once for each data item to process.
The run script for ExampleCmdLineTask is examples/exampleCmdLineTask.py.
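The flow described above can be sketched without the LSST stack. In this minimal sketch, CmdLineTaskStandIn and ExampleTask are hypothetical stand-ins written only for illustration; they are not the real lsst.pipe.base.CmdLineTask or ExampleCmdLineTask, and the command-line "parsing" is reduced to treating each argument as one data item.

```python
#!/usr/bin/env python
import sys


class CmdLineTaskStandIn:
    """Hypothetical stand-in for a command-line task base class (NOT the real one)."""

    def __init__(self):
        self.processed = []

    def run(self, dataItem):
        # A real run method would receive a data reference and process it.
        self.processed.append(dataItem)

    @classmethod
    def parseAndRun(cls, args=None):
        # Step 1: "parse" the command line. Assumption: one argument per data item.
        dataItems = list(sys.argv[1:] if args is None else args)
        # Step 2: construct the task.
        task = cls()
        # Step 3: call run once for each data item.
        for dataItem in dataItems:
            task.run(dataItem)
        return task


class ExampleTask(CmdLineTaskStandIn):
    """Hypothetical task used only for this illustration."""


if __name__ == "__main__":
    # The whole body of the run script: a single call to parseAndRun.
    ExampleTask.parseAndRun()
```

With the real stack, a run script would be expected to shrink to an import of the task class followed by the same one-line call to its parseAndRun method.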
For most command-line tasks you should put the run script into your package's bin/ directory, so that it is on your $PATH when you set up your package with eups. We did not want the run script for ExampleCmdLineTask to be quite so accessible, so we placed it in the examples/ directory instead of bin/.
Don't forget to make your run script executable using chmod +x.
The run method typically receives a single data reference, as mentioned above. It reads and writes data using this data reference (or the underlying butler, if necessary).
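As a rough illustration of a run method that reads and writes through its data reference, here is a minimal sketch. FakeDataRef is a hypothetical in-memory stand-in for a butler data reference, assumed to offer get(datasetType) and put(obj, datasetType); MeanPixelTask and the "meanPixel" dataset type are likewise invented for this example.

```python
class FakeDataRef:
    """Hypothetical in-memory stand-in for a butler data reference."""

    def __init__(self, dataId, store):
        self.dataId = dataId
        self._store = store  # shared dict playing the role of the data repository

    def _key(self, datasetType):
        # One stored object per (dataset type, data ID) pair.
        return (datasetType, tuple(sorted(self.dataId.items())))

    def get(self, datasetType):
        return self._store[self._key(datasetType)]

    def put(self, obj, datasetType):
        self._store[self._key(datasetType)] = obj


class MeanPixelTask:
    """Hypothetical task: reads pixel values and writes their mean."""

    def run(self, dataRef):
        pixels = dataRef.get("raw")        # read input via the data reference
        mean = sum(pixels) / len(pixels)   # the "processing"
        dataRef.put(mean, "meanPixel")     # write output via the data reference
        return mean


store = {}
dataRef = FakeDataRef({"visit": 1, "ccd": 2}, store)
dataRef.put([1.0, 2.0, 3.0], "raw")
result = MeanPixelTask().run(dataRef)
```

The point of the pattern is that run never needs file paths: the data reference carries the data ID, and the dataset type name selects what to read or write.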
Every time you write a task that writes a new kind of data (a new "dataset type") you must tell the butler about it. Similarly, if you write a new task for which you want to save configuration and metadata (which is the case for most tasks that process data), you have to tell the butler about it.
To add a dataset, edit the mapper configuration file for each obs_ package on whose data the task can be run. If the task is of general interest (wanted for most or all obs_ packages) then this process of updating all the mapper configuration files can be time consuming.
There are plans to change how mappers are configured. But as of this writing, mapper configuration files are contained in the policy directory of the obs_ package. For instance the configuration for the lsstSim mapper is defined in obs_lsstSim/policy/LsstSimMapper.paf.
Normally when you run a task you want the configuration for the task and the metadata generated by the task to be saved to the data repository. By default, this is done automatically, using dataset types:
- <_DefaultName>_config for the configuration
- <_DefaultName>_metadata for the metadata
where <_DefaultName> is the value of the task's _DefaultName class variable.
Whether you use these default dataset types or customize the dataset types, you will have to add dataset types for the configuration and metadata.
Occasionally the default dataset types for configuration and metadata are not sufficient. For instance, in the case of pipe.tasks.MakeSkyMapTask and various co-addition tasks, the co-add type must be part of the config and metadata dataset type name. To customize the dataset type of a task's config or metadata, define task methods _getConfigName and _getMetadataName to return the desired names.
For some tasks you may wish to not save config and metadata at all. This is appropriate for tasks that simply report information without saving data. To disable saving configuration and metadata, define task methods _getConfigName and _getMetadataName to return None.
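The three cases (default names, customized names, and no persistence) can be sketched as follows. TaskStandIn is a hypothetical stand-in for the task base class, assumed to build the default names from _DefaultName; CoaddLikeTask, ReportOnlyTask, the coaddName attribute, and the helper datasetTypesToAdd are invented for illustration.

```python
class TaskStandIn:
    """Hypothetical stand-in for a command-line task base class."""

    _DefaultName = "standIn"

    def _getConfigName(self):
        # Default: <_DefaultName>_config
        return self._DefaultName + "_config"

    def _getMetadataName(self):
        # Default: <_DefaultName>_metadata
        return self._DefaultName + "_metadata"


class CoaddLikeTask(TaskStandIn):
    """Customized names: the co-add type becomes part of the dataset type."""

    _DefaultName = "makeCoadd"

    def __init__(self, coaddName="deep"):
        self.coaddName = coaddName  # hypothetical attribute for this sketch

    def _getConfigName(self):
        return "%s_%s_config" % (self.coaddName, self._DefaultName)

    def _getMetadataName(self):
        return "%s_%s_metadata" % (self.coaddName, self._DefaultName)


class ReportOnlyTask(TaskStandIn):
    """Returning None disables saving config and metadata."""

    _DefaultName = "report"

    def _getConfigName(self):
        return None

    def _getMetadataName(self):
        return None


def datasetTypesToAdd(task):
    """Hypothetical helper: names that would need mapper entries for this task."""
    names = [task._getConfigName(), task._getMetadataName()]
    return [name for name in names if name is not None]
```

For a task like CoaddLikeTask, datasetTypesToAdd returns the two names that would have to be added to each relevant mapper configuration; for ReportOnlyTask it returns an empty list.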
The default argument parser returned by CmdLineTask._makeArgumentParser assumes that your task's run method processes raw or calibrated images. If this is not the case you can easily provide a modified argument parser.
Typically this consists of constructing an instance of lsst.pipe.base.ArgumentParser and then adding some ID arguments to it using ArgumentParser.add_id_argument. This is shown in several examples below. Please resist the urge to add other kinds of arguments to the argument parser unless truly needed. One strength of our tasks is how similar they are to each other. Learning one set of arguments suffices to use many tasks.
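The general shape of such a _makeArgumentParser method is sketched here. ArgumentParserStandIn is a hypothetical, much-reduced stand-in built on Python's argparse; it only mimics the idea of add_id_argument (an ID argument name paired with a dataset type) and does not reproduce the real lsst.pipe.base.ArgumentParser, which also builds data references. ExampleCoaddTask and the help strings are invented for this example.

```python
import argparse


class ArgumentParserStandIn(argparse.ArgumentParser):
    """Hypothetical, reduced stand-in for the pipeline argument parser."""

    def __init__(self, name):
        super().__init__(prog=name)
        # Maps each ID argument (without leading dashes) to its dataset type.
        self.idDatasetTypes = {}

    def add_id_argument(self, name, datasetType, help):
        dest = name.lstrip("-")
        self.idDatasetTypes[dest] = datasetType
        # Each ID argument takes zero or more KEY=VALUE pairs.
        self.add_argument(name, nargs="*", default=[], dest=dest,
                          metavar="KEY=VALUE", help=help)


class ExampleCoaddTask:
    """Hypothetical task whose run method needs two kinds of data ID."""

    _DefaultName = "exampleCoadd"

    @classmethod
    def _makeArgumentParser(cls):
        parser = ArgumentParserStandIn(name=cls._DefaultName)
        # Required kind of data: the co-add, specified by sky map patch.
        parser.add_id_argument("--id", "deepCoadd",
                               help="patch to process, e.g. --id tract=12345 patch=1,2")
        # Optional kind of data: which exposures to co-add.
        parser.add_id_argument("--selectId", "calexp",
                               help="exposures to co-add, e.g. --selectId visit=6789")
        return parser


parser = ExampleCoaddTask._makeArgumentParser()
parsed = parser.parse_args(["--id", "tract=12345", "patch=1,2"])
```

Only ID arguments are added, in keeping with the advice above to avoid other kinds of command-line arguments unless truly needed.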
Here are some examples:
- A task's run method requires a data reference of some kind other than a raw or calibrated image. This is a common case, and easily solved. For example, processCoadd.ProcessCoaddTask processes co-adds, which are specified by sky map patch; see ProcessCoaddTask._makeArgumentParser.
- A task's run method requires more than one kind of data reference. An example is co-addition, which requires the user to specify the co-add as a sky map patch, and optionally allows the user to specify a list of exposures to co-add. CoaddBaseTask._makeArgumentParser is a straightforward example of specifying two data ID arguments: one for the sky map patch, and an optional ID argument for which exposures to co-add. In this case the custom container class SelectDataIdContainer adds additional information for the task, to save processing time.
- A task's run method requires no data references at all. An example is makeSkyMap.MakeSkyMapTask, which makes a sky map for a set of co-adds. makeSkyMap.MakeSkyMapTask._makeArgumentParser is trivial.
The standard task runner is lsst.pipe.base.TaskRunner. It assumes that your task's run method wants a single data reference and nothing else. If that is not the case then you will have to provide a custom task runner for your task. This involves writing a subclass of lsst.pipe.base.TaskRunner and specifying it in your task using the RunnerClass class variable.
Here are some situations where a custom task runner is required:
- The task's run method requires extra arguments. An example is co-addition, which optionally accepts a list of images to co-add. The custom task runner is coaddBase.CoaddTaskRunner and is pleasantly simple.
- The task runner's __call__ method must be overridden.
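To make the first situation concrete, here is a minimal sketch of a custom task runner that passes an extra argument to run. TaskRunnerStandIn is a hypothetical, simplified stand-in for the standard task runner, assumed to build a list of (data reference, keyword arguments) targets and to call the task's run method once per target; SelectListRunner, CoaddLikeTask, and the selectDataList argument are invented for illustration and are not taken from coaddBase.

```python
from types import SimpleNamespace


class TaskRunnerStandIn:
    """Hypothetical, simplified stand-in for the standard task runner."""

    def __init__(self, TaskClass):
        self.TaskClass = TaskClass

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        # One target per data reference; kwargs are passed on to run.
        return [(dataRef, kwargs) for dataRef in parsedCmd.id.refList]

    def __call__(self, args):
        # Run the task on a single target.
        dataRef, kwargs = args
        task = self.TaskClass()
        return task.run(dataRef, **kwargs)

    def run(self, parsedCmd):
        return [self(target) for target in self.getTargetList(parsedCmd)]


class SelectListRunner(TaskRunnerStandIn):
    """Custom runner: adds the selected exposures as an extra run argument."""

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        return TaskRunnerStandIn.getTargetList(
            parsedCmd, selectDataList=parsedCmd.selectId.dataList, **kwargs)


class CoaddLikeTask:
    """Hypothetical task whose run method takes an extra argument."""

    RunnerClass = SelectListRunner  # tells the framework which runner to use

    def run(self, patchRef, selectDataList=()):
        return (patchRef, len(selectDataList))


# Hypothetical parsed command with two patches and two selected exposures.
parsedCmd = SimpleNamespace(
    id=SimpleNamespace(refList=["patchA", "patchB"]),
    selectId=SimpleNamespace(dataList=[{"visit": 1}, {"visit": 2}]),
)
runner = CoaddLikeTask.RunnerClass(CoaddLikeTask)
results = runner.run(parsedCmd)
```

In this sketch only the construction of the target list changes; the per-target call is inherited. Overriding __call__ would be the analogous step when the way a single target is run has to change.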