LSSTApplications  10.0-2-g4f67435,11.0.rc2+1,11.0.rc2+12,11.0.rc2+3,11.0.rc2+4,11.0.rc2+5,11.0.rc2+6,11.0.rc2+7,11.0.rc2+8
How to Write a Command-Line Task

This document describes how to write a command-line task, which is the LSST version of a complete data processing pipeline. To create a command-line task you will benefit from some background:


Introduction

A command-line task is an enhanced version of a regular task (see How to Write a Task). Regular tasks are intended only to be used as relatively self-contained stages in data processing pipelines, whereas command-line tasks can also be used as complete pipelines. For this reason, command-line tasks include run scripts that run them as pipelines.

Command-line tasks have the following key attributes, in addition to the attributes for regular tasks:

Run Script

A command-line task can be run as a pipeline via a run script. This is usually a trivial script that merely calls the task's parseAndRun method. parseAndRun does the following:

The run script for ExampleCmdLineTask is examples/exampleCmdLineTask.py:

#!/usr/bin/env python
# Run script: parse the command line, then run ExampleCmdLineTask.
from lsst.pipe.tasks.exampleCmdLineTask import ExampleCmdLineTask
ExampleCmdLineTask.parseAndRun()

For most command-line tasks you should put the run script in your package's bin/ directory, so that it is on your $PATH when you set up your package with eups. We did not want the run script for ExampleCmdLineTask to be quite so accessible, so we placed it in the examples/ directory instead of bin/.

Don't forget to make your run script executable using chmod +x.

Reading and Writing Data

The run method typically receives a single data reference, as mentioned above. It reads and writes data using this data reference (or the underlying butler, if necessary).
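As a rough illustration of this pattern, the sketch below uses a hypothetical in-memory stand-in, FakeDataRef, in place of a real butler data reference. Real data references (lsst.daf.persistence.ButlerDataRef) also offer get and put methods keyed by dataset type, but they read and write a data repository. The dataset type exampleSum is made up for this example.

```python
class FakeDataRef:
    """Hypothetical in-memory stand-in for a butler data reference."""

    def __init__(self, dataId, store=None):
        self.dataId = dict(dataId)
        self._store = {} if store is None else store

    def _key(self, datasetType):
        # A dataset is identified by its dataset type plus the data ID.
        return (datasetType, tuple(sorted(self.dataId.items())))

    def get(self, datasetType):
        return self._store[self._key(datasetType)]

    def put(self, obj, datasetType):
        self._store[self._key(datasetType)] = obj


def run(dataRef):
    """Sketch of a run method: read one dataset, write a derived one."""
    pixels = dataRef.get("calexp")     # read input through the data reference
    total = sum(pixels)                # placeholder for real processing
    dataRef.put(total, "exampleSum")   # hypothetical new dataset type
    return total


ref = FakeDataRef({"visit": 1, "ccd": 3})
ref.put([1, 2, 3], "calexp")           # pretend an earlier task wrote this
print(run(ref))                        # prints 6
```

Because exampleSum would be a new dataset type, a real task writing it would also need it added to the mapper configuration, as described in the next section.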

Adding Dataset Types

Every time you write a task that writes a new kind of data (a new "dataset type"), you must tell the butler about it. Similarly, if you write a new task whose configuration and metadata you want to save (which is the case for most tasks that process data), you must tell the butler about the corresponding config and metadata dataset types.

To add a dataset type, edit the mapper configuration file for each obs_ package whose data the task can process. If the task is of general interest (wanted for most or all obs_ packages), updating all of these mapper configuration files can be time-consuming.

There are plans to change how mappers are configured, but as of this writing, mapper configuration files are contained in the policy directory of the obs_ package. For instance, the configuration for the lsstSim mapper is defined in obs_lsstSim/policy/LsstSimMapper.paf.
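Conceptually, the mapper configuration acts as a table from dataset type names to file path templates, which is why a dataset type the mapper does not know about cannot be read or written. The Python sketch below models only that lookup. DATASET_TEMPLATES, map_dataset and myTask_config are hypothetical names, the templates are illustrative, and this is neither the .paf syntax nor the butler API.

```python
# Hypothetical model of a mapper's dataset type table (not the .paf format).
DATASET_TEMPLATES = {
    "calexp": "calexp/v%(visit)d/c%(ccd)d.fits",
    "myTask_config": "config/myTask.py",
}


def map_dataset(datasetType, dataId, templates=DATASET_TEMPLATES):
    """Return the path for a dataset, or fail if the dataset type is unknown."""
    try:
        template = templates[datasetType]
    except KeyError:
        raise LookupError(
            f"dataset type {datasetType!r} is not defined for this mapper"
        ) from None
    # Fill %(key)d / %(key)s placeholders from the data ID.
    return template % dataId


print(map_dataset("calexp", {"visit": 1, "ccd": 3}))  # prints calexp/v1/c3.fits
```

In this model, adding a dataset type means adding an entry to the table, and it must be done once for every obs_ package whose data the task should process.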

Persisting Config and Metadata

Normally when you run a task you want the configuration for the task and the metadata generated by the task to be saved to the data repository. By default this is done automatically, using the dataset types _DefaultName_config (for the configuration) and _DefaultName_metadata (for the metadata), where _DefaultName is the value of the task's _DefaultName class variable.

Whether you use these default dataset types or customize them, you will have to add the configuration and metadata dataset types to the mapper configuration.

Customizing Config and Metadata Dataset Types

Occasionally the default dataset types for configuration and metadata are not sufficient. For instance, in the case of pipe.tasks.MakeSkyMapTask and various co-addition tasks, the co-add type must be part of the config and metadata dataset type names. To customize the dataset type of a task's config or metadata, define the task methods _getConfigName and _getMetadataName to return the desired names.
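A minimal sketch of such an override follows. To stay self-contained it uses a hypothetical stand-in base class, CmdLineTaskStandIn, that reproduces the default naming used by lsst.pipe.base.CmdLineTask (_DefaultName followed by _config or _metadata) instead of importing the real class. The subclass MyCoaddTask, its coaddName attribute and its naming scheme are assumptions chosen for illustration, not the names used by pipe.tasks. Here coaddName is passed to the constructor for simplicity; a real task would more likely read it from its config.

```python
class CmdLineTaskStandIn:
    """Hypothetical stand-in reproducing CmdLineTask's default dataset names."""
    _DefaultName = "cmdLineTask"

    def _getConfigName(self):
        return self._DefaultName + "_config"

    def _getMetadataName(self):
        return self._DefaultName + "_metadata"


class MyCoaddTask(CmdLineTaskStandIn):
    """Hypothetical co-add task whose dataset names include the co-add type."""
    _DefaultName = "myCoadd"

    def __init__(self, coaddName="deep"):
        self.coaddName = coaddName  # e.g. "deep" or "goodSeeing"

    def _getConfigName(self):
        # Prefix with the co-add type so each co-add type gets its own dataset.
        return self.coaddName + "_" + self._DefaultName + "_config"

    def _getMetadataName(self):
        return self.coaddName + "_" + self._DefaultName + "_metadata"


print(CmdLineTaskStandIn()._getConfigName())  # prints cmdLineTask_config
print(MyCoaddTask("deep")._getConfigName())   # prints deep_myCoadd_config
```

These are ordinary instance methods rather than class methods because, as in this sketch, the name can depend on per-instance settings.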

Prevent Saving Config and Metadata

For some tasks you may not wish to save config and metadata at all. This is appropriate for tasks that simply report information without saving data. To disable saving configuration and metadata, define the task methods _getConfigName and _getMetadataName to return None.
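The sketch below shows the override together with a hypothetical helper, write_config, that illustrates how a caller can treat None as "do not persist". ReportTask and ProcessTask are stand-ins rather than LSST classes, and a plain dictionary plays the role of the data repository.

```python
class ReportTask:
    """Hypothetical task that only reports information, so it persists nothing."""
    _DefaultName = "report"

    def _getConfigName(self):
        return None  # None means: do not save the config

    def _getMetadataName(self):
        return None  # None means: do not save the metadata


class ProcessTask:
    """Hypothetical task that keeps the default naming."""
    _DefaultName = "process"

    def _getConfigName(self):
        return self._DefaultName + "_config"

    def _getMetadataName(self):
        return self._DefaultName + "_metadata"


def write_config(task, config, repository):
    """Hypothetical helper: store the config unless the task opted out."""
    name = task._getConfigName()
    if name is None:
        return False
    repository[name] = config
    return True


repo = {}
write_config(ReportTask(), {"threshold": 5.0}, repo)   # skipped
write_config(ProcessTask(), {"threshold": 5.0}, repo)  # saved as "process_config"
print(sorted(repo))  # prints ['process_config']
```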

Custom Argument Parser

The default argument parser returned by CmdLineTask._makeArgumentParser assumes that your task's run method processes raw or calibrated images. If this is not the case you can easily provide a modified argument parser.

Typically this consists of constructing an instance of lsst.pipe.base.ArgumentParser and then adding some ID arguments to it using ArgumentParser.add_id_argument. This is shown in several examples below. Please resist the urge to add other kinds of arguments to the argument parser unless truly needed. One strength of our tasks is how similar they are to each other: learning one set of arguments suffices to use many tasks.

Warning
If your task requires a custom argument parser to do more than just change the type of the single data reference, then it also requires a custom task runner.
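To show what an ID argument does, here is a sketch built on Python's standard argparse module rather than lsst.pipe.base.ArgumentParser, so it runs without the LSST stack. The helper names parse_data_id and make_argument_parser are hypothetical. The sketch assumes the data ID syntax key=value, with ^ separating multiple values (for example --id visit=1^2 ccd=3), and it stops at plain data ID dictionaries, whereas the real parser goes on to turn data IDs into data references.

```python
import argparse


def parse_data_id(tokens):
    """Turn ["visit=1^2", "ccd=3"] into {"visit": ["1", "2"], "ccd": ["3"]}."""
    data_id = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        data_id[key] = value.split("^")  # "^" separates multiple values
    return data_id


def make_argument_parser(prog, datasetType):
    """Hypothetical parser: one positional repository and one ID argument."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("input", help="path to the input data repository")
    parser.add_argument(
        "--id", nargs="+", action="append", default=[], metavar="KEY=VALUE",
        help=f"data ID for dataset type {datasetType!r}; may be repeated",
    )
    return parser


parser = make_argument_parser("myTask", "calexp")
args = parser.parse_args(["repo", "--id", "visit=1^2", "ccd=3"])
data_ids = [parse_data_id(tokens) for tokens in args.id]
print(data_ids)  # prints [{'visit': ['1', '2'], 'ccd': ['3']}]
```

Changing the dataset type of the ID argument only changes what the IDs refer to, not the command-line syntax, which is part of why tasks stay easy to learn.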

Here are some examples:

Custom Task Runner

The standard task runner is lsst.pipe.base.TaskRunner. It assumes that your task's run method wants a single data reference and nothing else. If that is not the case then you will have to provide a custom task runner for your task. This involves writing a subclass of lsst.pipe.base.TaskRunner and specifying it in your task using the RunnerClass class variable.
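The sketch below imitates this pattern with hypothetical stand-ins so it runs without the LSST stack. TaskRunnerStandIn builds a list of targets, one per data reference, and calls the task's run method once per target; the custom runner, ScaledRunner, overrides getTargetList to pass an extra, hypothetical scale argument. The method name getTargetList mirrors the one in lsst.pipe.base.TaskRunner, but check the pipe_base documentation for its exact signature and behavior before relying on this shape. Plain integers stand in for data references.

```python
from types import SimpleNamespace


class TaskRunnerStandIn:
    """Hypothetical stand-in: one target per data reference, then task.run."""

    def __init__(self, TaskClass):
        self.TaskClass = TaskClass

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        # Default: each target is (data reference, extra keyword arguments).
        return [(ref, kwargs) for ref in parsedCmd.refList]

    def __call__(self, target):
        dataRef, kwargs = target
        return self.TaskClass().run(dataRef, **kwargs)

    def run(self, parsedCmd):
        return [self(target) for target in self.getTargetList(parsedCmd)]


class ScaledRunner(TaskRunnerStandIn):
    """Custom runner that also hands a parsed 'scale' option to run."""

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        return TaskRunnerStandIn.getTargetList(parsedCmd, scale=parsedCmd.scale, **kwargs)


class ScaleTask:
    """Hypothetical task whose run method needs more than a data reference."""
    RunnerClass = ScaledRunner

    def run(self, dataRef, scale):
        return dataRef * scale


parsed = SimpleNamespace(refList=[1, 2, 3], scale=10)
results = ScaleTask.RunnerClass(ScaleTask).run(parsed)
print(results)  # prints [10, 20, 30]
```

Keeping the extra arguments in getTargetList leaves the per-target call unchanged, so only one method has to be overridden.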

Here are some situations where a custom task runner is required: