Run a command-line task, using multiprocessing if requested.

Public Member Functions
def	__init__
	Construct a TaskRunner.

def	prepareForMultiProcessing
	Prepare this instance for multiprocessing by removing optional non-picklable elements.

def	run
	Run the task on all targets.

def	makeTask
	Create a Task instance.

def	precall
	Hook for code that should run exactly once, before multiprocessing is invoked.

def	__call__
	Run the Task on a single target.

Static Public Member Functions
def	getTargetList
	Return a list of (dataRef, kwargs) to be used as arguments for TaskRunner.__call__.

Public Attributes
	TaskClass

	doReturnResults

	config

	log

	doRaise

	clobberConfig

	numProcesses

	timeout

Static Public Attributes
int	TIMEOUT = 9999

Detailed Description

Run a command-line task, using multiprocessing if requested.

Each command-line task (subclass of CmdLineTask) has a task runner. By default it is this class, but some tasks require a subclass. See the manual "how to write a command-line task" in the pipe_tasks documentation for more information. See CmdLineTask.parseAndRun to see how a task runner is used.

You may use this task runner for your command-line task if your task has a run method that takes exactly one argument: a butler data reference. Otherwise you must provide a task-specific subclass of this runner for your task's RunnerClass that overrides TaskRunner.getTargetList and possibly TaskRunner.__call__. See TaskRunner.getTargetList for details.

This design matches the common pattern for command-line tasks: the run method takes a single data reference, under whatever name suits the task. Additional arguments are rare and, if present, require a subclass of TaskRunner that passes them by name.
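As a rough illustration of the subclassing pattern described above, the following sketch shows a runner whose getTargetList attaches an extra keyword argument to each data reference and whose __call__ forwards it by name. It deliberately does not import the LSST stack: BaseRunner, ExampleTask, CalibRunner, the calib argument and the refList attribute are hypothetical stand-ins, and only the (dataRef, kwargs) target shape is taken from this page.

```python
from argparse import Namespace


class ExampleTask:
    """Hypothetical task whose run method needs one extra argument."""

    def run(self, dataRef, calib):
        return f"{dataRef} processed with {calib}"


class BaseRunner:
    """Stand-in for TaskRunner; NOT the real lsst.pipe.base class."""

    def __init__(self, TaskClass):
        self.TaskClass = TaskClass

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        # One (dataRef, kwargs) pair per data reference.
        # 'refList' is an assumed attribute name for this sketch.
        return [(ref, kwargs) for ref in parsedCmd.refList]

    def __call__(self, args):
        # Unpack one target and pass the extra arguments by name.
        dataRef, kwargs = args
        return self.TaskClass().run(dataRef, **kwargs)


class CalibRunner(BaseRunner):
    """Runner for a task whose run method takes an extra 'calib' argument."""

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        # Add the extra argument to the kwargs of every target.
        return BaseRunner.getTargetList(parsedCmd, calib=parsedCmd.calib, **kwargs)


parsedCmd = Namespace(refList=["visit=1", "visit=2"], calib="flat-A")
runner = CalibRunner(ExampleTask)
targets = CalibRunner.getTargetList(parsedCmd)
results = [runner(target) for target in targets]
```

Because each target carries its own kwargs dict, the runner's __call__ needs no knowledge of which extra arguments a particular task expects.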

Instances of this class must be picklable in order to be compatible with multiprocessing. If multiprocessing is requested (numProcesses > 1, which the constructor reads from parsedCmd.processes), run() calls prepareForMultiProcessing to discard optional non-picklable elements. If your task runner is not compatible with multiprocessing, indicate this in your task by setting the class variable canMultiprocess=False.
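The pickling requirement can be illustrated with the standard library alone. In this sketch, ToyRunner and is_picklable are hypothetical names, and a threading.Lock plays the role of a non-picklable member; which elements the real prepareForMultiProcessing removes is not shown here and may differ.

```python
import pickle
import threading


class ToyRunner:
    """Hypothetical stand-in for a task runner; not the LSST class."""

    def __init__(self):
        self.config = {"doWrite": True}  # plain data pickles without trouble
        self.log = threading.Lock()      # stands in for a non-picklable member

    def prepareForMultiProcessing(self):
        # Drop the optional member that cannot be pickled.
        self.log = None


def is_picklable(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


runner = ToyRunner()
before = is_picklable(runner)        # lock attribute blocks pickling
runner.prepareForMultiProcessing()
after = is_picklable(runner)         # only plain data remains
```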

Due to a Python bug [1], handling a KeyboardInterrupt properly requires specifying a timeout [2]. This timeout (in seconds) can be specified as the "timeout" element in the output from ArgumentParser (the "parsedCmd"), if available; otherwise TaskRunner.TIMEOUT is used.

[1] http://bugs.python.org/issue8296
[2] http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
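The workaround discussed in [2] amounts to replacing a blocking map call with map_async followed by get with a timeout. The helper below is a minimal sketch of that idea under stated assumptions, not the TaskRunner implementation: run_pool and square are hypothetical names, and a ThreadPool is used only so that the example runs without a __main__ guard (multiprocessing.Pool exposes the same map_async method).

```python
from multiprocessing.pool import ThreadPool

TIMEOUT = 9999  # seconds; same value as the TIMEOUT class attribute


def run_pool(pool, timeout, function, iterable):
    """Map function over iterable, waiting at most timeout seconds.

    AsyncResult.get raises multiprocessing.TimeoutError if the results
    are not ready in time.
    """
    return pool.map_async(function, iterable).get(timeout)


def square(x):
    return x * x


with ThreadPool(2) as pool:
    results = run_pool(pool, TIMEOUT, square, [1, 2, 3])
```

A very large timeout keeps the call effectively unbounded for normal runs while still going through the timed code path.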

Definition at line 99 of file cmdLineTask.py.

Constructor & Destructor Documentation

def lsst.pipe.base.cmdLineTask.TaskRunner.__init__	(	self,
		TaskClass,
		parsedCmd,
		doReturnResults = `False`
	)

Construct a TaskRunner.

Warning: Do not store parsedCmd, as this instance is pickled (if multiprocessing) and parsedCmd may contain non-picklable elements. It certainly contains more data than we need to send to each instance of the task.

Parameters

TaskClass	The class of the task to run
parsedCmd	The parsed command-line arguments, as returned by the task's argument parser's parse_args method.
doReturnResults	Should run return the collected result from each invocation of the task? This is only intended for unit tests and similar use. It can easily exhaust memory (if the task returns enough data and you call it enough times) and it will fail when using multiprocessing if the returned data cannot be pickled.

Exceptions

ImportError if multiprocessing requested (and the task supports it) but the multiprocessing library cannot be imported.

Definition at line 130 of file cmdLineTask.py.

 
     def __init__(self, TaskClass, parsedCmd, doReturnResults=False):
         """!Construct a TaskRunner
         
         @warning Do not store parsedCmd, as this instance is pickled (if multiprocessing) and parsedCmd may
         contain non-picklable elements. It certainly contains more data than we need to send to each
         instance of the task.
 
         @param TaskClass    The class of the task to run
         @param parsedCmd    The parsed command-line arguments, as returned by the task's argument parser's
                             parse_args method.
         @param doReturnResults    Should run return the collected result from each invocation of the task?
             This is only intended for unit tests and similar use.
             It can easily exhaust memory (if the task returns enough data and you call it enough times)
             and it will fail when using multiprocessing if the returned data cannot be pickled.
         
         @throws ImportError if multiprocessing requested (and the task supports it)
         but the multiprocessing library cannot be imported.
         """
         self.TaskClass = TaskClass
         self.doReturnResults = bool(doReturnResults)
         self.config = parsedCmd.config
         self.log = parsedCmd.log
         self.doRaise = bool(parsedCmd.doraise)
         self.clobberConfig = bool(parsedCmd.clobberConfig)
         self.numProcesses = int(getattr(parsedCmd, 'processes', 1))
 
         self.timeout = getattr(parsedCmd, 'timeout', None)
         if self.timeout is None or self.timeout <= 0:
             self.timeout = self.TIMEOUT
 
         if self.numProcesses > 1:
             if not TaskClass.canMultiprocess:
                 self.log.warn("This task does not support multiprocessing; using one process")
                 self.numProcesses = 1
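
To make the fallback rules in the constructor concrete, the sketch below reproduces only the process-count and timeout handling, using an argparse.Namespace in place of a real parsedCmd. resolve_settings is a hypothetical helper written for illustration and is not part of lsst.pipe.base.

```python
from argparse import Namespace

TIMEOUT = 9999  # same value as the TIMEOUT class attribute


def resolve_settings(parsedCmd, canMultiprocess=True):
    """Return (numProcesses, timeout) following the constructor's rules."""
    numProcesses = int(getattr(parsedCmd, "processes", 1))

    # A missing, None, zero or negative timeout falls back to the default.
    timeout = getattr(parsedCmd, "timeout", None)
    if timeout is None or timeout <= 0:
        timeout = TIMEOUT

    # Tasks that cannot multiprocess are limited to a single process.
    if numProcesses > 1 and not canMultiprocess:
        numProcesses = 1
    return numProcesses, timeout


defaults = resolve_settings(Namespace())
explicit = resolve_settings(Namespace(processes=4, timeout=60))
downgraded = resolve_settings(Namespace(processes=4, timeout=0), canMultiprocess=False)
```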

parsedCmd	the parsed command object (an argparse.Namespace) returned by ArgumentParser.parse_args.
**kwargs	any additional keyword arguments. In the default TaskRunner this is an empty dict, but having it simplifies overriding TaskRunner for tasks whose run method takes additional arguments (see case (1) below).

def lsst.pipe.base.cmdLineTask.TaskRunner.makeTask	(	self,
		parsedCmd = `None`,
		args = `None`
	)

[in]	parsedCmd	parsed command-line options (used for extra task args by some task runners)
[in]	args	args tuple passed to TaskRunner.__call__ (used for extra task arguments by some task runners)

Member Function Documentation

def lsst.pipe.base.cmdLineTask.TaskRunner.__call__	(	self,
		args
	)

def lsst.pipe.base.cmdLineTask.TaskRunner.precall	(	self,
		parsedCmd
	)