lsst.daf.persistence.butler.Butler Class Reference
Inheritance diagram for lsst.daf.persistence.butler.Butler:

Public Member Functions

def __init__
 
def getKeys
 
def queryMetadata
 
def datasetExists
 
def get
 
def put
 
def subset
 
def dataRef
 
def __reduce__
 

Static Public Member Functions

def getMapperClass
 

Public Attributes

 mapper
 
 persistence
 
 log
 

Private Member Functions

def _combineDicts
 
def _read
 

Detailed Description

Butler provides a generic mechanism for persisting and retrieving data using mappers.

A Butler manages a collection of datasets known as a repository.  Each
dataset has a type representing its intended usage and a location.  Note
that the dataset type is not the same as the C++ or Python type of the
object containing the data.  For example, an ExposureF object might be
used to hold the data for a raw image, a post-ISR image, a calibrated
science image, or a difference image.  These would all be different
dataset types.

A Butler can produce a collection of possible values for a key (or tuples
of values for multiple keys) if given a partial data identifier.  It can
check for the existence of a file containing a dataset given its type and
data identifier.  The Butler can then retrieve the dataset.  Similarly, it
can persist an object to an appropriate location when given its associated
data identifier.
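
A minimal sketch of that workflow follows. It assumes a Butler built from a hypothetical repository path; the dataset type names ("raw", "rawCopy") and the dataId keys (visit, ccd, filter) are illustrative and depend on the mapper/camera in use.

    import lsst.daf.persistence as dafPersist

    butler = dafPersist.Butler("/path/to/repo")                 # hypothetical repository root
    visits = butler.queryMetadata("raw", "visit", filter="r")   # possible visit values for a partial data id
    if butler.datasetExists("raw", visit=visits[0], ccd=5):
        raw = butler.get("raw", visit=visits[0], ccd=5)         # retrieve the dataset
        butler.put(raw, "rawCopy", visit=visits[0], ccd=5)      # persist it under an output dataset type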

Note that the Butler has two additional, more advanced features when retrieving a
dataset.  First, the retrieval is lazy.  Input does not occur until the
dataset is actually accessed.  This allows datasets to be retrieved and
placed on a clipboard prospectively with little cost, even if the
algorithm of a stage ends up not using them.  Second, the Butler will call
a standardization hook upon retrieval of the dataset.  This function,
contained in the input mapper object, must perform any necessary
manipulations to force the retrieved object to conform to standards,
including translating metadata.

Public methods:

__init__(self, root, mapper=None, **mapperArgs)

getKeys(self, datasetType=None, level=None)

queryMetadata(self, datasetType, key, format=None, dataId={}, **rest)

datasetExists(self, datasetType, dataId={}, **rest)

get(self, datasetType, dataId={}, immediate=False, **rest)

put(self, obj, datasetType, dataId={}, doBackup=False, **rest)

subset(self, datasetType, level=None, dataId={}, **rest)

dataRef(self, datasetType, level=None, dataId={}, **rest)

Definition at line 38 of file butler.py.

Constructor & Destructor Documentation

def lsst.daf.persistence.butler.Butler.__init__(self, root, mapper=None, **mapperArgs)
Construct the Butler.  If no mapper class is provided, then a file
named "_mapper" is expected to be found in the repository, which
must be a filesystem path.  The first line in that file is read and
must contain the fully-qualified name of a Mapper subclass, which is
then imported and instantiated using the root and the mapperArgs.

@param root (str)       the repository to be managed (at least
                initially).  May be None if a mapper is
                provided.
@param mapper (Mapper)  if present, the Mapper subclass instance
                to be used as the butler's mapper.
@param **mapperArgs     arguments to be passed to the mapper's
                __init__ method, in addition to the root.

Definition at line 112 of file butler.py.

113      def __init__(self, root, mapper=None, **mapperArgs):
114          """Construct the Butler. If no mapper class is provided, then a file
115          named "_mapper" is expected to be found in the repository, which
116          must be a filesystem path. The first line in that file is read and
117          must contain the fully-qualified name of a Mapper subclass, which is
118          then imported and instantiated using the root and the mapperArgs.
119  
120          @param root (str)       the repository to be managed (at least
121                                  initially). May be None if a mapper is
122                                  provided.
123          @param mapper (Mapper)  if present, the Mapper subclass instance
124                                  to be used as the butler's mapper.
125          @param **mapperArgs     arguments to be passed to the mapper's
126                                  __init__ method, in addition to the root."""
127  
128          if mapper is not None:
129              self.mapper = mapper
130          else:
131              cls = Butler.getMapperClass(root)
132              self.mapper = cls(root=root, **mapperArgs)
133  
134          # Always use an empty Persistence policy until we can get rid of it
135          persistencePolicy = pexPolicy.Policy()
136          self.persistence = Persistence.getPersistence(persistencePolicy)
137          self.log = pexLog.Log(pexLog.Log.getDefaultLog(),
138                                "daf.persistence.butler")
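
As a usage sketch (the repository path is hypothetical; the mapper class comes from whichever obs_* package supports your camera), a Butler can be built either from a repository containing a "_mapper" file or with an explicit mapper instance:

    import lsst.daf.persistence as dafPersist

    # Mapper class discovered from the "_mapper" file in the repository
    butler = dafPersist.Butler("/path/to/repo")

    # Or supply a mapper explicitly (MyMapper is a placeholder for a real Mapper subclass)
    # mapper = MyMapper(root="/path/to/repo")
    # butler = dafPersist.Butler(root=None, mapper=mapper)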

Member Function Documentation

def lsst.daf.persistence.butler.Butler.__reduce__(self)

Definition at line 435 of file butler.py.

436      def __reduce__(self):
437          return (_unreduce, (self.mapper,))
438  
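
This makes a Butler picklable by rebuilding it from its mapper, which can be useful, e.g., when handing a Butler to worker processes. A minimal sketch, assuming a hypothetical repository root and a mapper that is itself picklable:

    import pickle
    import lsst.daf.persistence as dafPersist

    butler = dafPersist.Butler("/path/to/repo")   # hypothetical repository root
    clone = pickle.loads(pickle.dumps(butler))    # reconstructed via _unreduce from the same mapper
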
def lsst.daf.persistence.butler.Butler._combineDicts(self, dataId, **rest)
private

Definition at line 377 of file butler.py.

378      def _combineDicts(self, dataId, **rest):
379          finalId = {}
380          finalId.update(dataId)
381          finalId.update(rest)
382          return finalId
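
The merge is a plain dictionary update, with keyword arguments taking precedence over dataId entries when a key appears in both. For illustration only (keys are mapper-dependent), given a Butler instance butler as in the constructor example:

    finalId = butler._combineDicts({"visit": 1234, "ccd": 5}, ccd=6)
    # finalId == {"visit": 1234, "ccd": 6}
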
def lsst.daf.persistence.butler.Butler._read(self, pythonType, location)
private

Definition at line 383 of file butler.py.

384      def _read(self, pythonType, location):
385          trace = pexLog.BlockTimingLog(self.log, "read",
386                                        pexLog.BlockTimingLog.INSTRUM+1)
387  
388          additionalData = location.getAdditionalData()
389          # Create a list of Storages for the item.
390          storageName = location.getStorageName()
391          results = []
392          locations = location.getLocations()
393          returnList = True
394          if len(locations) == 1:
395              returnList = False
396  
397          for locationString in locations:
398              logLoc = LogicalLocation(locationString, additionalData)
399              trace.start("read from %s(%s)" % (storageName, logLoc.locString()))
400  
401              if storageName == "PafStorage":
402                  finalItem = pexPolicy.Policy.createPolicy(logLoc.locString())
403              elif storageName == "PickleStorage":
404                  if not os.path.exists(logLoc.locString()):
405                      raise RuntimeError, \
406                              "No such pickle file: " + logLoc.locString()
407                  with open(logLoc.locString(), "rb") as infile:
408                      finalItem = cPickle.load(infile)
409              elif storageName == "FitsCatalogStorage":
410                  if not os.path.exists(logLoc.locString()):
411                      raise RuntimeError, \
412                              "No such FITS catalog file: " + logLoc.locString()
413                  hdu = additionalData.getInt("hdu", 0)
414                  flags = additionalData.getInt("flags", 0)
415                  finalItem = pythonType.readFits(logLoc.locString(), hdu, flags)
416              elif storageName == "ConfigStorage":
417                  if not os.path.exists(logLoc.locString()):
418                      raise RuntimeError, \
419                              "No such config file: " + logLoc.locString()
420                  finalItem = pythonType()
421                  finalItem.load(logLoc.locString())
422              else:
423                  storageList = StorageList()
424                  storage = self.persistence.getRetrieveStorage(storageName, logLoc)
425                  storageList.append(storage)
426                  itemData = self.persistence.unsafeRetrieve(
427                          location.getCppType(), storageList, additionalData)
428                  finalItem = pythonType.swigConvert(itemData)
429              trace.done()
430              results.append(finalItem)
431  
432          if not returnList:
433              results = results[0]
434          return results
def lsst.daf.persistence.butler.Butler.dataRef(self, datasetType, level=None, dataId={}, **rest)
Returns a single ButlerDataRef.

Given a complete dataId specified in dataId and **rest, find the
unique dataset at the given level specified by a dataId key (e.g.
visit or sensor or amp for a camera) and return a ButlerDataRef.

@param datasetType (str)  the type of dataset collection to reference
@param level (str)        the level of dataId at which to reference
@param dataId (dict)      the data id.
@param **rest             keyword arguments for the data id.
@returns (ButlerDataRef) ButlerDataRef for dataset matching the data id

Definition at line 354 of file butler.py.

355      def dataRef(self, datasetType, level=None, dataId={}, **rest):
356          """Returns a single ButlerDataRef.
357  
358          Given a complete dataId specified in dataId and **rest, find the
359          unique dataset at the given level specified by a dataId key (e.g.
360          visit or sensor or amp for a camera) and return a ButlerDataRef.
361  
362          @param datasetType (str)  the type of dataset collection to reference
363          @param level (str)        the level of dataId at which to reference
364          @param dataId (dict)      the data id.
365          @param **rest             keyword arguments for the data id.
366          @returns (ButlerDataRef) ButlerDataRef for dataset matching the data id
367          """
368  
369          subset = self.subset(datasetType, level, dataId, **rest)
370          if len(subset) != 1:
371              raise RuntimeError, """No unique dataset for:
372          Dataset type = %s
373          Level = %s
374          Data ID = %s
375          Keywords = %s""" % (str(datasetType), str(level), str(dataId), str(rest))
376          return ButlerDataRef(subset, subset.cache[0])
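
A usage sketch, given a Butler instance butler as in the constructor example. The "raw" dataset type and the visit/ccd keys are illustrative and mapper-dependent, and the data id must identify exactly one dataset at the requested level:

    ref = butler.dataRef("raw", visit=1234, ccd=5)
    exposure = ref.get("raw")     # a ButlerDataRef can get/put using its own data id
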
def lsst.daf.persistence.butler.Butler.datasetExists(self, datasetType, dataId={}, **rest)
Determines if a dataset file exists.

@param datasetType (str)   the type of dataset to inquire about.
@param dataId (dict)       the data id of the dataset.
@param **rest              keyword arguments for the data id.
@returns (bool) True if the dataset exists or is non-file-based.

Definition at line 177 of file butler.py.

178      def datasetExists(self, datasetType, dataId={}, **rest):
179          """Determines if a dataset file exists.
180  
181          @param datasetType (str)   the type of dataset to inquire about.
182          @param dataId (dict)       the data id of the dataset.
183          @param **rest              keyword arguments for the data id.
184          @returns (bool) True if the dataset exists or is non-file-based.
185          """
186  
187          dataId = self._combineDicts(dataId, **rest)
188          location = self.mapper.map(datasetType, dataId)
189          additionalData = location.getAdditionalData()
190          storageName = location.getStorageName()
191          if storageName in ('BoostStorage', 'FitsStorage', 'PafStorage',
192                             'PickleStorage', 'ConfigStorage', 'FitsCatalogStorage'):
193              locations = location.getLocations()
194              for locationString in locations:
195                  logLoc = LogicalLocation(locationString, additionalData).locString()
196                  if storageName == 'FitsStorage':
197                      # Strip off directives for cfitsio (in square brackets, e.g., extension name)
198                      bracket = logLoc.find('[')
199                      if bracket > 0:
200                          logLoc = logLoc[:bracket]
201                  if not os.path.exists(logLoc):
202                      return False
203              return True
204          self.log.log(pexLog.Log.WARN,
205                       "datasetExists() for non-file storage %s, dataset type=%s, keys=%s" %
206                       (storageName, datasetType, str(dataId)))
207          return True
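
A usage sketch with illustrative keys, given a Butler instance butler as above. For file-backed storages a False return means the mapped file is missing on disk; non-file storages always report True (with a warning logged):

    if butler.datasetExists("raw", visit=1234, ccd=5):
        raw = butler.get("raw", visit=1234, ccd=5)
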
def lsst.daf.persistence.butler.Butler.get(self, datasetType, dataId={}, immediate=False, **rest)
Retrieves a dataset given an input collection data id.

@param datasetType (str)   the type of dataset to retrieve.
@param dataId (dict)       the data id.
@param immediate (bool)    don't use a proxy for delayed loading.
@param **rest              keyword arguments for the data id.
@returns an object retrieved from the dataset (or a proxy for one).

Definition at line 208 of file butler.py.

209      def get(self, datasetType, dataId={}, immediate=False, **rest):
210          """Retrieves a dataset given an input collection data id.
211  
212          @param datasetType (str)   the type of dataset to retrieve.
213          @param dataId (dict)       the data id.
214          @param immediate (bool)    don't use a proxy for delayed loading.
215          @param **rest              keyword arguments for the data id.
216          @returns an object retrieved from the dataset (or a proxy for one).
217          """
218          dataId = self._combineDicts(dataId, **rest)
219          location = self.mapper.map(datasetType, dataId)
220          self.log.log(pexLog.Log.DEBUG, "Get type=%s keys=%s from %s" %
221                       (datasetType, dataId, str(location)))
222  
223          if location.getPythonType() is not None:
224              # import this pythonType dynamically
225              pythonTypeTokenList = location.getPythonType().split('.')
226              importClassString = pythonTypeTokenList.pop()
227              importClassString = importClassString.strip()
228              importPackage = ".".join(pythonTypeTokenList)
229              importType = __import__(importPackage, globals(), locals(), \
230                                      [importClassString], -1)
231              pythonType = getattr(importType, importClassString)
232          else:
233              pythonType = None
234          if hasattr(self.mapper, "bypass_" + datasetType):
235              bypassFunc = getattr(self.mapper, "bypass_" + datasetType)
236              callback = lambda: bypassFunc(datasetType, pythonType,
237                                            location, dataId)
238          else:
239              callback = lambda: self._read(pythonType, location)
240          if self.mapper.canStandardize(datasetType):
241              innerCallback = callback
242              callback = lambda: self.mapper.standardize(datasetType,
243                                                         innerCallback(), dataId)
244          if immediate:
245              return callback()
246          return ReadProxy(callback)
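
A usage sketch showing the lazy proxy versus an immediate read, given a Butler instance butler as above (the "calexp" dataset type and the dataId keys are illustrative):

    calexp = butler.get("calexp", visit=1234, ccd=5)                   # ReadProxy; I/O deferred until first use
    calexp = butler.get("calexp", visit=1234, ccd=5, immediate=True)   # read (and standardization) happens now
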
def lsst.daf.persistence.butler.Butler.getKeys(self, datasetType=None, level=None)
Returns a dict.  The dict keys are the valid data id keys at or
above the given level of hierarchy for the dataset type or the entire
collection if None.  The dict values are the basic Python types
corresponding to the keys (int, float, str).

@param datasetType (str)  the type of dataset to get keys for, entire
                  collection if None.
@param level (str)        the hierarchy level to descend to or None.
@returns (dict) valid data id keys; values are corresponding types.

Definition at line 139 of file butler.py.

140      def getKeys(self, datasetType=None, level=None):
141  
142          """Returns a dict. The dict keys are the valid data id keys at or
143          above the given level of hierarchy for the dataset type or the entire
144          collection if None. The dict values are the basic Python types
145          corresponding to the keys (int, float, str).
146  
147          @param datasetType (str)  the type of dataset to get keys for, entire
148                                    collection if None.
149          @param level (str)        the hierarchy level to descend to or None.
150          @returns (dict) valid data id keys; values are corresponding types."""
151  
152          return self.mapper.getKeys(datasetType, level)
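
A usage sketch, given a Butler instance butler as above; the returned keys and types depend entirely on the mapper (the values in the comment are only an example):

    keys = butler.getKeys("raw")
    # e.g. {"visit": int, "ccd": int, "filter": str} for a visit/ccd-style camera
    for name, pythonType in keys.items():
        print(name, pythonType)
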
def lsst.daf.persistence.butler.Butler.getMapperClass(root)
static
Return the mapper class associated with a repository root.

Definition at line 85 of file butler.py.

 85  
 86      def getMapperClass(root):
 87          """Return the mapper class associated with a repository root."""
 88  
 89          # Find a "_mapper" file containing the mapper class name
 90          basePath = root
 91          mapperFile = "_mapper"
 92          globals = {}
 93          while not os.path.exists(os.path.join(basePath, mapperFile)):
 94              # Break abstraction by following _parent links from CameraMapper
 95              if os.path.exists(os.path.join(basePath, "_parent")):
 96                  basePath = os.path.join(basePath, "_parent")
 97              else:
 98                  raise RuntimeError(
 99                      "No mapper provided and no %s available" %
100                      (mapperFile,))
101          mapperFile = os.path.join(basePath, mapperFile)
102  
103          # Read the name of the mapper class and instantiate it
104          with open(mapperFile, "r") as f:
105              mapperName = f.readline().strip()
106          components = mapperName.split(".")
107          if len(components) <= 1:
108              raise RuntimeError("Unqualified mapper name %s in %s" %
109                                 (mapperName, mapperFile))
110          pkg = importlib.import_module(".".join(components[:-1]))
111          return getattr(pkg, components[-1])
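
A usage sketch with a hypothetical repository root; the repository (or one of its _parent ancestors) must contain a "_mapper" file naming a fully-qualified Mapper subclass:

    import lsst.daf.persistence as dafPersist

    MapperClass = dafPersist.Butler.getMapperClass("/path/to/repo")
    mapper = MapperClass(root="/path/to/repo")
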
def lsst.daf.persistence.butler.Butler.put(self, obj, datasetType, dataId={}, doBackup=False, **rest)
Persists a dataset given an output collection data id.

@param obj                 the object to persist.
@param datasetType (str)   the type of dataset to persist.
@param dataId (dict)       the data id.
@param doBackup            if True, rename existing instead of overwriting
@param **rest         keyword arguments for the data id.

WARNING: Setting doBackup=True is not safe for parallel processing, as it
may be subject to race conditions.

Definition at line 247 of file butler.py.

248      def put(self, obj, datasetType, dataId={}, doBackup=False, **rest):
249          """Persists a dataset given an output collection data id.
250  
251          @param obj                 the object to persist.
252          @param datasetType (str)   the type of dataset to persist.
253          @param dataId (dict)       the data id.
254          @param doBackup            if True, rename existing instead of overwriting
255          @param **rest              keyword arguments for the data id.
256  
257          WARNING: Setting doBackup=True is not safe for parallel processing, as it
258          may be subject to race conditions.
259          """
260          if doBackup:
261              self.mapper.backup(datasetType, dataId)
262          dataId = self._combineDicts(dataId, **rest)
263          location = self.mapper.map(datasetType, dataId, write=True)
264          self.log.log(pexLog.Log.DEBUG, "Put type=%s keys=%s to %s" %
265                       (datasetType, dataId, str(location)))
266          additionalData = location.getAdditionalData()
267          storageName = location.getStorageName()
268          locations = location.getLocations()
269          # TODO support multiple output locations
270          locationString = locations[0]
271          logLoc = LogicalLocation(locationString, additionalData)
272          trace = pexLog.BlockTimingLog(self.log, "put",
273                                        pexLog.BlockTimingLog.INSTRUM+1)
274          trace.setUsageFlags(trace.ALLUDATA)
275  
276          if storageName == "PickleStorage":
277              trace.start("write to %s(%s)" % (storageName, logLoc.locString()))
278              outDir = os.path.dirname(logLoc.locString())
279              if outDir != "" and not os.path.exists(outDir):
280                  try:
281                      os.makedirs(outDir)
282                  except OSError, e:
283                      # Don't fail if directory exists due to race
284                      if e.errno != 17:
285                          raise e
286              with open(logLoc.locString(), "wb") as outfile:
287                  cPickle.dump(obj, outfile, cPickle.HIGHEST_PROTOCOL)
288              trace.done()
289              return
290  
291          if storageName == "ConfigStorage":
292              trace.start("write to %s(%s)" % (storageName, logLoc.locString()))
293              outDir = os.path.dirname(logLoc.locString())
294              if outDir != "" and not os.path.exists(outDir):
295                  try:
296                      os.makedirs(outDir)
297                  except OSError, e:
298                      # Don't fail if directory exists due to race
299                      if e.errno != 17:
300                          raise e
301              obj.save(logLoc.locString())
302              trace.done()
303              return
304  
305          if storageName == "FitsCatalogStorage":
306              trace.start("write to %s(%s)" % (storageName, logLoc.locString()))
307              outDir = os.path.dirname(logLoc.locString())
308              if outDir != "" and not os.path.exists(outDir):
309                  try:
310                      os.makedirs(outDir)
311                  except OSError, e:
312                      # Don't fail if directory exists due to race
313                      if e.errno != 17:
314                          raise e
315              flags = additionalData.getInt("flags", 0)
316              obj.writeFits(logLoc.locString(), flags=flags)
317              trace.done()
318              return
319  
320          # Create a list of Storages for the item.
321          storageList = StorageList()
322          storage = self.persistence.getPersistStorage(storageName, logLoc)
323          storageList.append(storage)
324          trace.start("write to %s(%s)" % (storageName, logLoc.locString()))
325  
326          # Persist the item.
327          if hasattr(obj, '__deref__'):
328              # We have a smart pointer, so dereference it.
329              self.persistence.persist(
330                  obj.__deref__(), storageList, additionalData)
331          else:
332              self.persistence.persist(obj, storageList, additionalData)
333          trace.done()
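
A usage sketch, given a Butler instance butler and an object calexp to persist as in the earlier examples; the output dataset type and data id keys are illustrative and must be defined by the mapper:

    butler.put(calexp, "calexp", visit=1234, ccd=5)
    # doBackup=True renames any existing output instead of overwriting (not safe in parallel)
    butler.put(calexp, "calexp", dataId={"visit": 1234, "ccd": 5}, doBackup=True)
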
def lsst.daf.persistence.butler.Butler.queryMetadata(self, datasetType, key, format=None, dataId={}, **rest)
Returns the valid values for one or more keys when given a partial
input collection data id.

@param datasetType (str)    the type of dataset to inquire about.
@param key (str)            a key giving the level of granularity of the inquiry.
@param format (str, tuple)  an optional key or tuple of keys to be returned. 
@param dataId (dict)        the partial data id.
@param **rest               keyword arguments for the partial data id.
@returns (list) a list of valid values or tuples of valid values as
specified by the format (defaulting to the same as the key) at the
key's level of granularity.

Definition at line 153 of file butler.py.

154      def queryMetadata(self, datasetType, key, format=None, dataId={}, **rest):
155          """Returns the valid values for one or more keys when given a partial
156          input collection data id.
157  
158          @param datasetType (str)    the type of dataset to inquire about.
159          @param key (str)            a key giving the level of granularity of the inquiry.
160          @param format (str, tuple)  an optional key or tuple of keys to be returned.
161          @param dataId (dict)        the partial data id.
162          @param **rest               keyword arguments for the partial data id.
163          @returns (list) a list of valid values or tuples of valid values as
164          specified by the format (defaulting to the same as the key) at the
165          key's level of granularity.
166          """
167  
168          dataId = self._combineDicts(dataId, **rest)
169          if format is None:
170              format = (key,)
171          elif not hasattr(format, '__iter__'):
172              format = (format,)
173          tuples = self.mapper.queryMetadata(datasetType, key, format, dataId)
174          if len(format) == 1:
175              return [x[0] for x in tuples]
176          return tuples
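
A usage sketch with illustrative keys for a visit/ccd-style camera, given a Butler instance butler as above:

    visits = butler.queryMetadata("raw", "visit", dataId={"filter": "r"})             # list of visit values
    pairs = butler.queryMetadata("raw", "ccd", format=("visit", "ccd"), filter="r")   # list of (visit, ccd) tuples
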
def lsst.daf.persistence.butler.Butler.subset(self, datasetType, level=None, dataId={}, **rest)
Extracts a subset of a dataset collection.

Given a partial dataId specified in dataId and **rest, find all
datasets at a given level specified by a dataId key (e.g. visit or
sensor or amp for a camera) and return a collection of their dataIds
as ButlerDataRefs.

@param datasetType (str)  the type of dataset collection to subset
@param level (str)        the level of dataId at which to subset
@param dataId (dict)      the data id.
@param **rest             keyword arguments for the data id.
@returns (ButlerSubset) collection of ButlerDataRefs for datasets
matching the data id.

Definition at line 334 of file butler.py.

335      def subset(self, datasetType, level=None, dataId={}, **rest):
336          """Extracts a subset of a dataset collection.
337  
338          Given a partial dataId specified in dataId and **rest, find all
339          datasets at a given level specified by a dataId key (e.g. visit or
340          sensor or amp for a camera) and return a collection of their dataIds
341          as ButlerDataRefs.
342  
343          @param datasetType (str)  the type of dataset collection to subset
344          @param level (str)        the level of dataId at which to subset
345          @param dataId (dict)      the data id.
346          @param **rest             keyword arguments for the data id.
347          @returns (ButlerSubset) collection of ButlerDataRefs for datasets
348          matching the data id."""
349  
350          if level is None:
351              level = self.mapper.getDefaultLevel()
352          dataId = self._combineDicts(dataId, **rest)
353          return ButlerSubset(self, datasetType, level, dataId)
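
A usage sketch iterating over every dataset matching a partial data id, given a Butler instance butler as above (the keys, level, and dataset type are mapper-dependent):

    for ref in butler.subset("raw", visit=1234):
        if ref.datasetExists("raw"):
            exposure = ref.get("raw")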

Member Data Documentation

lsst.daf.persistence.butler.Butler.log

Definition at line 136 of file butler.py.

lsst.daf.persistence.butler.Butler.mapper

Definition at line 128 of file butler.py.

lsst.daf.persistence.butler.Butler.persistence

Definition at line 135 of file butler.py.


The documentation for this class was generated from the following file:

butler.py