LSST Data Management Base Package
lsst.daf.persistence.butler.Butler Class Reference

Public Member Functions

def __init__ (self, root=None, mapper=None, inputs=None, outputs=None, **mapperArgs)
 
def __repr__ (self)
 
def defineAlias (self, alias, datasetType)
 
def getKeys (self, datasetType=None, level=None, tag=None)
 
def getDatasetTypes (self, tag=None)
 
def queryMetadata (self, datasetType, format, dataId={}, **rest)
 
def datasetExists (self, datasetType, dataId={}, write=False, **rest)
 
def get (self, datasetType, dataId=None, immediate=True, **rest)
 
def put (self, obj, datasetType, dataId={}, doBackup=False, **rest)
 
def subset (self, datasetType, level=None, dataId={}, **rest)
 
def dataRef (self, datasetType, level=None, dataId={}, **rest)
 
def getUri (self, datasetType, dataId=None, write=False, **rest)
 
def __reduce__ (self)
 

Static Public Member Functions

def getMapperClass (root)
 

Public Attributes

 log
 
 datasetTypeAliasDict
 
 storage
 

Static Public Attributes

int GENERATION = 2
 

Detailed Description

Butler provides a generic mechanism for persisting and retrieving data using mappers.

A Butler manages a collection of datasets known as a repository. Each dataset has a type representing its
intended usage and a location. Note that the dataset type is not the same as the C++ or Python type of the
object containing the data. For example, an ExposureF object might be used to hold the data for a raw
image, a post-ISR image, a calibrated science image, or a difference image. These would all be different
dataset types.
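
For illustration, a minimal sketch (the 'raw' and 'calexp' dataset type names and the visit/ccd data id keys are assumptions that depend on the mapper in use):

>>> raw = butler.get('raw', visit=123, ccd=45)        # raw image
>>> calexp = butler.get('calexp', visit=123, ccd=45)  # calibrated science image

Both objects may well be ExposureF instances in memory, yet 'raw' and 'calexp' are distinct dataset types.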

A Butler can produce a collection of possible values for a key (or tuples of values for multiple keys) if
given a partial data identifier. It can check for the existence of a file containing a dataset given its
type and data identifier. The Butler can then retrieve the dataset. Similarly, it can persist an object to
an appropriate location when given its associated data identifier.
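
The four operations in sequence, as an illustrative sketch (the dataset type names and the visit/ccd/filter data id keys are assumptions that depend on the mapper in use):

>>> butler.queryMetadata('raw', 'visit', filter='r')  # possible values for a key
>>> butler.datasetExists('raw', visit=123, ccd=45)    # check for existence
>>> exp = butler.get('raw', visit=123, ccd=45)        # retrieve the dataset
>>> butler.put(exp, 'postISRCCD', visit=123, ccd=45)  # persist an object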

Note that the Butler has two more advanced features when retrieving a dataset. First, the retrieval is
lazy: input does not occur until the dataset is actually accessed. This allows datasets to be retrieved
and placed on a clipboard prospectively at little cost, even if the algorithm of a stage ends up not
using them. Second, the Butler will call a standardization hook upon retrieval of the dataset. This
function, contained in the input mapper object, must perform any necessary manipulations to force the
retrieved object to conform to standards, including translating metadata.
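
A sketch of both features (assuming a 'calexp' dataset type; the attribute access on the proxy is illustrative):

>>> proxy = butler.get('calexp', visit=123, ccd=45, immediate=False)
>>> # No I/O has happened yet; the read, and the mapper's standardization
>>> # hook, run only when the proxy is first used:
>>> dims = proxy.getDimensions()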

Public methods:

__init__(self, root=None, mapper=None, inputs=None, outputs=None, **mapperArgs)

defineAlias(self, alias, datasetType)

getKeys(self, datasetType=None, level=None, tag=None)

getDatasetTypes(self, tag=None)

queryMetadata(self, datasetType, format, dataId={}, **rest)

datasetExists(self, datasetType, dataId={}, write=False, **rest)

get(self, datasetType, dataId=None, immediate=True, **rest)

put(self, obj, datasetType, dataId={}, doBackup=False, **rest)

subset(self, datasetType, level=None, dataId={}, **rest)

dataRef(self, datasetType, level=None, dataId={}, **rest)

Initialization:

The preferred method of initialization is to use the `inputs` and `outputs` __init__ parameters. These
are described in the parameters section, below.

For backward compatibility, __init__ can also take a POSIX root path and, optionally, a mapper class
instance or class type that will be instantiated using the mapperArgs input argument. In that case a
single repository is created and used as both an input and an output repository. This is NOT preferred,
and will likely break any provenance system we have in place.
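
A sketch of both initialization forms (paths are hypothetical):

    from lsst.daf.persistence import Butler

    # Preferred: explicit input and output repositories.
    butler = Butler(inputs='/path/to/inputRepo', outputs='/path/to/outputRepo')

    # Deprecated, backward-compatible form: a single root used as both
    # input and output repository.
    butler = Butler(root='/path/to/repo')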

Parameters
----------
root : string
    .. note:: Deprecated in 12_0
              `root` will be removed in TBD, it is replaced by `inputs` and `outputs` for
              multiple-repository support.
    A file system path. Will only work with a PosixRepository.
mapper : string or instance
    .. note:: Deprecated in 12_0
              `mapper` will be removed in TBD, it is replaced by `inputs` and `outputs` for
              multiple-repository support.
    Provides a mapper to be used with Butler.
mapperArgs : dict
    .. note:: Deprecated in 12_0
              `mapperArgs` will be removed in TBD, it is replaced by `inputs` and `outputs` for
              multiple-repository support.
    Provides arguments to be passed to the mapper if the mapper input argument is a class type to be
    instantiated by Butler.
inputs : RepositoryArgs, dict, or string
    Can be a single item or a list. Provides arguments to load an existing repository (or repositories).
    A string is assumed to be a URI and is used as the cfgRoot (the URI of the location of the cfg file).
    (A local file system URI does not have to start with 'file://' and may therefore be a relative path.)
    The `RepositoryArgs` class can be used to provide more parameters with which to initialize a
    repository (such as `mapper`, `mapperArgs`, `tags`, etc.; see the `RepositoryArgs` documentation for
    more details). A dict may be used as shorthand for a `RepositoryArgs` instance. The dict keys must
    match parameters to the `RepositoryArgs.__init__` function.
outputs : RepositoryArgs, dict, or string
    Provides arguments to load one or more existing repositories or create new ones. The different types
    are handled the same as for `inputs`.
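
For illustration, the three accepted forms side by side (the paths, tag, and mapperArgs key are hypothetical):

    from lsst.daf.persistence import Butler, RepositoryArgs

    butler = Butler(
        inputs=[
            'relative/path/to/repo1',                         # string: cfgRoot URI
            {'root': '/abs/path/to/repo2', 'tags': 'calib'},  # dict: RepositoryArgs shorthand
            RepositoryArgs(root='/abs/path/to/repo3',
                           mapperArgs={'calibRoot': '/abs/path/to/calib'}),
        ],
        outputs='/abs/path/to/outputRepo')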

The Butler init sequence loads all of the input and output repositories.
This creates the object hierarchy needed to read from and write to them. Each
repository can have zero or more parents, which are also loaded as inputs.
The result is a DAG of repositories. Ultimately, Butler creates a list of
these Repositories in the order in which they are used.

Initialization Sequence
=======================

During initialization Butler creates a Repository class instance and support structure for each object
passed to `inputs` and `outputs`, as well as for the parent repositories recorded in the `RepositoryCfg`
of each existing readable repository.

This process is complex. It is explained below to shed some light on the intent of each step.

1. Input Argument Standardization
---------------------------------

In `Butler._processInputArguments` the input arguments are verified to be legal (and a RuntimeError is
raised if not), and they are converted into an expected format that is used for the rest of the Butler
init sequence. See the docstring for `_processInputArguments`.

2. Create RepoData Objects
--------------------------

Butler uses an object, called `RepoData`, to keep track of information about each repository; each
repository is contained in a single `RepoData`. The attributes are explained in its docstring.

After `_processInputArguments`, a RepoData is instantiated and put in a list for each repository in
`outputs` and `inputs`. This list of RepoData, the `repoDataList`, now represents all the output and input
repositories (but not parent repositories) that this Butler instance will use.

3. Get `RepositoryCfg`s
-----------------------

`Butler._getCfgs` gets the `RepositoryCfg` for each repository in the `repoDataList`. The behavior is
described in the docstring.

4. Add Parents
--------------

`Butler._addParents` then considers the parents list in the `RepositoryCfg` of each `RepoData` in the
`repoDataList` and inserts new `RepoData` objects, at the proper location in the `repoDataList`, for each
parent not already represented. Ultimately a flat list is built that represents the DAG of readable
repositories in depth-first order.

5. Set and Verify Parents of Outputs
------------------------------------

To be able to load parent repositories when output repositories are later used as inputs, the input
repositories are recorded as parents in the `RepositoryCfg` file of new output repositories. When an
output repository already exists, for consistency the Butler's inputs must match the list of parents
specified in the already-existing output repository's `RepositoryCfg` file.

In `Butler._setAndVerifyParentsLists`, the list of parents is recorded in the `RepositoryCfg` of new
repositories. For existing repositories the list of parents is compared with the `RepositoryCfg`'s parents
list, and if they do not match a `RuntimeError` is raised.

6. Set the Default Mapper
-------------------------

If all the input repositories use the same mapper then we can assume that mapper to be the
"default mapper". If there are new output repositories whose `RepositoryArgs` do not specify a mapper and
there is a default mapper then the new output repository will be set to use that default mapper.

This is handled in `Butler._setDefaultMapper`.

7. Cache References to Parent RepoDatas
---------------------------------------

In `Butler._connectParentRepoDatas`, for each `RepoData` in `repoDataList`, a list of `RepoData` object
references is built that matches the parents specified in that `RepoData`'s `RepositoryCfg`.

This list is used later to find things in that repository's parents without considering peer
repositories' parents (e.g. finding the registry of a parent).

8. Set Tags
-----------

Tags are described at https://ldm-463.lsst.io/v/draft/#tagging

In `Butler._setRepoDataTags`, for each `RepoData`, the tags specified by its `RepositoryArgs` are recorded
in a set, and added to the tags set in each of its parents, for ease of lookup when mapping.

9. Find Parent Registry and Instantiate RepoData
------------------------------------------------

At this point there is enough information to instantiate the `Repository` instances. There is one final
step before instantiating each Repository, which is to try to get a parent registry that can be used by
the child repository. The criteria for "can be used" are spelled out in `Butler._setParentRegistry`.
However, to get the registry from the parent, the parent must be instantiated. The `repoDataList`, in
depth-first search order, is built so that the most-dependent repositories are first and the
least-dependent repositories are last. So the `repoDataList` is reversed and the Repositories are
instantiated in that order; for each RepoData a parent registry is searched for, and then the Repository
is instantiated with whatever registry could be found.

Definition at line 323 of file butler.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.daf.persistence.butler.Butler.__init__(self, root=None, mapper=None, inputs=None, outputs=None, **mapperArgs)

Definition at line 507 of file butler.py.

def __init__(self, root=None, mapper=None, inputs=None, outputs=None, **mapperArgs):
    self._initArgs = {'root': root, 'mapper': mapper, 'inputs': inputs, 'outputs': outputs,
                      'mapperArgs': mapperArgs}

    self.log = Log.getLogger("daf.persistence.butler")

    inputs, outputs = self._processInputArguments(
        root=root, mapper=mapper, inputs=inputs, outputs=outputs, **mapperArgs)

    # convert the RepoArgs into RepoData
    inputs = [RepoData(args, 'input') for args in inputs]
    outputs = [RepoData(args, 'output') for args in outputs]
    repoDataList = outputs + inputs

    self._getCfgs(repoDataList)

    self._addParents(repoDataList)

    self._setAndVerifyParentsLists(repoDataList)

    self._setDefaultMapper(repoDataList)

    self._connectParentRepoDatas(repoDataList)

    self._repos = RepoDataContainer(repoDataList)

    self._setRepoDataTags()

    for repoData in repoDataList:
        self._initRepo(repoData)

Member Function Documentation

◆ __reduce__()

def lsst.daf.persistence.butler.Butler.__reduce__(self)

Definition at line 1617 of file butler.py.

def __reduce__(self):
    ret = (_unreduce, (self._initArgs, self.datasetTypeAliasDict))
    return ret

◆ __repr__()

def lsst.daf.persistence.butler.Butler.__repr__(self)

Definition at line 1035 of file butler.py.

def __repr__(self):
    return 'Butler(datasetTypeAliasDict=%s, repos=%s)' % (
        self.datasetTypeAliasDict, self._repos)

◆ dataRef()

def lsst.daf.persistence.butler.Butler.dataRef(self, datasetType, level=None, dataId={}, **rest)
Returns a single ButlerDataRef.

Given a complete dataId specified in dataId and **rest, find the unique dataset at the given level
specified by a dataId key (e.g. visit or sensor or amp for a camera) and return a ButlerDataRef.

Parameters
----------
datasetType - string
    The type of dataset collection to reference
level - string
    The level of dataId at which to reference
dataId - dict
    The data id.
**rest
    Keyword arguments for the data id.

Returns
-------
dataRef - ButlerDataRef
    ButlerDataRef for dataset matching the data id
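
A usage sketch (the dataset type and data id keys are illustrative; ButlerDataRef.get and ButlerDataRef.put take the dataset type as an argument):

>>> ref = butler.dataRef('raw', visit=123, ccd=45)
>>> exposure = ref.get('raw')          # retrieve via the reference
>>> ref.put(exposure, 'postISRCCD')    # persist using the same data id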

Definition at line 1502 of file butler.py.

def dataRef(self, datasetType, level=None, dataId={}, **rest):
    """Returns a single ButlerDataRef.

    Given a complete dataId specified in dataId and **rest, find the unique dataset at the given level
    specified by a dataId key (e.g. visit or sensor or amp for a camera) and return a ButlerDataRef.

    Parameters
    ----------
    datasetType - string
        The type of dataset collection to reference
    level - string
        The level of dataId at which to reference
    dataId - dict
        The data id.
    **rest
        Keyword arguments for the data id.

    Returns
    -------
    dataRef - ButlerDataRef
        ButlerDataRef for dataset matching the data id
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)
    dataId = DataId(dataId)
    subset = self.subset(datasetType, level, dataId, **rest)
    if len(subset) != 1:
        raise RuntimeError("No unique dataset for: Dataset type:%s Level:%s Data ID:%s Keywords:%s" %
                           (str(datasetType), str(level), str(dataId), str(rest)))
    return ButlerDataRef(subset, subset.cache[0])

◆ datasetExists()

def lsst.daf.persistence.butler.Butler.datasetExists(self, datasetType, dataId={}, write=False, **rest)
Determines if a dataset file exists.

Parameters
----------
datasetType - string
    The type of dataset to inquire about.
dataId - DataId, dict
    The data id of the dataset.
write - bool
    If True, look only in locations where the dataset could be written,
    and return True only if it is present in all of them.
**rest
    Keyword arguments for the data id.

Returns
-------
exists - bool
    True if the dataset exists or is non-file-based.
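
A common guard pattern (dataset type and keys are illustrative):

>>> if butler.datasetExists('calexp', visit=123, ccd=45):
...     exp = butler.get('calexp', visit=123, ccd=45)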

Definition at line 1239 of file butler.py.

def datasetExists(self, datasetType, dataId={}, write=False, **rest):
    """Determines if a dataset file exists.

    Parameters
    ----------
    datasetType - string
        The type of dataset to inquire about.
    dataId - DataId, dict
        The data id of the dataset.
    write - bool
        If True, look only in locations where the dataset could be written,
        and return True only if it is present in all of them.
    **rest
        Keyword arguments for the data id.

    Returns
    -------
    exists - bool
        True if the dataset exists or is non-file-based.
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)
    dataId = DataId(dataId)
    dataId.update(**rest)
    locations = self._locate(datasetType, dataId, write=write)
    if not write:  # when write=False, locations is not a sequence
        if locations is None:
            return False
        locations = [locations]

    if not locations:  # empty list
        return False

    for location in locations:
        # If the location is a ButlerComposite (as opposed to a ButlerLocation),
        # verify the component objects exist.
        if isinstance(location, ButlerComposite):
            for name, componentInfo in location.componentInfo.items():
                if componentInfo.subset:
                    subset = self.subset(datasetType=componentInfo.datasetType, dataId=location.dataId)
                    exists = all([obj.datasetExists() for obj in subset])
                else:
                    exists = self.datasetExists(componentInfo.datasetType, location.dataId)
                if exists is False:
                    return False
        else:
            if not location.repository.exists(location):
                return False
    return True

◆ defineAlias()

def lsst.daf.persistence.butler.Butler.defineAlias(self, alias, datasetType)
Register an alias that will be substituted in datasetTypes.

Parameters
----------
alias - string
    The alias keyword. It may start with @ or not. It may not contain @ except as the first character.
datasetType - string
    The string that will be substituted when @alias is passed into datasetType. It may not contain '@'.
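
A sketch of alias definition and use (the alias and dataset type names are hypothetical):

>>> butler.defineAlias('myType', 'calexp')          # the '@' prefix is optional here
>>> exp = butler.get('@myType', visit=123, ccd=45)  # '@myType' resolves to 'calexp'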

Definition at line 1105 of file butler.py.

def defineAlias(self, alias, datasetType):
    """Register an alias that will be substituted in datasetTypes.

    Parameters
    ----------
    alias - string
        The alias keyword. It may start with @ or not. It may not contain @ except as the first character.
    datasetType - string
        The string that will be substituted when @alias is passed into datasetType. It may not contain '@'.
    """
    # verify formatting of alias:
    # it can have '@' as the first character (if not it's okay, we will add it) or not at all.
    atLoc = alias.rfind('@')
    if atLoc == -1:
        alias = "@" + str(alias)
    elif atLoc > 0:
        raise RuntimeError("Badly formatted alias string: %s" % (alias,))

    # verify that datasetType does not contain '@'
    if datasetType.count('@') != 0:
        raise RuntimeError("Badly formatted type string: %s" % (datasetType))

    # verify that the alias keyword does not start with another alias keyword,
    # and vice versa
    for key in self.datasetTypeAliasDict:
        if key.startswith(alias) or alias.startswith(key):
            raise RuntimeError("Alias: %s overlaps with existing alias: %s" % (alias, key))

    self.datasetTypeAliasDict[alias] = datasetType

◆ get()

def lsst.daf.persistence.butler.Butler.get(self, datasetType, dataId=None, immediate=True, **rest)
Retrieves a dataset given an input collection data id.

Parameters
----------
datasetType - string
    The type of dataset to retrieve.
dataId - dict
    The data id.
immediate - bool
    If False use a proxy for delayed loading.
**rest
    keyword arguments for the data id.

Returns
-------
    An object retrieved from the dataset (or a proxy for one).
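
Equivalent calls, passing the data id as a dict or as keyword arguments (the keys are illustrative):

>>> exp = butler.get('calexp', dataId={'visit': 123, 'ccd': 45})
>>> exp = butler.get('calexp', visit=123, ccd=45)  # same, via **rest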

Definition at line 1377 of file butler.py.

def get(self, datasetType, dataId=None, immediate=True, **rest):
    """Retrieves a dataset given an input collection data id.

    Parameters
    ----------
    datasetType - string
        The type of dataset to retrieve.
    dataId - dict
        The data id.
    immediate - bool
        If False use a proxy for delayed loading.
    **rest
        keyword arguments for the data id.

    Returns
    -------
    An object retrieved from the dataset (or a proxy for one).
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)
    dataId = DataId(dataId)
    dataId.update(**rest)

    location = self._locate(datasetType, dataId, write=False)
    if location is None:
        raise NoResults("No locations for get:", datasetType, dataId)
    self.log.debug("Get type=%s keys=%s from %s", datasetType, dataId, str(location))

    if hasattr(location, 'bypass'):
        # this type loader block should get moved into a helper someplace, and duplications removed.
        def callback():
            return location.bypass
    else:
        def callback():
            return self._read(location)
    if location.mapper.canStandardize(location.datasetType):
        innerCallback = callback

        def callback():
            return location.mapper.standardize(location.datasetType, innerCallback(), dataId)
    if immediate:
        return callback()
    return ReadProxy(callback)

◆ getDatasetTypes()

def lsst.daf.persistence.butler.Butler.getDatasetTypes(self, tag=None)
Get the valid dataset types for all known repos or those matching
the tags.

Parameters
----------
tag - any, or list of any
    If tag is specified then the repo will only be used if the tag
    or a tag in the list matches a tag used for that repository.

Returns
-------
Returns the dataset types as a set of strings.
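
For example (assuming the repository's mapper defines a 'calexp' dataset type):

>>> 'calexp' in butler.getDatasetTypes()
True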

Definition at line 1170 of file butler.py.

def getDatasetTypes(self, tag=None):
    """Get the valid dataset types for all known repos or those matching
    the tags.

    Parameters
    ----------
    tag - any, or list of any
        If tag is specified then the repo will only be used if the tag
        or a tag in the list matches a tag used for that repository.

    Returns
    -------
    Returns the dataset types as a set of strings.
    """
    datasetTypes = set()
    tag = setify(tag)
    for repoData in self._repos.outputs() + self._repos.inputs():
        if not tag or len(tag.intersection(repoData.tags)) > 0:
            datasetTypes = datasetTypes.union(
                repoData.repo.mappers()[0].getDatasetTypes())
    return datasetTypes

◆ getKeys()

def lsst.daf.persistence.butler.Butler.getKeys(self, datasetType=None, level=None, tag=None)
Get the valid data id keys at or above the given level of hierarchy for the dataset type or the
entire collection if None. The dict values are the basic Python types corresponding to the keys (int,
float, string).

Parameters
----------
datasetType - string
    The type of dataset to get keys for, entire collection if None.
level - string
    The hierarchy level to descend to. None if it should not be restricted. Use an empty string if the
    mapper should lookup the default level.
tag - any, or list of any
    If tag is specified then the repo will only be used if the tag
    or a tag in the list matches a tag used for that repository.

Returns
-------
Returns a dict. The dict keys are the valid data id keys at or above the given level of hierarchy for
the dataset type or the entire collection if None. The dict values are the basic Python types
corresponding to the keys (int, float, string).
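
An illustrative call (the returned keys and value types depend entirely on the mapper):

>>> butler.getKeys('raw')
{'visit': <class 'int'>, 'ccd': <class 'int'>, 'filter': <class 'str'>}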

Definition at line 1135 of file butler.py.

def getKeys(self, datasetType=None, level=None, tag=None):
    """Get the valid data id keys at or above the given level of hierarchy for the dataset type or the
    entire collection if None. The dict values are the basic Python types corresponding to the keys (int,
    float, string).

    Parameters
    ----------
    datasetType - string
        The type of dataset to get keys for, entire collection if None.
    level - string
        The hierarchy level to descend to. None if it should not be restricted. Use an empty string if the
        mapper should lookup the default level.
    tag - any, or list of any
        If tag is specified then the repo will only be used if the tag
        or a tag in the list matches a tag used for that repository.

    Returns
    -------
    Returns a dict. The dict keys are the valid data id keys at or above the given level of hierarchy for
    the dataset type or the entire collection if None. The dict values are the basic Python types
    corresponding to the keys (int, float, string).
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)

    keys = None
    tag = setify(tag)
    for repoData in self._repos.inputs():
        if not tag or len(tag.intersection(repoData.tags)) > 0:
            keys = repoData.repo.getKeys(datasetType, level)
            # An empty dict is a valid "found" condition for keys. The only value for keys that should
            # cause the search to continue is None
            if keys is not None:
                break
    return keys

◆ getMapperClass()

def lsst.daf.persistence.butler.Butler.getMapperClass(root)  [static]
POSIX-only; gets the mapper class at the path specified by root (if a file _mapper can be found at
that location or in a parent location).

As we abstract the storage and support different types of storage locations this method will be
moved entirely into Butler Access, or made more dynamic, and the API will very likely change.
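
A sketch of the intended use (the path is hypothetical):

>>> MapperClass = Butler.getMapperClass('/path/to/repo')
>>> butler = Butler(root='/path/to/repo', mapper=MapperClass)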

Definition at line 1097 of file butler.py.

def getMapperClass(root):
    """POSIX-only; gets the mapper class at the path specified by root (if a file _mapper can be found at
    that location or in a parent location).

    As we abstract the storage and support different types of storage locations this method will be
    moved entirely into Butler Access, or made more dynamic, and the API will very likely change."""
    return Storage.getMapperClass(root)

◆ getUri()

def lsst.daf.persistence.butler.Butler.getUri(self, datasetType, dataId=None, write=False, **rest)
Return the URI for a dataset

.. warning:: This is intended only for debugging. The URI should
never be used for anything other than printing.

.. note:: In the event there are multiple URIs for read, we return only
the first.

.. note:: getUri() does not currently support composite datasets.

Parameters
----------
datasetType : `str`
   The dataset type of interest.
dataId : `dict`, optional
   The data identifier.
write : `bool`, optional
   Return the URI for writing?
rest : `dict`, optional
   Keyword arguments for the data id.

Returns
-------
uri : `str`
   URI for dataset.
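
For example, printing the URI for debugging (the returned path is illustrative):

>>> print(butler.getUri('calexp', visit=123, ccd=45))
/path/to/repo/calexp/calexp-v123-c45.fits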

Definition at line 1533 of file butler.py.

def getUri(self, datasetType, dataId=None, write=False, **rest):
    """Return the URI for a dataset

    .. warning:: This is intended only for debugging. The URI should
        never be used for anything other than printing.

    .. note:: In the event there are multiple URIs for read, we return only
        the first.

    .. note:: getUri() does not currently support composite datasets.

    Parameters
    ----------
    datasetType : `str`
        The dataset type of interest.
    dataId : `dict`, optional
        The data identifier.
    write : `bool`, optional
        Return the URI for writing?
    rest : `dict`, optional
        Keyword arguments for the data id.

    Returns
    -------
    uri : `str`
        URI for dataset.
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)
    dataId = DataId(dataId)
    dataId.update(**rest)
    locations = self._locate(datasetType, dataId, write=write)
    if locations is None:
        raise NoResults("No locations for getUri: ", datasetType, dataId)

    if write:
        # Follow the write path
        # Return the first valid write location.
        for location in locations:
            if isinstance(location, ButlerComposite):
                for name, info in location.componentInfo.items():
                    if not info.inputOnly:
                        return self.getUri(info.datasetType, location.dataId, write=True)
            else:
                return location.getLocationsWithRoot()[0]
        # fall back to raise
        raise NoResults("No locations for getUri(write=True): ", datasetType, dataId)
    else:
        # Follow the read path, only return the first valid read
        return locations.getLocationsWithRoot()[0]

◆ put()

def lsst.daf.persistence.butler.Butler.put(self, obj, datasetType, dataId={}, doBackup=False, **rest)
Persists a dataset given an output collection data id.

Parameters
----------
obj -
    The object to persist.
datasetType - string
    The type of dataset to persist.
dataId - dict
    The data id.
doBackup - bool
    If True, rename existing instead of overwriting.
    WARNING: Setting doBackup=True is not safe for parallel processing, as it may be subject to race
    conditions.
**rest
    Keyword arguments for the data id.
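
A usage sketch (dataset type and keys are illustrative):

>>> butler.put(exp, 'calexp', visit=123, ccd=45)
>>> butler.put(exp, 'calexp', visit=123, ccd=45, doBackup=True)  # rename any existing file first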

Definition at line 1420 of file butler.py.

def put(self, obj, datasetType, dataId={}, doBackup=False, **rest):
    """Persists a dataset given an output collection data id.

    Parameters
    ----------
    obj -
        The object to persist.
    datasetType - string
        The type of dataset to persist.
    dataId - dict
        The data id.
    doBackup - bool
        If True, rename existing instead of overwriting.
        WARNING: Setting doBackup=True is not safe for parallel processing, as it may be subject to race
        conditions.
    **rest
        Keyword arguments for the data id.
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)
    dataId = DataId(dataId)
    dataId.update(**rest)

    locations = self._locate(datasetType, dataId, write=True)
    if not locations:
        raise NoResults("No locations for put:", datasetType, dataId)
    for location in locations:
        if isinstance(location, ButlerComposite):
            disassembler = location.disassembler if location.disassembler else genericDisassembler
            disassembler(obj=obj, dataId=location.dataId, componentInfo=location.componentInfo)
            for name, info in location.componentInfo.items():
                if not info.inputOnly:
                    self.put(info.obj, info.datasetType, location.dataId, doBackup=doBackup)
        else:
            if doBackup:
                location.getRepository().backup(location.datasetType, dataId)
            location.getRepository().write(location, obj)

◆ queryMetadata()

def lsst.daf.persistence.butler.Butler.queryMetadata(self, datasetType, format, dataId={}, **rest)
Returns the valid values for one or more keys when given a partial
input collection data id.

Parameters
----------
datasetType - string
    The type of dataset to inquire about.
format - str, tuple
    Key or tuple of keys to be returned.
dataId - DataId, dict
    The partial data id.
**rest -
    Keyword arguments for the partial data id.

Returns
-------
A list of valid values or tuples of valid values as specified by the
format.
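
For example (the returned values are illustrative): a single key yields a flat list, while a tuple of keys yields tuples:

>>> butler.queryMetadata('raw', 'visit', filter='r')
[123, 124, 125]
>>> butler.queryMetadata('raw', ('visit', 'ccd'), filter='r')
[(123, 45), (124, 45)]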

Definition at line 1192 of file butler.py.

def queryMetadata(self, datasetType, format, dataId={}, **rest):
    """Returns the valid values for one or more keys when given a partial
    input collection data id.

    Parameters
    ----------
    datasetType - string
        The type of dataset to inquire about.
    format - str, tuple
        Key or tuple of keys to be returned.
    dataId - DataId, dict
        The partial data id.
    **rest -
        Keyword arguments for the partial data id.

    Returns
    -------
    A list of valid values or tuples of valid values as specified by the
    format.
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)
    dataId = DataId(dataId)
    dataId.update(**rest)
    format = sequencify(format)

    tuples = None
    for repoData in self._repos.inputs():
        if not dataId.tag or len(dataId.tag.intersection(repoData.tags)) > 0:
            tuples = repoData.repo.queryMetadata(datasetType, format, dataId)
            if tuples:
                break

    if not tuples:
        return []

    if len(format) == 1:
        ret = []
        for x in tuples:
            try:
                ret.append(x[0])
            except TypeError:
                ret.append(x)
        return ret

    return tuples

◆ subset()

def lsst.daf.persistence.butler.Butler.subset(self, datasetType, level=None, dataId={}, **rest)
Return complete dataIds for a dataset type that match a partial (or empty) dataId.

Given a partial (or empty) dataId specified in dataId and **rest, find all datasets that match the
dataId.  Optionally restrict the results to a given level specified by a dataId key (e.g. visit or
sensor or amp for a camera).  Return an iterable collection of complete dataIds as ButlerDataRefs.
Datasets with the resulting dataIds may not exist; that needs to be tested with datasetExists().

Parameters
----------
datasetType - string
    The type of dataset collection to subset
level - string
    The level of dataId at which to subset. Use an empty string if the mapper should look up the
    default level.
dataId - dict
    The data id.
**rest
    Keyword arguments for the data id.

Returns
-------
subset - ButlerSubset
    Collection of ButlerDataRefs for datasets matching the data id.

Examples
--------
To print the full dataIds for all r-band measurements in a source catalog
(note that the subset call is equivalent to: `butler.subset('src', dataId={'filter':'r'})`):

>>> subset = butler.subset('src', filter='r')
>>> for data_ref in subset: print(data_ref.dataId)

Definition at line 1457 of file butler.py.

def subset(self, datasetType, level=None, dataId={}, **rest):
    """Return complete dataIds for a dataset type that match a partial (or empty) dataId.

    Given a partial (or empty) dataId specified in dataId and **rest, find all datasets that match the
    dataId. Optionally restrict the results to a given level specified by a dataId key (e.g. visit or
    sensor or amp for a camera). Return an iterable collection of complete dataIds as ButlerDataRefs.
    Datasets with the resulting dataIds may not exist; that needs to be tested with datasetExists().

    Parameters
    ----------
    datasetType - string
        The type of dataset collection to subset
    level - string
        The level of dataId at which to subset. Use an empty string if the mapper should look up the
        default level.
    dataId - dict
        The data id.
    **rest
        Keyword arguments for the data id.

    Returns
    -------
    subset - ButlerSubset
        Collection of ButlerDataRefs for datasets matching the data id.

    Examples
    --------
    To print the full dataIds for all r-band measurements in a source catalog
    (note that the subset call is equivalent to: `butler.subset('src', dataId={'filter':'r'})`):

    >>> subset = butler.subset('src', filter='r')
    >>> for data_ref in subset: print(data_ref.dataId)
    """
    datasetType = self._resolveDatasetTypeAlias(datasetType)

    # Currently expected behavior of subset is that if specified level is None then the mapper's default
    # level should be used. Convention for level within Butler is that an empty string is used to indicate
    # 'get default'.
    if level is None:
        level = ''

    dataId = DataId(dataId)
    dataId.update(**rest)
    return ButlerSubset(self, datasetType, level, dataId)

Member Data Documentation

◆ datasetTypeAliasDict

lsst.daf.persistence.butler.Butler.datasetTypeAliasDict

Definition at line 601 of file butler.py.

◆ GENERATION

int lsst.daf.persistence.butler.Butler.GENERATION = 2
static

Definition at line 503 of file butler.py.

◆ log

lsst.daf.persistence.butler.Butler.log

Definition at line 511 of file butler.py.

◆ storage

lsst.daf.persistence.butler.Butler.storage

Definition at line 603 of file butler.py.


The documentation for this class was generated from the following file:
butler.py