LSST Applications  21.0.0+75b29a8a7f,21.0.0+e70536a077,21.0.0-1-ga51b5d4+62c747d40b,21.0.0-10-gbfb87ad6+3307648ee3,21.0.0-15-gedb9d5423+47cba9fc36,21.0.0-2-g103fe59+fdf0863a2a,21.0.0-2-g1367e85+d38a93257c,21.0.0-2-g45278ab+e70536a077,21.0.0-2-g5242d73+d38a93257c,21.0.0-2-g7f82c8f+e682ffb718,21.0.0-2-g8dde007+d179fbfa6a,21.0.0-2-g8f08a60+9402881886,21.0.0-2-ga326454+e682ffb718,21.0.0-2-ga63a54e+08647d4b1b,21.0.0-2-gde069b7+26c92b3210,21.0.0-2-gecfae73+0445ed2f95,21.0.0-2-gfc62afb+d38a93257c,21.0.0-27-gbbd0d29+ae871e0f33,21.0.0-28-g5fc5e037+feb0e9397b,21.0.0-3-g21c7a62+f4b9c0ff5c,21.0.0-3-g357aad2+57b0bddf0b,21.0.0-3-g4be5c26+d38a93257c,21.0.0-3-g65f322c+3f454acf5d,21.0.0-3-g7d9da8d+75b29a8a7f,21.0.0-3-gaa929c8+9e4ef6332c,21.0.0-3-ge02ed75+4b120a55c4,21.0.0-4-g3300ddd+e70536a077,21.0.0-4-g591bb35+4b120a55c4,21.0.0-4-gc004bbf+4911b9cd27,21.0.0-4-gccdca77+f94adcd104,21.0.0-4-ge8fba5a+2b3a696ff9,21.0.0-5-gb155db7+2c5429117a,21.0.0-5-gdf36809+637e4641ee,21.0.0-6-g00874e7+c9fd7f7160,21.0.0-6-g4e60332+4b120a55c4,21.0.0-7-gc8ca178+40eb9cf840,21.0.0-8-gfbe0b4b+9e4ef6332c,21.0.0-9-g2fd488a+d83b7cd606,w.2021.05
LSST Data Management Base Package
lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter Class Reference
Inheritance diagram for lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter:
lsst.obs.base.gen2to3.repoConverter.RepoConverter

Public Member Functions

def __init__ (self, *, CameraMapper mapper, Sequence[str] labels=(), **kwargs)
 
bool isDatasetTypeSpecial (self, str datasetTypeName)
 
Iterator[Tuple[str, CameraMapperMapping]] iterMappings (self)
 
RepoWalker.Target makeRepoWalkerTarget (self, str datasetTypeName, str template, Dict[str, type] keys, StorageClass storageClass, FormatterParameter formatter=None, Optional[PathElementHandler] targetHandler=None)
 
str getRun (self, str datasetTypeName, Optional[str] calibDate=None)
 
List[str] getSpecialDirectories (self)
 
def prep (self)
 
Iterator[FileDataset] iterDatasets (self)
 
def findDatasets (self)
 
def expandDataIds (self)
 
def ingest (self)
 
None finish (self)
 

Public Attributes

 mapper
 
 collection
 
 task
 
 root
 
 instrument
 
 subset
 

Detailed Description

A specialization of `RepoConverter` for calibration repositories.

Parameters
----------
mapper : `CameraMapper`
    Gen2 mapper for the data repository.  The root associated with the
    mapper is ignored and need not match the root of the repository.
labels : `Sequence` [ `str` ]
    Strings injected into the names of the collections that calibration
    datasets are written and certified into (forwarded as the ``extra``
    argument to `Instrument` methods that generate collection names and
    write curated calibrations).
**kwargs
    Additional keyword arguments are forwarded to (and required by)
    `RepoConverter`.

Definition at line 44 of file calibRepoConverter.py.
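
As a rough illustration of how ``labels`` shapes the calibration collection name, consider the hypothetical stub below (real instruments are `lsst.obs.base.Instrument` subclasses with their own naming logic; the exact name format here is an assumption for illustration only):

```python
class FakeInstrument:
    """Hypothetical stand-in for an Instrument's collection-naming logic."""
    name = "TestCam"

    def makeCalibrationCollectionName(self, *extra):
        # Assumed format: "<instrument>/calib[/<extra terms>...]"; the
        # labels passed to CalibRepoConverter are forwarded as ``extra``.
        return "/".join([self.name, "calib", *extra])

instrument = FakeInstrument()
print(instrument.makeCalibrationCollectionName())                  # TestCam/calib
print(instrument.makeCalibrationCollectionName("DM-12345", "v1"))  # TestCam/calib/DM-12345/v1
```

With no labels the default calibration collection name is used; labels are injected as additional path terms.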

Constructor & Destructor Documentation

◆ __init__()

def lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter.__init__ (   self,
*,
CameraMapper  mapper,
Sequence[str]   labels = (),
**  kwargs
)

Definition at line 62 of file calibRepoConverter.py.

def __init__(self, *, mapper: CameraMapper, labels: Sequence[str] = (), **kwargs):
    super().__init__(run=None, **kwargs)
    self.mapper = mapper
    self.collection = self.task.instrument.makeCalibrationCollectionName(*labels)
    self._labels = tuple(labels)
    self._datasetTypes = set()

Member Function Documentation

◆ expandDataIds()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.expandDataIds (   self)
inherited
Expand the data IDs for all datasets to be inserted.

Subclasses may override this method, but must delegate to the base
class implementation if they do.

This involves queries to the registry, but not writes.  It is
guaranteed to be called between `findDatasets` and `ingest`.

Definition at line 441 of file repoConverter.py.

def expandDataIds(self):
    """Expand the data IDs for all datasets to be inserted.

    Subclasses may override this method, but must delegate to the base
    class implementation if they do.

    This involves queries to the registry, but not writes.  It is
    guaranteed to be called between `findDatasets` and `ingest`.
    """
    import itertools
    for datasetType, datasetsByCalibDate in self._fileDatasets.items():
        for calibDate, datasetsForCalibDate in datasetsByCalibDate.items():
            nDatasets = len(datasetsForCalibDate)
            suffix = "" if nDatasets == 1 else "s"
            if calibDate is not None:
                self.task.log.info("Expanding data IDs for %s %s dataset%s at calibDate %s.",
                                   nDatasets, datasetType.name, suffix, calibDate)
            else:
                self.task.log.info("Expanding data IDs for %s %s non-calibration dataset%s.",
                                   nDatasets, datasetType.name, suffix)
            expanded = []
            for dataset in datasetsForCalibDate:
                for i, ref in enumerate(dataset.refs):
                    self.task.log.debug("Expanding data ID %s.", ref.dataId)
                    try:
                        dataId = self.task.registry.expandDataId(ref.dataId)
                        dataset.refs[i] = ref.expanded(dataId)
                    except LookupError as err:
                        self.task.log.warn("Skipping ingestion for '%s': %s", dataset.path, err)
                        # Remove skipped datasets from multi-extension FileDatasets.
                        dataset.refs[i] = None  # We will strip off the `None`s after the loop.
                dataset.refs[:] = itertools.filterfalse(lambda x: x is None, dataset.refs)
                if dataset.refs:
                    expanded.append(dataset)
            datasetsForCalibDate[:] = expanded
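
The skip-and-strip pattern used in `expandDataIds` (mark failed refs as `None` during iteration, then filter them out in place afterwards) can be exercised in isolation:

```python
import itertools

# Stand-in for dataset.refs after some expansions failed and were
# replaced with None inside the loop.
refs = ["ref0", None, "ref2", None]

# In-place slice assignment keeps the same list object, mirroring how
# expandDataIds rewrites dataset.refs rather than rebinding the name.
refs[:] = itertools.filterfalse(lambda x: x is None, refs)
print(refs)  # ['ref0', 'ref2']
```

Deferring the removal avoids mutating the list while `enumerate` is still iterating over it.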

◆ findDatasets()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.findDatasets (   self)
inherited

Definition at line 424 of file repoConverter.py.

def findDatasets(self):
    assert self._repoWalker, "prep() must be called before findDatasets."
    self.task.log.info("Adding special datasets in repo %s.", self.root)
    for dataset in self.iterDatasets():
        assert len(dataset.refs) == 1
        # None index below is for calibDate, which is only relevant for
        # CalibRepoConverter.
        self._fileDatasets[dataset.refs[0].datasetType][None].append(dataset)
    self.task.log.info("Finding datasets from files in repo %s.", self.root)
    datasetsByTypeAndCalibDate = self._repoWalker.walk(
        self.root,
        predicate=(self.subset.isRelated if self.subset is not None else None)
    )
    for datasetType, datasetsByCalibDate in datasetsByTypeAndCalibDate.items():
        for calibDate, datasets in datasetsByCalibDate.items():
            self._fileDatasets[datasetType][calibDate].extend(datasets)

◆ finish()

None lsst.obs.base.gen2to3.repoConverter.RepoConverter.finish (   self)
inherited
Finish conversion of a repository.

This is run after ``ingest``, and delegates to `_finish`, which should
be overridden by derived classes instead of this method.

Definition at line 511 of file repoConverter.py.

def finish(self) -> None:
    """Finish conversion of a repository.

    This is run after ``ingest``, and delegates to `_finish`, which should
    be overridden by derived classes instead of this method.
    """
    self._finish(self._fileDatasets)

◆ getRun()

str lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter.getRun (   self,
str  datasetTypeName,
Optional[str]   calibDate = None 
)
Return the name of the run within this collection into which instances of
the given dataset type should be inserted.

Parameters
----------
datasetTypeName : `str`
    Name of the dataset type.
calibDate : `str`, optional
    If not `None`, the "CALIBDATE" associated with this (calibration)
    dataset in the Gen2 data repository.

Returns
-------
run : `str`
    Name of the `~lsst.daf.butler.CollectionType.RUN` collection.

Reimplemented from lsst.obs.base.gen2to3.repoConverter.RepoConverter.

Definition at line 294 of file calibRepoConverter.py.

def getRun(self, datasetTypeName: str, calibDate: Optional[str] = None) -> str:
    # Docstring inherited from RepoConverter.
    if calibDate is None:
        return super().getRun(datasetTypeName)
    else:
        return self.instrument.makeCalibrationCollectionName(
            *self._labels,
            self.instrument.formatCollectionTimestamp(calibDate),
        )
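
The branching in `getRun` can be sketched standalone. The instrument below is a hypothetical stub: the real `makeCalibrationCollectionName` and `formatCollectionTimestamp` are instrument-defined, and the default-run fallback here stands in for the inherited `RepoConverter.getRun`:

```python
class FakeInstrument:
    # Hypothetical stand-ins for the Instrument naming helpers.
    def makeCalibrationCollectionName(self, *terms):
        return "/".join(["TestCam", "calib", *terms])

    def formatCollectionTimestamp(self, calibDate):
        # Assumed normalization of the Gen2 CALIBDATE string.
        return calibDate.replace("-", "")

def getRun(instrument, labels, calibDate=None):
    # Mirrors the decision in CalibRepoConverter.getRun: datasets with no
    # CALIBDATE fall back to the default run; calibrations get a
    # timestamped calibration collection built from the labels.
    if calibDate is None:
        return "default/run"  # stand-in for super().getRun(datasetTypeName)
    return instrument.makeCalibrationCollectionName(
        *labels, instrument.formatCollectionTimestamp(calibDate)
    )

print(getRun(FakeInstrument(), ("DM-12345",), "2020-01-01"))
# TestCam/calib/DM-12345/20200101
```

The labels captured at construction time thus reappear in every per-date run name.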

◆ getSpecialDirectories()

List[str] lsst.obs.base.gen2to3.repoConverter.RepoConverter.getSpecialDirectories (   self)
inherited
Return a list of directory paths that should not be searched for
files.

These may be directories that simply do not contain datasets (or
contain datasets in another repository), or directories whose datasets
are handled specially by a subclass.

Returns
-------
directories : `list` [`str`]
    The full paths of directories to skip, relative to the repository
    root.

Reimplemented in lsst.obs.base.gen2to3.rootRepoConverter.RootRepoConverter.

Definition at line 292 of file repoConverter.py.

def getSpecialDirectories(self) -> List[str]:
    """Return a list of directory paths that should not be searched for
    files.

    These may be directories that simply do not contain datasets (or
    contain datasets in another repository), or directories whose datasets
    are handled specially by a subclass.

    Returns
    -------
    directories : `list` [`str`]
        The full paths of directories to skip, relative to the repository
        root.
    """
    return []

◆ ingest()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.ingest (   self)
inherited
Insert converted datasets into the Gen3 repository.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.

This method is guaranteed to be called after `expandDataIds`.

Definition at line 483 of file repoConverter.py.

def ingest(self):
    """Insert converted datasets into the Gen3 repository.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.

    This method is guaranteed to be called after `expandDataIds`.
    """
    for datasetType, datasetsByCalibDate in self._fileDatasets.items():
        self.task.registry.registerDatasetType(datasetType)
        for calibDate, datasetsForCalibDate in datasetsByCalibDate.items():
            try:
                run = self.getRun(datasetType.name, calibDate)
            except LookupError:
                self.task.log.warn(f"No run configured for dataset type {datasetType.name}.")
                continue
            nDatasets = len(datasetsForCalibDate)
            self.task.log.info("Ingesting %s %s dataset%s into run %s.", nDatasets,
                               datasetType.name, "" if nDatasets == 1 else "s", run)
            try:
                self.task.registry.registerRun(run)
                self.task.butler3.ingest(*datasetsForCalibDate, transfer=self.task.config.transfer,
                                         run=run)
            except LookupError as err:
                raise LookupError(
                    f"Error expanding data ID for dataset type {datasetType.name}."
                ) from err

◆ isDatasetTypeSpecial()

bool lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter.isDatasetTypeSpecial (   self,
str  datasetTypeName 
)
Test whether the given dataset is handled specially by this
converter and hence should be ignored by generic base-class logic that
searches for dataset types to convert.

Parameters
----------
datasetTypeName : `str`
    Name of the dataset type to test.

Returns
-------
special : `bool`
    `True` if the dataset type is special.

Reimplemented from lsst.obs.base.gen2to3.repoConverter.RepoConverter.

Definition at line 69 of file calibRepoConverter.py.

def isDatasetTypeSpecial(self, datasetTypeName: str) -> bool:
    # Docstring inherited from RepoConverter.
    return datasetTypeName in self.instrument.getCuratedCalibrationNames()

◆ iterDatasets()

Iterator[FileDataset] lsst.obs.base.gen2to3.repoConverter.RepoConverter.iterDatasets (   self)
inherited
Iterate over datasets in the repository that should be ingested into
the Gen3 repository.

The base class implementation yields nothing; the datasets handled by
the `RepoConverter` base class itself are read directly in
`findDatasets`.

Subclasses should override this method if they support additional
datasets that are handled some other way.

Yields
------
dataset : `FileDataset`
    Structures representing datasets to be ingested.  Paths should be
    absolute.

Reimplemented in lsst.obs.base.gen2to3.standardRepoConverter.StandardRepoConverter, and lsst.obs.base.gen2to3.rootRepoConverter.RootRepoConverter.

Definition at line 405 of file repoConverter.py.

def iterDatasets(self) -> Iterator[FileDataset]:
    """Iterate over datasets in the repository that should be ingested into
    the Gen3 repository.

    The base class implementation yields nothing; the datasets handled by
    the `RepoConverter` base class itself are read directly in
    `findDatasets`.

    Subclasses should override this method if they support additional
    datasets that are handled some other way.

    Yields
    ------
    dataset : `FileDataset`
        Structures representing datasets to be ingested.  Paths should be
        absolute.
    """
    yield from ()

◆ iterMappings()

Iterator[Tuple[str, CameraMapperMapping]] lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter.iterMappings (   self)
Iterate over all `CameraMapper` `Mapping` objects that should be
considered for conversion by this repository.

This should include any datasets that may appear in the
repository, including those that are special (see
`isDatasetTypeSpecial`) and those that are being ignored (see
`ConvertRepoTask.isDatasetTypeIncluded`); this allows the converter
to identify and hence skip these datasets quietly instead of warning
about them as unrecognized.

Yields
------
datasetTypeName: `str`
    Name of the dataset type.
mapping : `lsst.obs.base.mapping.Mapping`
    Mapping object used by the Gen2 `CameraMapper` to describe the
    dataset type.

Reimplemented from lsst.obs.base.gen2to3.repoConverter.RepoConverter.

Definition at line 73 of file calibRepoConverter.py.

def iterMappings(self) -> Iterator[Tuple[str, CameraMapperMapping]]:
    # Docstring inherited from RepoConverter.
    yield from self.mapper.calibrations.items()

◆ makeRepoWalkerTarget()

RepoWalker.Target lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter.makeRepoWalkerTarget (   self,
str  datasetTypeName,
str  template,
Dict[str, type]  keys,
StorageClass  storageClass,
FormatterParameter   formatter = None,
Optional[PathElementHandler]   targetHandler = None 
)
Make a struct that identifies a dataset type to be extracted by
walking the repo directory structure.

Parameters
----------
datasetTypeName : `str`
    Name of the dataset type (the same in both Gen2 and Gen3).
template : `str`
    The full Gen2 filename template.
keys : `dict` [`str`, `type`]
    A dictionary mapping Gen2 data ID key to the type of its value.
storageClass : `lsst.daf.butler.StorageClass`
    Gen3 storage class for this dataset type.
formatter : `lsst.daf.butler.Formatter` or `str`, optional
    A Gen 3 formatter class or fully-qualified name.
targetHandler : `PathElementHandler`, optional
    Specialist target handler to use for this dataset type.

Returns
-------
target : `RepoWalker.Target`
    A struct containing information about the target dataset (much of
    it simply forwarded from the arguments).

Reimplemented from lsst.obs.base.gen2to3.repoConverter.RepoConverter.

Definition at line 77 of file calibRepoConverter.py.

def makeRepoWalkerTarget(self, datasetTypeName: str, template: str,
                         keys: Dict[str, type], storageClass: StorageClass,
                         formatter: FormatterParameter = None,
                         targetHandler: Optional[PathElementHandler] = None,
                         ) -> RepoWalker.Target:
    # Docstring inherited from RepoConverter.
    target = RepoWalker.Target(
        datasetTypeName=datasetTypeName,
        storageClass=storageClass,
        template=template,
        keys=keys,
        instrument=self.task.instrument.getName(),
        universe=self.task.registry.dimensions,
        formatter=formatter,
        targetHandler=targetHandler,
        translatorFactory=self.task.translatorFactory,
    )
    self._datasetTypes.add(target.datasetType)
    return target

◆ prep()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.prep (   self)
inherited
Perform preparatory work associated with the dataset types to be
converted from this repository (but not the datasets themselves).

Notes
-----
This should be a relatively fast operation that should not depend on
the size of the repository.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.
More often, subclasses will specialize the behavior of `prep` by
overriding other methods to which the base class implementation
delegates.  These include:
 - `iterMappings`
 - `isDatasetTypeSpecial`
 - `getSpecialDirectories`
 - `makeRepoWalkerTarget`

This should not perform any write operations to the Gen3 repository.
It is guaranteed to be called before `ingest`.

Reimplemented in lsst.obs.base.gen2to3.standardRepoConverter.StandardRepoConverter, and lsst.obs.base.gen2to3.rootRepoConverter.RootRepoConverter.

Definition at line 308 of file repoConverter.py.

def prep(self):
    """Perform preparatory work associated with the dataset types to be
    converted from this repository (but not the datasets themselves).

    Notes
    -----
    This should be a relatively fast operation that should not depend on
    the size of the repository.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.
    More often, subclasses will specialize the behavior of `prep` by
    overriding other methods to which the base class implementation
    delegates.  These include:
     - `iterMappings`
     - `isDatasetTypeSpecial`
     - `getSpecialDirectories`
     - `makeRepoWalkerTarget`

    This should not perform any write operations to the Gen3 repository.
    It is guaranteed to be called before `ingest`.
    """
    self.task.log.info(f"Preparing other dataset types from root {self.root}.")
    walkerInputs: List[Union[RepoWalker.Target, RepoWalker.Skip]] = []
    for datasetTypeName, mapping in self.iterMappings():
        try:
            template = mapping.template
        except RuntimeError:
            # No template for this dataset in this mapper, so there's no
            # way there should be instances of this dataset in this repo.
            continue
        extensions = [""]
        skip = False
        message = None
        storageClass = None
        if (not self.task.isDatasetTypeIncluded(datasetTypeName)
                or self.isDatasetTypeSpecial(datasetTypeName)):
            # User indicated not to include this data, but we still want
            # to recognize files of that type to avoid warning about them.
            skip = True
        else:
            storageClass = self._guessStorageClass(datasetTypeName, mapping)
            if storageClass is None:
                # This may be a problem, but only if we actually encounter
                # any files corresponding to this dataset.  Of course, we
                # need to be able to parse those files in order to
                # recognize that situation.
                message = f"no storage class found for {datasetTypeName}"
                skip = True
        # Handle files that are compressed on disk, but the gen2 template
        # is just `.fits`
        if template.endswith(".fits"):
            extensions.extend((".gz", ".fz"))
        for extension in extensions:
            if skip:
                walkerInput = RepoWalker.Skip(
                    template=template+extension,
                    keys=mapping.keys(),
                    message=message,
                )
                self.task.log.debug("Skipping template in walker: %s", template)
            else:
                assert message is None
                targetHandler = self.task.config.targetHandlerClasses.get(datasetTypeName)
                if targetHandler is not None:
                    targetHandler = doImport(targetHandler)
                walkerInput = self.makeRepoWalkerTarget(
                    datasetTypeName=datasetTypeName,
                    template=template+extension,
                    keys=mapping.keys(),
                    storageClass=storageClass,
                    formatter=self.task.config.formatterClasses.get(datasetTypeName),
                    targetHandler=targetHandler,
                )
                self.task.log.debug("Adding template to walker: %s + %s, for %s", template, extension,
                                    walkerInput.datasetType)
            walkerInputs.append(walkerInput)

    for dirPath in self.getSpecialDirectories():
        walkerInputs.append(
            RepoWalker.Skip(
                template=dirPath,  # not really a template, but that's fine; it's relative to root.
                keys={},
                message=None,
                isForFiles=True,
            )
        )
    fileIgnoreRegExTerms = []
    for pattern in self.task.config.fileIgnorePatterns:
        fileIgnoreRegExTerms.append(fnmatch.translate(pattern))
    if fileIgnoreRegExTerms:
        fileIgnoreRegEx = re.compile("|".join(fileIgnoreRegExTerms))
    else:
        fileIgnoreRegEx = None
    self._repoWalker = RepoWalker(walkerInputs, fileIgnoreRegEx=fileIgnoreRegEx,
                                  log=self.task.log.getChild("repoWalker"))
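
The ignore-pattern handling at the end of `prep` (glob patterns compiled into a single alternation regex via `fnmatch.translate`) works like this in isolation; the example patterns are illustrative, not taken from any real `fileIgnorePatterns` config:

```python
import fnmatch
import re

# Glob-style patterns, as they would appear in config.fileIgnorePatterns.
patterns = ["*.log", "README*"]

# Each glob becomes an anchored regex; joining with "|" yields one
# compiled expression that matches any of them.
terms = [fnmatch.translate(p) for p in patterns]
ignoreRegEx = re.compile("|".join(terms)) if terms else None

assert ignoreRegEx.match("convert.log")
assert ignoreRegEx.match("README.txt")
assert ignoreRegEx.match("bias.fits") is None
```

Compiling once up front keeps the per-file check during the repository walk cheap.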

Member Data Documentation

◆ collection

lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter.collection

Definition at line 65 of file calibRepoConverter.py.

◆ instrument

lsst.obs.base.gen2to3.repoConverter.RepoConverter.instrument
inherited

Definition at line 213 of file repoConverter.py.

◆ mapper

lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter.mapper

Definition at line 64 of file calibRepoConverter.py.

◆ root

lsst.obs.base.gen2to3.repoConverter.RepoConverter.root
inherited

Definition at line 212 of file repoConverter.py.

◆ subset

lsst.obs.base.gen2to3.repoConverter.RepoConverter.subset
inherited

Definition at line 214 of file repoConverter.py.

◆ task

lsst.obs.base.gen2to3.repoConverter.RepoConverter.task
inherited

Definition at line 211 of file repoConverter.py.


The documentation for this class was generated from the following file: