LSSTApplications  18.0.0+106,18.0.0+50,19.0.0,19.0.0+1,19.0.0+10,19.0.0+11,19.0.0+13,19.0.0+17,19.0.0+2,19.0.0-1-g20d9b18+6,19.0.0-1-g425ff20,19.0.0-1-g5549ca4,19.0.0-1-g580fafe+6,19.0.0-1-g6fe20d0+1,19.0.0-1-g7011481+9,19.0.0-1-g8c57eb9+6,19.0.0-1-gb5175dc+11,19.0.0-1-gdc0e4a7+9,19.0.0-1-ge272bc4+6,19.0.0-1-ge3aa853,19.0.0-10-g448f008b,19.0.0-12-g6990b2c,19.0.0-2-g0d9f9cd+11,19.0.0-2-g3d9e4fb2+11,19.0.0-2-g5037de4,19.0.0-2-gb96a1c4+3,19.0.0-2-gd955cfd+15,19.0.0-3-g2d13df8,19.0.0-3-g6f3c7dc,19.0.0-4-g725f80e+11,19.0.0-4-ga671dab3b+1,19.0.0-4-gad373c5+3,19.0.0-5-ga2acb9c+2,19.0.0-5-gfe96e6c+2,w.2020.01
LSSTDataManagementBasePackage
lsst.obs.base.gen2to3.repoConverter.RepoConverter Class Reference
Inheritance diagram for lsst.obs.base.gen2to3.repoConverter.RepoConverter:
Known subclasses: lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter, lsst.obs.base.gen2to3.standardRepoConverter.StandardRepoConverter, lsst.obs.base.gen2to3.rootRepoConverter.RootRepoConverter

Public Member Functions

def __init__
 
def isDatasetTypeSpecial
 
def isDirectorySpecial
 
def iterMappings (self)
 
def makeDataIdExtractor
 
def iterDatasets (self)
 
def prep (self)
 
def insertDimensionData (self)
 
def ingest (self)
 
def getButler
 

Public Attributes

 task
 
 root
 
 subset
 

Detailed Description

An abstract base class for objects that help `ConvertRepoTask` convert
datasets from a single Gen2 repository.

Parameters
----------
task : `ConvertRepoTask`
    Task instance that is using this helper object.
root : `str`
    Root of the Gen2 repo being converted.
collections : `list` of `str`
    Gen3 collections with which all converted datasets should be
    associated.
subset : `ConversionSubset`, optional
    Helper object that implements a filter that restricts the data IDs that
    are converted.

Notes
-----
`RepoConverter` defines the only public API users of its subclasses should
use (`prep`, `insertDimensionData`, and `ingest`).  These delegate to
several abstract methods that subclasses must implement.  In some cases,
subclasses may reimplement the public methods as well, but are expected to
delegate to ``super()`` either at the beginning or end of their own
implementation.
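The guaranteed call order of those three public methods can be sketched with stand-in classes; `SimpleRepoConverter` and `convert` below are illustrative only, not part of obs_base:

```python
class SimpleRepoConverter:
    """Minimal stand-in illustrating the prep/insertDimensionData/ingest order."""

    def __init__(self, root):
        self.root = root
        self.calls = []

    def prep(self):
        # Read-only phase: identify dataset types to convert.
        self.calls.append("prep")

    def insertDimensionData(self):
        # Write dimension records derived from this repository.
        self.calls.append("insertDimensionData")

    def ingest(self):
        # Write the converted datasets themselves.
        self.calls.append("ingest")


def convert(converters):
    # A driver respecting the documented guarantee: prep runs before
    # insertDimensionData, which runs before ingest, for every converter.
    for c in converters:
        c.prep()
    for c in converters:
        c.insertDimensionData()
    for c in converters:
        c.ingest()
```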

Definition at line 224 of file repoConverter.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.__init__ (   self,
  *,
  task,
  root,
  collections,
  subset = None 
)

Definition at line 251 of file repoConverter.py.

def __init__(self, *, task: ConvertRepoTask, root: str, collections: List[str],
             subset: Optional[ConversionSubset] = None):
    self.task = task
    self.root = root
    self.subset = subset
    self._collections = list(collections)
    self._extractors: MostRecentlyUsedStack[DataIdExtractor] = MostRecentlyUsedStack()
    self._skipParsers: MostRecentlyUsedStack[Tuple[FilePathParser, str, str]] = MostRecentlyUsedStack()

Member Function Documentation

◆ getButler()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.getButler (   self,
  datasetTypeName 
)

Definition at line 475 of file repoConverter.py.

def getButler(self, datasetTypeName: str) -> Tuple[Butler3, List[str]]:
    """Create a new Gen3 Butler appropriate for a particular dataset type.

    This should be used exclusively by subclasses when obtaining a butler
    to use for dataset ingest (`ConvertRepoTask.butler3` should never be
    used directly).

    Parameters
    ----------
    datasetTypeName : `str`
        Name of the dataset type.

    Returns
    -------
    butler : `lsst.daf.butler.Butler`
        Gen3 Butler instance appropriate for ingesting the given dataset
        type.
    collections : `list` of `str`
        Collections the dataset should be associated with, in addition to
        the one used to define the `lsst.daf.butler.Run` used in
        ``butler``.
    """
    if datasetTypeName in self.task.config.collections:
        return (
            Butler3(butler=self.task.butler3, run=self.task.config.collections[datasetTypeName]),
            self._collections,
        )
    elif self._collections:
        return (
            Butler3(butler=self.task.butler3, run=self._collections[0]),
            self._collections[1:],
        )
    else:
        raise LookupError(f"No collection configured for dataset type {datasetTypeName}.")
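The collection-selection precedence in `getButler` can be isolated as a pure function. This is a sketch: `select_run_and_collections` and its dict-based stand-in for the task config are hypothetical, and real Butler construction is omitted.

```python
from typing import Dict, List, Tuple


def select_run_and_collections(datasetTypeName: str,
                               configured_runs: Dict[str, str],
                               collections: List[str]) -> Tuple[str, List[str]]:
    """Return (run, extra_collections) following getButler's precedence."""
    if datasetTypeName in configured_runs:
        # Per-dataset-type override: every converter collection is an extra.
        return configured_runs[datasetTypeName], list(collections)
    elif collections:
        # Default: the first collection defines the run, the rest are extras.
        return collections[0], list(collections[1:])
    raise LookupError(f"No collection configured for dataset type {datasetTypeName}.")
```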

◆ ingest()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.ingest (   self)
Insert converted datasets into the Gen3 repository.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.  More often,
subclasses will specialize the behavior of `ingest` simply by
overriding `iterDatasets` and `isDirectorySpecial`, to which the base
implementation delegates.

This method is guaranteed to be called after both `prep` and
`insertDimensionData`.

Definition at line 444 of file repoConverter.py.

def ingest(self):
    """Insert converted datasets into the Gen3 repository.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.  More often,
    subclasses will specialize the behavior of `ingest` simply by
    overriding `iterDatasets` and `isDirectorySpecial`, to which the base
    implementation delegates.

    This method is guaranteed to be called after both `prep` and
    `insertDimensionData`.
    """
    self.task.log.info("Finding datasets in repo %s.", self.root)
    datasetsByType = defaultdict(list)
    for dataset in self.iterDatasets():
        datasetsByType[dataset.ref.datasetType].append(dataset)
    for datasetType, datasetsForType in datasetsByType.items():
        self.task.registry.registerDatasetType(datasetType)
        self.task.log.info("Ingesting %s %s datasets.", len(datasetsForType), datasetType.name)
        try:
            butler3, collections = self.getButler(datasetType.name)
        except LookupError as err:
            self.task.log.warn(str(err))
            continue
        try:
            butler3.ingest(*datasetsForType, transfer=self.task.config.transfer)
        except LookupError as err:
            raise LookupError(f"Error expanding data ID for dataset type {datasetType.name}.") from err
        for collection in collections:
            self.task.registry.associate(collection, [dataset.ref for dataset in datasetsForType])
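The grouping step at the top of `ingest` can be illustrated in isolation; the sketch below uses toy `(type, dataset)` tuples standing in for `FileDataset` objects and their refs.

```python
from collections import defaultdict


def group_by_type(datasets):
    """Bucket datasets by type so each type is registered and ingested once."""
    by_type = defaultdict(list)
    for dataset_type, dataset in datasets:
        by_type[dataset_type].append(dataset)
    return by_type
```

Grouping first means `registerDatasetType` and the per-type `ingest` call each run once per dataset type, however many files that type has.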

◆ insertDimensionData()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.insertDimensionData (   self)
Insert any dimension records uniquely derived from this repository
into the registry.

Subclasses may override this method, but may not need to; the default
implementation does nothing.

SkyMap and SkyPix dimensions should instead be handled by calling
`ConvertRepoTask.useSkyMap` or `ConvertRepoTask.useSkyPix`, because
these dimensions are in general shared by multiple Gen2 repositories.

This method is guaranteed to be called between `prep` and `ingest`.

Definition at line 429 of file repoConverter.py.

def insertDimensionData(self):
    """Insert any dimension records uniquely derived from this repository
    into the registry.

    Subclasses may override this method, but may not need to; the default
    implementation does nothing.

    SkyMap and SkyPix dimensions should instead be handled by calling
    `ConvertRepoTask.useSkyMap` or `ConvertRepoTask.useSkyPix`, because
    these dimensions are in general shared by multiple Gen2 repositories.

    This method is guaranteed to be called between `prep` and `ingest`.
    """
    pass

◆ isDatasetTypeSpecial()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.isDatasetTypeSpecial (   self,
  datasetTypeName 
)

Definition at line 261 of file repoConverter.py.

def isDatasetTypeSpecial(self, datasetTypeName: str) -> bool:
    """Test whether the given dataset is handled specially by this
    converter and hence should be ignored by generic base-class logic that
    searches for dataset types to convert.

    Parameters
    ----------
    datasetTypeName : `str`
        Name of the dataset type to test.

    Returns
    -------
    special : `bool`
        `True` if the dataset type is special.
    """
    raise NotImplementedError()
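One plausible subclass implementation is a simple membership test against a fixed set of names. The set contents below are illustrative only, not taken from any real converter:

```python
# Dataset types this hypothetical converter handles with dedicated logic,
# so the generic base-class search should skip them.
SPECIAL_DATASET_TYPES = frozenset({"camera", "defects"})


def isDatasetTypeSpecial(datasetTypeName: str) -> bool:
    """Return True for dataset types handled outside the generic walk."""
    return datasetTypeName in SPECIAL_DATASET_TYPES
```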

◆ isDirectorySpecial()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.isDirectorySpecial (   self,
  subdirectory 
)

Definition at line 279 of file repoConverter.py.

def isDirectorySpecial(self, subdirectory: str) -> bool:
    """Test whether the given directory is handled specially by this
    converter and hence should be ignored by generic base-class logic that
    searches for datasets to convert.

    Parameters
    ----------
    subdirectory : `str`
        Subdirectory.  This is only ever a single subdirectory, and it
        could appear anywhere within a repo root.  (A full path relative
        to the repo root might be more useful, but it is harder to
        implement, and we don't currently need it to identify any special
        directories.)

    Returns
    -------
    special : `bool`
        `True` if the directory is special.
    """
    raise NotImplementedError()

◆ iterDatasets()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.iterDatasets (   self)
Iterate over all datasets in the repository that should be
ingested into the Gen3 repository.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.

Yields
------
dataset : `FileDataset`
    Structures representing datasets to be ingested.  Paths should be
    absolute.
ref : `lsst.daf.butler.DatasetRef`
    Reference for the Gen3 datasets, including a complete `DatasetType`
    and data ID.

Definition at line 347 of file repoConverter.py.

def iterDatasets(self) -> Iterator[FileDataset]:
    """Iterate over all datasets in the repository that should be
    ingested into the Gen3 repository.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.

    Yields
    ------
    dataset : `FileDataset`
        Structures representing datasets to be ingested.  Paths should be
        absolute.
    ref : `lsst.daf.butler.DatasetRef`
        Reference for the Gen3 datasets, including a complete `DatasetType`
        and data ID.
    """
    for dirPath, subdirNamesInDir, fileNamesInDir in os.walk(self.root, followlinks=True):
        # Remove subdirectories that appear to be repositories themselves
        # from the walking
        def isRepoRoot(dirName):
            return any(os.path.exists(os.path.join(dirPath, dirName, f))
                       for f in REPO_ROOT_FILES)
        subdirNamesInDir[:] = [d for d in subdirNamesInDir
                               if not isRepoRoot(d) and not self.isDirectorySpecial(d)]
        # Loop over files in this directory, and ask per-DatasetType
        # extractors if they recognize them and can extract a data ID;
        # if so, ingest.
        dirPathInRoot = dirPath[len(self.root) + len(os.path.sep):]
        for fileNameInDir in fileNamesInDir:
            if any(fnmatch.fnmatchcase(fileNameInDir, pattern)
                   for pattern in self.task.config.fileIgnorePatterns):
                continue
            fileNameInRoot = os.path.join(dirPathInRoot, fileNameInDir)
            if fileNameInRoot in REPO_ROOT_FILES:
                continue
            ref = self._extractDatasetRef(fileNameInRoot)
            if ref is not None:
                if self.subset is None or self.subset.isRelated(ref.dataId):
                    yield FileDataset(path=os.path.join(self.root, fileNameInRoot), ref=ref)
            else:
                self._handleUnrecognizedFile(fileNameInRoot)
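The delegation contract for `iterDatasets` overrides can be sketched with stand-in classes; `BaseConverter` and `SpecialConverter` below are illustrative, not obs_base classes, and plain strings stand in for `FileDataset` objects:

```python
from typing import Iterator


class BaseConverter:
    def iterDatasets(self) -> Iterator[str]:
        # Stand-in for the generic directory walk in the base class.
        yield "base-dataset"


class SpecialConverter(BaseConverter):
    def iterDatasets(self) -> Iterator[str]:
        # Yield datasets found by specialized logic first...
        yield "special-dataset"
        # ...then delegate to the generic walk, as the contract requires.
        yield from super().iterDatasets()
```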

◆ iterMappings()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.iterMappings (   self)
Iterate over all `CameraMapper` `Mapping` objects that should be
considered for conversion by this repository.

This should include any datasets that may appear in the
repository, including those that are special (see
`isDatasetTypeSpecial`) and those that are being ignored (see
`ConvertRepoTask.isDatasetTypeIncluded`); this allows the converter
to identify and hence skip these datasets quietly instead of warning
about them as unrecognized.

Yields
------
datasetTypeName : `str`
    Name of the dataset type.
mapping : `lsst.obs.base.mapping.Mapping`
    Mapping object used by the Gen2 `CameraMapper` to describe the
    dataset type.

Definition at line 301 of file repoConverter.py.

def iterMappings(self) -> Iterator[Tuple[str, CameraMapperMapping]]:
    """Iterate over all `CameraMapper` `Mapping` objects that should be
    considered for conversion by this repository.

    This should include any datasets that may appear in the
    repository, including those that are special (see
    `isDatasetTypeSpecial`) and those that are being ignored (see
    `ConvertRepoTask.isDatasetTypeIncluded`); this allows the converter
    to identify and hence skip these datasets quietly instead of warning
    about them as unrecognized.

    Yields
    ------
    datasetTypeName : `str`
        Name of the dataset type.
    mapping : `lsst.obs.base.mapping.Mapping`
        Mapping object used by the Gen2 `CameraMapper` to describe the
        dataset type.
    """
    raise NotImplementedError()

◆ makeDataIdExtractor()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.makeDataIdExtractor (   self,
  datasetTypeName,
  parser,
  storageClass 
)

Definition at line 323 of file repoConverter.py.

def makeDataIdExtractor(self, datasetTypeName: str, parser: FilePathParser,
                        storageClass: StorageClass) -> DataIdExtractor:
    """Construct a `DataIdExtractor` instance appropriate for a particular
    dataset type.

    Parameters
    ----------
    datasetTypeName : `str`
        Name of the dataset type; typically forwarded directly to
        the `DataIdExtractor` constructor.
    parser : `FilePathParser`
        Object that parses filenames into Gen2 data IDs; typically
        forwarded directly to the `DataIdExtractor` constructor.
    storageClass : `lsst.daf.butler.StorageClass`
        Storage class for this dataset type in the Gen3 butler; typically
        forwarded directly to the `DataIdExtractor` constructor.

    Returns
    -------
    extractor : `DataIdExtractor`
        A new `DataIdExtractor` instance.
    """
    raise NotImplementedError()

◆ prep()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.prep (   self)
Prepare the repository by identifying the dataset types to be
converted and building `DataIdExtractor` instances for them.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.  More often,
subclasses will specialize the behavior of `prep` simply by overriding
`iterMappings`, `isDatasetTypeSpecial`, and `makeDataIdExtractor`, to
which the base implementation delegates.

This should not perform any write operations to the Gen3 repository.
It is guaranteed to be called before `insertDimensionData` and
`ingest`.

Definition at line 389 of file repoConverter.py.

def prep(self):
    """Prepare the repository by identifying the dataset types to be
    converted and building `DataIdExtractor` instances for them.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.  More often,
    subclasses will specialize the behavior of `prep` simply by overriding
    `iterMappings`, `isDatasetTypeSpecial`, and `makeDataIdExtractor`, to
    which the base implementation delegates.

    This should not perform any write operations to the Gen3 repository.
    It is guaranteed to be called before `insertDimensionData` and
    `ingest`.
    """
    self.task.log.info(f"Preparing other datasets from root {self.root}.")
    for datasetTypeName, mapping in self.iterMappings():
        try:
            parser = FilePathParser.fromMapping(mapping)
        except RuntimeError:
            # No template, so there should be no way we'd get one of these
            # in the Gen2 repo anyway (and if we do, we'll still produce a
            # warning - just a less informative one than we might be able
            # to produce if we had a template).
            continue
        if (not self.task.isDatasetTypeIncluded(datasetTypeName) or
                self.isDatasetTypeSpecial(datasetTypeName)):
            # User indicated not to include this data, but we still want
            # to recognize files of that type to avoid warning about them.
            self._skipParsers.push((parser, datasetTypeName, None))
            continue
        storageClass = self._guessStorageClass(datasetTypeName, mapping)
        if storageClass is None:
            # This may be a problem, but only if we actually encounter any
            # files corresponding to this dataset.  Of course, we need
            # to be able to parse those files in order to recognize that
            # situation.
            self._skipParsers.push((parser, datasetTypeName, "no storage class found."))
            continue
        self._extractors.push(self.makeDataIdExtractor(datasetTypeName, parser, storageClass))
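The per-dataset-type decision `prep` makes can be reduced to a pure function. This is a sketch: the boolean inputs stand in for the real template, config, and storage-class checks, and the returned labels are illustrative.

```python
def classify_dataset_type(has_template, included, special, storage_class):
    """Classify a dataset type the way prep's control flow does."""
    if not has_template:
        return "no-template"        # cannot parse such files at all
    if not included or special:
        return "skip-silently"      # recognized, but deliberately skipped
    if storage_class is None:
        return "skip-with-warning"  # recognized, but cannot be ingested
    return "convert"                # build a DataIdExtractor for it
```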

Member Data Documentation

◆ root

lsst.obs.base.gen2to3.repoConverter.RepoConverter.root

Definition at line 254 of file repoConverter.py.

◆ subset

lsst.obs.base.gen2to3.repoConverter.RepoConverter.subset

Definition at line 255 of file repoConverter.py.

◆ task

lsst.obs.base.gen2to3.repoConverter.RepoConverter.task

Definition at line 253 of file repoConverter.py.


The documentation for this class was generated from the following file: repoConverter.py