LSSTApplications  18.0.0+106,18.0.0+50,19.0.0,19.0.0+1,19.0.0+10,19.0.0+11,19.0.0+13,19.0.0+17,19.0.0+2,19.0.0-1-g20d9b18+6,19.0.0-1-g425ff20,19.0.0-1-g5549ca4,19.0.0-1-g580fafe+6,19.0.0-1-g6fe20d0+1,19.0.0-1-g7011481+9,19.0.0-1-g8c57eb9+6,19.0.0-1-gb5175dc+11,19.0.0-1-gdc0e4a7+9,19.0.0-1-ge272bc4+6,19.0.0-1-ge3aa853,19.0.0-10-g448f008b,19.0.0-12-g6990b2c,19.0.0-2-g0d9f9cd+11,19.0.0-2-g3d9e4fb2+11,19.0.0-2-g5037de4,19.0.0-2-gb96a1c4+3,19.0.0-2-gd955cfd+15,19.0.0-3-g2d13df8,19.0.0-3-g6f3c7dc,19.0.0-4-g725f80e+11,19.0.0-4-ga671dab3b+1,19.0.0-4-gad373c5+3,19.0.0-5-ga2acb9c+2,19.0.0-5-gfe96e6c+2,w.2020.01
LSSTDataManagementBasePackage
lsst.obs.base.gen2to3.repoConverter.RepoConverter Class Reference
Inheritance diagram for lsst.obs.base.gen2to3.repoConverter.RepoConverter:
Known subclasses: lsst.obs.base.gen2to3.calibRepoConverter.CalibRepoConverter, lsst.obs.base.gen2to3.standardRepoConverter.StandardRepoConverter, lsst.obs.base.gen2to3.rootRepoConverter.RootRepoConverter

Public Member Functions

def __init__
 
def isDatasetTypeSpecial
 
def isDirectorySpecial
 
def iterMappings (self)
 
def makeDataIdExtractor
 
def iterDatasets (self)
 
def prep (self)
 
def insertDimensionData (self)
 
def ingest (self)
 
def getButler
 

Public Attributes

 task
 
 root
 
 subset
 

Detailed Description

An abstract base class for objects that help `ConvertRepoTask` convert
datasets from a single Gen2 repository.

Parameters
----------
task : `ConvertRepoTask`
    Task instance that is using this helper object.
root : `str`
    Root of the Gen2 repo being converted.
collections : `list` of `str`
    Gen3 collections with which all converted datasets should be
    associated.
subset : `ConversionSubset`, optional
    Helper object that implements a filter that restricts the data IDs that
    are converted.

Notes
-----
`RepoConverter` defines the only public API users of its subclasses should
use (`prep`, `insertDimensionData`, and `ingest`).  These delegate to
several abstract methods that subclasses must implement.  In some cases,
subclasses may reimplement the public methods as well, but are expected to
delegate to ``super()`` either at the beginning or end of their own
implementation.
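The guaranteed call order of those three public methods can be sketched with stand-in classes; `SimpleRepoConverter` and `convert` below are illustrative only, not part of obs_base:

```python
class SimpleRepoConverter:
    """Minimal stand-in illustrating the prep/insertDimensionData/ingest order."""

    def __init__(self, root):
        self.root = root
        self.calls = []

    def prep(self):
        # Read-only phase: identify dataset types to convert.
        self.calls.append("prep")

    def insertDimensionData(self):
        # Write dimension records derived from this repository.
        self.calls.append("insertDimensionData")

    def ingest(self):
        # Write the converted datasets themselves.
        self.calls.append("ingest")


def convert(converters):
    # A driver respecting the documented guarantee: prep runs before
    # insertDimensionData, which runs before ingest, for every converter.
    for c in converters:
        c.prep()
    for c in converters:
        c.insertDimensionData()
    for c in converters:
        c.ingest()
```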

Definition at line 224 of file repoConverter.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.__init__ (   self,
  *,
  task,
  root,
  collections,
  subset = None 
)

Definition at line 251 of file repoConverter.py.

def __init__(self, *, task: ConvertRepoTask, root: str, collections: List[str],
             subset: Optional[ConversionSubset] = None):
    self.task = task
    self.root = root
    self.subset = subset
    self._collections = list(collections)
    self._extractors: MostRecentlyUsedStack[DataIdExtractor] = MostRecentlyUsedStack()
    self._skipParsers: MostRecentlyUsedStack[Tuple[FilePathParser, str, str]] = MostRecentlyUsedStack()

Member Function Documentation

◆ getButler()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.getButler (   self,
  datasetTypeName 
)

Definition at line 475 of file repoConverter.py.

def getButler(self, datasetTypeName: str) -> Tuple[Butler3, List[str]]:
    """Create a new Gen3 Butler appropriate for a particular dataset type.

    This should be used exclusively by subclasses when obtaining a butler
    to use for dataset ingest (`ConvertRepoTask.butler3` should never be
    used directly).

    Parameters
    ----------
    datasetTypeName : `str`
        Name of the dataset type.

    Returns
    -------
    butler : `lsst.daf.butler.Butler`
        Gen3 Butler instance appropriate for ingesting the given dataset
        type.
    collections : `list` of `str`
        Collections the dataset should be associated with, in addition to
        the one used to define the `lsst.daf.butler.Run` used in
        ``butler``.
    """
    if datasetTypeName in self.task.config.collections:
        return (
            Butler3(butler=self.task.butler3, run=self.task.config.collections[datasetTypeName]),
            self._collections,
        )
    elif self._collections:
        return (
            Butler3(butler=self.task.butler3, run=self._collections[0]),
            self._collections[1:],
        )
    else:
        raise LookupError(f"No collection configured for dataset type {datasetTypeName}.")
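The collection-selection precedence in `getButler` can be isolated as a pure function. This is a sketch: `select_run_and_collections` and its dict-based stand-in for the task config are hypothetical, and real Butler construction is omitted.

```python
from typing import Dict, List, Tuple


def select_run_and_collections(datasetTypeName: str,
                               configured_runs: Dict[str, str],
                               collections: List[str]) -> Tuple[str, List[str]]:
    """Return (run, extra_collections) following getButler's precedence."""
    if datasetTypeName in configured_runs:
        # Per-dataset-type override: every converter collection is an extra.
        return configured_runs[datasetTypeName], list(collections)
    elif collections:
        # Default: the first collection defines the run, the rest are extras.
        return collections[0], list(collections[1:])
    raise LookupError(f"No collection configured for dataset type {datasetTypeName}.")
```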

◆ ingest()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.ingest (   self)
Insert converted datasets into the Gen3 repository.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.  More often,
subclasses will specialize the behavior of `ingest` simply by
overriding `iterDatasets` and `isDirectorySpecial`, to which the base
implementation delegates.

This method is guaranteed to be called after both `prep` and
`insertDimensionData`.

Definition at line 444 of file repoConverter.py.

def ingest(self):
    """Insert converted datasets into the Gen3 repository.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.  More often,
    subclasses will specialize the behavior of `ingest` simply by
    overriding `iterDatasets` and `isDirectorySpecial`, to which the base
    implementation delegates.

    This method is guaranteed to be called after both `prep` and
    `insertDimensionData`.
    """
    self.task.log.info("Finding datasets in repo %s.", self.root)
    datasetsByType = defaultdict(list)
    for dataset in self.iterDatasets():
        datasetsByType[dataset.ref.datasetType].append(dataset)
    for datasetType, datasetsForType in datasetsByType.items():
        self.task.registry.registerDatasetType(datasetType)
        self.task.log.info("Ingesting %s %s datasets.", len(datasetsForType), datasetType.name)
        try:
            butler3, collections = self.getButler(datasetType.name)
        except LookupError as err:
            self.task.log.warn(str(err))
            continue
        try:
            butler3.ingest(*datasetsForType, transfer=self.task.config.transfer)
        except LookupError as err:
            raise LookupError(f"Error expanding data ID for dataset type {datasetType.name}.") from err
        for collection in collections:
            self.task.registry.associate(collection, [dataset.ref for dataset in datasetsForType])
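The grouping step at the top of `ingest` can be illustrated in isolation; the sketch below uses toy `(type, dataset)` tuples standing in for `FileDataset` objects and their refs.

```python
from collections import defaultdict


def group_by_type(datasets):
    """Bucket datasets by type so each type is registered and ingested once."""
    by_type = defaultdict(list)
    for dataset_type, dataset in datasets:
        by_type[dataset_type].append(dataset)
    return by_type
```

Grouping first means `registerDatasetType` and the per-type `ingest` call each run once per dataset type, however many files that type has.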

◆ insertDimensionData()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.insertDimensionData (   self)
Insert any dimension records uniquely derived from this repository
into the registry.

Subclasses may override this method, but may not need to; the default
implementation does nothing.

SkyMap and SkyPix dimensions should instead be handled by calling
`ConvertRepoTask.useSkyMap` or `ConvertRepoTask.useSkyPix`, because
these dimensions are in general shared by multiple Gen2 repositories.

This method is guaranteed to be called between `prep` and `ingest`.

Definition at line 429 of file repoConverter.py.

def insertDimensionData(self):
    """Insert any dimension records uniquely derived from this repository
    into the registry.

    Subclasses may override this method, but may not need to; the default
    implementation does nothing.

    SkyMap and SkyPix dimensions should instead be handled by calling
    `ConvertRepoTask.useSkyMap` or `ConvertRepoTask.useSkyPix`, because
    these dimensions are in general shared by multiple Gen2 repositories.

    This method is guaranteed to be called between `prep` and `ingest`.
    """
    pass

◆ isDatasetTypeSpecial()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.isDatasetTypeSpecial (   self,
  datasetTypeName 
)

Definition at line 261 of file repoConverter.py.

def isDatasetTypeSpecial(self, datasetTypeName: str) -> bool:
    """Test whether the given dataset is handled specially by this
    converter and hence should be ignored by generic base-class logic that
    searches for dataset types to convert.

    Parameters
    ----------
    datasetTypeName : `str`
        Name of the dataset type to test.

    Returns
    -------
    special : `bool`
        `True` if the dataset type is special.
    """
    raise NotImplementedError()
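One plausible subclass implementation is a simple membership test against a fixed set of names. The set contents below are illustrative only, not taken from any real converter:

```python
# Dataset types this hypothetical converter handles with dedicated logic,
# so the generic base-class search should skip them.
SPECIAL_DATASET_TYPES = frozenset({"camera", "defects"})


def isDatasetTypeSpecial(datasetTypeName: str) -> bool:
    """Return True for dataset types handled outside the generic walk."""
    return datasetTypeName in SPECIAL_DATASET_TYPES
```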

◆ isDirectorySpecial()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.isDirectorySpecial (   self,
  subdirectory 
)

Definition at line 279 of file repoConverter.py.

def isDirectorySpecial(self, subdirectory: str) -> bool:
    """Test whether the given directory is handled specially by this
    converter and hence should be ignored by generic base-class logic that
    searches for datasets to convert.

    Parameters
    ----------
    subdirectory : `str`
        Subdirectory.  This is only ever a single subdirectory, and it
        could appear anywhere within a repo root.  (A full path relative
        to the repo root might be more useful, but it is harder to
        implement, and we don't currently need it to identify any special
        directories.)

    Returns
    -------
    special : `bool`
        `True` if the directory is special.
    """
    raise NotImplementedError()

◆ iterDatasets()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.iterDatasets (   self)
Iterate over all datasets in the repository that should be
ingested into the Gen3 repository.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.

Yields
------
dataset : `FileDataset`
    Structures representing datasets to be ingested.  Paths should be
    absolute.
ref : `lsst.daf.butler.DatasetRef`
    Reference for the Gen3 datasets, including a complete `DatasetType`
    and data ID.

Definition at line 347 of file repoConverter.py.

def iterDatasets(self) -> Iterator[FileDataset]:
    """Iterate over all datasets in the repository that should be
    ingested into the Gen3 repository.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.

    Yields
    ------
    dataset : `FileDataset`
        Structures representing datasets to be ingested.  Paths should be
        absolute.
    ref : `lsst.daf.butler.DatasetRef`
        Reference for the Gen3 datasets, including a complete `DatasetType`
        and data ID.
    """
    for dirPath, subdirNamesInDir, fileNamesInDir in os.walk(self.root, followlinks=True):
        # Remove subdirectories that appear to be repositories themselves
        # from the walking
        def isRepoRoot(dirName):
            return any(os.path.exists(os.path.join(dirPath, dirName, f))
                       for f in REPO_ROOT_FILES)
        subdirNamesInDir[:] = [d for d in subdirNamesInDir
                               if not isRepoRoot(d) and not self.isDirectorySpecial(d)]
        # Loop over files in this directory, and ask per-DatasetType
        # extractors if they recognize them and can extract a data ID;
        # if so, ingest.
        dirPathInRoot = dirPath[len(self.root) + len(os.path.sep):]
        for fileNameInDir in fileNamesInDir:
            if any(fnmatch.fnmatchcase(fileNameInDir, pattern)
                   for pattern in self.task.config.fileIgnorePatterns):
                continue
            fileNameInRoot = os.path.join(dirPathInRoot, fileNameInDir)
            if fileNameInRoot in REPO_ROOT_FILES:
                continue
            ref = self._extractDatasetRef(fileNameInRoot)
            if ref is not None:
                if self.subset is None or self.subset.isRelated(ref.dataId):
                    yield FileDataset(path=os.path.join(self.root, fileNameInRoot), ref=ref)
            else:
                self._handleUnrecognizedFile(fileNameInRoot)
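The delegation contract for `iterDatasets` overrides can be sketched with stand-in classes; `BaseConverter` and `SpecialConverter` below are illustrative, not obs_base classes, and plain strings stand in for `FileDataset` objects:

```python
from typing import Iterator


class BaseConverter:
    def iterDatasets(self) -> Iterator[str]:
        # Stand-in for the generic directory walk in the base class.
        yield "base-dataset"


class SpecialConverter(BaseConverter):
    def iterDatasets(self) -> Iterator[str]:
        # Yield datasets found by specialized logic first...
        yield "special-dataset"
        # ...then delegate to the generic walk, as the contract requires.
        yield from super().iterDatasets()
```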

◆ iterMappings()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.iterMappings (   self)
Iterate over all `CameraMapper` `Mapping` objects that should be
considered for conversion by this repository.

This should include any datasets that may appear in the
repository, including those that are special (see
`isDatasetTypeSpecial`) and those that are being ignored (see
`ConvertRepoTask.isDatasetTypeIncluded`); this allows the converter
to identify and hence skip these datasets quietly instead of warning
about them as unrecognized.

Yields
------
datasetTypeName : `str`
    Name of the dataset type.
mapping : `lsst.obs.base.mapping.Mapping`
    Mapping object used by the Gen2 `CameraMapper` to describe the
    dataset type.

Definition at line 301 of file repoConverter.py.

def iterMappings(self) -> Iterator[Tuple[str, CameraMapperMapping]]:
    """Iterate over all `CameraMapper` `Mapping` objects that should be
    considered for conversion by this repository.

    This should include any datasets that may appear in the
    repository, including those that are special (see
    `isDatasetTypeSpecial`) and those that are being ignored (see
    `ConvertRepoTask.isDatasetTypeIncluded`); this allows the converter
    to identify and hence skip these datasets quietly instead of warning
    about them as unrecognized.

    Yields
    ------
    datasetTypeName : `str`
        Name of the dataset type.
    mapping : `lsst.obs.base.mapping.Mapping`
        Mapping object used by the Gen2 `CameraMapper` to describe the
        dataset type.
    """
    raise NotImplementedError()

◆ makeDataIdExtractor()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.makeDataIdExtractor (   self,
  datasetTypeName,
  parser,
  storageClass 
)

Definition at line 323 of file repoConverter.py.

def makeDataIdExtractor(self, datasetTypeName: str, parser: FilePathParser,
                        storageClass: StorageClass) -> DataIdExtractor:
    """Construct a `DataIdExtractor` instance appropriate for a particular
    dataset type.

    Parameters
    ----------
    datasetTypeName : `str`
        Name of the dataset type; typically forwarded directly to
        the `DataIdExtractor` constructor.
    parser : `FilePathParser`
        Object that parses filenames into Gen2 data IDs; typically
        forwarded directly to the `DataIdExtractor` constructor.
    storageClass : `lsst.daf.butler.StorageClass`
        Storage class for this dataset type in the Gen3 butler; typically
        forwarded directly to the `DataIdExtractor` constructor.

    Returns
    -------
    extractor : `DataIdExtractor`
        A new `DataIdExtractor` instance.
    """
    raise NotImplementedError()

◆ prep()

def lsst.obs.base.gen2to3.repoConverter.RepoConverter.prep (   self)
Prepare the repository by identifying the dataset types to be
converted and building `DataIdExtractor` instances for them.

Subclasses may override this method, but must delegate to the base
class implementation at some point in their own logic.  More often,
subclasses will specialize the behavior of `prep` simply by overriding
`iterMappings`, `isDatasetTypeSpecial`, and `makeDataIdExtractor`, to
which the base implementation delegates.

This should not perform any write operations to the Gen3 repository.
It is guaranteed to be called before `insertDimensionData` and
`ingest`.

Definition at line 389 of file repoConverter.py.

def prep(self):
    """Prepare the repository by identifying the dataset types to be
    converted and building `DataIdExtractor` instances for them.

    Subclasses may override this method, but must delegate to the base
    class implementation at some point in their own logic.  More often,
    subclasses will specialize the behavior of `prep` simply by overriding
    `iterMappings`, `isDatasetTypeSpecial`, and `makeDataIdExtractor`, to
    which the base implementation delegates.

    This should not perform any write operations to the Gen3 repository.
    It is guaranteed to be called before `insertDimensionData` and
    `ingest`.
    """
    self.task.log.info(f"Preparing other datasets from root {self.root}.")
    for datasetTypeName, mapping in self.iterMappings():
        try:
            parser = FilePathParser.fromMapping(mapping)
        except RuntimeError:
            # No template, so there should be no way we'd get one of these
            # in the Gen2 repo anyway (and if we do, we'll still produce a
            # warning - just a less informative one than we might be able
            # to produce if we had a template).
            continue
        if (not self.task.isDatasetTypeIncluded(datasetTypeName) or
                self.isDatasetTypeSpecial(datasetTypeName)):
            # User indicated not to include this data, but we still want
            # to recognize files of that type to avoid warning about them.
            self._skipParsers.push((parser, datasetTypeName, None))
            continue
        storageClass = self._guessStorageClass(datasetTypeName, mapping)
        if storageClass is None:
            # This may be a problem, but only if we actually encounter any
            # files corresponding to this dataset.  Of course, we need
            # to be able to parse those files in order to recognize that
            # situation.
            self._skipParsers.push((parser, datasetTypeName, "no storage class found."))
            continue
        self._extractors.push(self.makeDataIdExtractor(datasetTypeName, parser, storageClass))
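The per-dataset-type decision `prep` makes can be reduced to a pure function. This is a sketch: the boolean inputs stand in for the real template, config, and storage-class checks, and the returned labels are illustrative.

```python
def classify_dataset_type(has_template, included, special, storage_class):
    """Classify a dataset type the way prep's control flow does."""
    if not has_template:
        return "no-template"        # cannot parse such files at all
    if not included or special:
        return "skip-silently"      # recognized, but deliberately skipped
    if storage_class is None:
        return "skip-with-warning"  # recognized, but cannot be ingested
    return "convert"                # build a DataIdExtractor for it
```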

Member Data Documentation

◆ root

lsst.obs.base.gen2to3.repoConverter.RepoConverter.root

Definition at line 254 of file repoConverter.py.

◆ subset

lsst.obs.base.gen2to3.repoConverter.RepoConverter.subset

Definition at line 255 of file repoConverter.py.

◆ task

lsst.obs.base.gen2to3.repoConverter.RepoConverter.task

Definition at line 253 of file repoConverter.py.


The documentation for this class was generated from the following file: repoConverter.py