LSST Applications  21.0.0-172-gfb10e10a+18fedfabac,22.0.0+297cba6710,22.0.0+80564b0ff1,22.0.0+8d77f4f51a,22.0.0+a28f4c53b1,22.0.0+dcf3732eb2,22.0.1-1-g7d6de66+2a20fdde0d,22.0.1-1-g8e32f31+297cba6710,22.0.1-1-geca5380+7fa3b7d9b6,22.0.1-12-g44dc1dc+2a20fdde0d,22.0.1-15-g6a90155+515f58c32b,22.0.1-16-g9282f48+790f5f2caa,22.0.1-2-g92698f7+dcf3732eb2,22.0.1-2-ga9b0f51+7fa3b7d9b6,22.0.1-2-gd1925c9+bf4f0e694f,22.0.1-24-g1ad7a390+a9625a72a8,22.0.1-25-g5bf6245+3ad8ecd50b,22.0.1-25-gb120d7b+8b5510f75f,22.0.1-27-g97737f7+2a20fdde0d,22.0.1-32-gf62ce7b1+aa4237961e,22.0.1-4-g0b3f228+2a20fdde0d,22.0.1-4-g243d05b+871c1b8305,22.0.1-4-g3a563be+32dcf1063f,22.0.1-4-g44f2e3d+9e4ab0f4fa,22.0.1-42-gca6935d93+ba5e5ca3eb,22.0.1-5-g15c806e+85460ae5f3,22.0.1-5-g58711c4+611d128589,22.0.1-5-g75bb458+99c117b92f,22.0.1-6-g1c63a23+7fa3b7d9b6,22.0.1-6-g50866e6+84ff5a128b,22.0.1-6-g8d3140d+720564cf76,22.0.1-6-gd805d02+cc5644f571,22.0.1-8-ge5750ce+85460ae5f3,master-g6e05de7fdc+babf819c66,master-g99da0e417a+8d77f4f51a,w.2021.48
LSST Data Management Base Package
lsst.pipe.tasks.repositoryIterator.SourceData Class Reference

Public Member Functions

def __init__ (self, datasetType, sourceKeyTuple)
 
def addSourceMetrics (self, repoInfo, idKeyTuple, idValList, sourceTableList)
 
def finalize (self)
 

Public Attributes

 datasetType
 
 repoInfoList
 
 sourceArr
 
 sourceIdDict
 
 repoArr
 

Detailed Description

Accumulate a set of measurements from a set of source tables

To use:
- specify the desired source measurements when constructing this object
- call addSourceMetrics for each repository you harvest data from
- call finalize to produce the final data

Data available after calling finalize:
- self.sourceArr: a numpy structured array of shape (num repositories, num sources)
    containing named columns for:
    - source ID
    - each data ID key
    - each item of data extracted from the source table
- self.sourceIdDict: a dict of (source ID: index of axis 1 of self.sourceArr)
- self.repoArr: a numpy structured array of shape (num repositories,)
    containing a named column for each repository key (see RepositoryIterator)

@note: sources that had non-finite data (e.g. NaN) for every value extracted are silently omitted
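
As a rough illustration of the layout described above, the following sketch builds arrays of the same shape by hand with plain numpy instead of going through SourceData. The field names (visit, psfFlux, seeing) and all values are made up for illustration; the real column names depend on the data ID keys, source keys and repository keys in use.

```python
import numpy as np

# Hypothetical columns: one data ID key ("visit") and one source key ("psfFlux").
sourceDType = [("sourceId", int), ("visit", int), ("psfFlux", float)]

# Two repositories (axis 0), three sources (axis 1).
sourceArr = np.array(
    [
        [(11, 100, 1.5), (12, 100, 2.5), (13, 100, 3.5)],
        [(11, 101, 1.6), (12, 101, 2.4), (13, 101, 3.7)],
    ],
    dtype=sourceDType,
)

# Maps each source ID to its index along axis 1 of sourceArr.
sourceIdDict = {11: 0, 12: 1, 13: 2}

# One record per repository, with one hypothetical repository key ("seeing").
repoArr = np.array([(0.7,), (0.9,)], dtype=[("seeing", float)])

# Flux of source 12 in every repository, paired with that repository's key value.
col = sourceIdDict[12]
fluxBySeeing = list(zip(repoArr["seeing"].tolist(), sourceArr["psfFlux"][:, col].tolist()))
```

Indexing a named column first and then slicing along axis 1 gives one value per repository for a single source, which is the typical way to compare a measurement across repositories.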

Definition at line 54 of file repositoryIterator.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.pipe.tasks.repositoryIterator.SourceData.__init__ (   self,
  datasetType,
  sourceKeyTuple 
)
@param[in] datasetType: dataset type for source
@param[in] sourceKeyTuple: list of keys of data items to extract from the source tables

@raise RuntimeError if sourceKeyTuple is empty

Definition at line 75 of file repositoryIterator.py.

def __init__(self, datasetType, sourceKeyTuple):
    """
    @param[in] datasetType: dataset type for source
    @param[in] sourceKeyTuple: list of keys of data items to extract from the source tables

    @raise RuntimeError if sourceKeyTuple is empty
    """
    if len(sourceKeyTuple) < 1:
        raise RuntimeError("Must specify at least one key in sourceKeyTuple")
    self.datasetType = datasetType
    self._sourceKeyTuple = tuple(sourceKeyTuple)

    self._idKeyTuple = None  # tuple of data ID keys, in order; set by first call to _getSourceMetrics
    self._idKeyDTypeList = None  # numpy dtype for data ID tuple, as a list of (key, type);
    # set by first call to _getSourceMetrics
    self._sourceDTypeList = None  # numpy dtype for source data, as a list of (key, type);
    # set by first call to _getSourceMetrics
    self._repoKeyTuple = None  # tuple of repo ID keys, in order; set by first call to addSourceMetrics
    self._repoDTypeList = None  # numpy dtype for repoArr, as a list of (key, type);
    # set by first call to addSourceMetrics

    self._tempDataList = []  # list (one entry per repository)
    # of dict of source ID: tuple of data ID data concatenated with source metric data, where:
    # data ID data is in order self._idKeyTuple
    # source metric data is in order self._sourceKeyTuple
    self.repoInfoList = []  # list of repoInfo

Member Function Documentation

◆ addSourceMetrics()

def lsst.pipe.tasks.repositoryIterator.SourceData.addSourceMetrics (   self,
  repoInfo,
  idKeyTuple,
  idValList,
  sourceTableList 
)
Accumulate source measurements from a list of source tables.

Once you have accumulated all source measurements, call finalize to process the data.

@param[in] repoInfo: a RepositoryInfo instance
@param[in] idKeyTuple: a tuple of data ID keys; must be the same for each call
@param[in] idValList: a list of data ID value tuples;
    each tuple contains values in the order in idKeyTuple
@param[in] sourceTableList: a list of source tables, one per entry in idValList

@raise RuntimeError if idKeyTuple is different than it was for the first call.

Accumulates the data in temporary cache self._tempDataList.

@return number of sources
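
The accumulation step can be pictured with the pure-Python stand-in below. It is only a sketch, not the library's implementation: the function accumulate, the state dict, and the use of lists of dicts in place of real source tables are all hypothetical, and where exactly the non-finite check happens in the real code is an assumption. It mirrors the documented behavior: one dict per call, keyed by source ID, holding the data ID values followed by the extracted source values.

```python
import math


def accumulate(idKeyTuple, idValList, sourceTableList, sourceKeyTuple, state):
    """Hypothetical stand-in for one addSourceMetrics call.

    Each "source table" is a plain list of dicts with an "id" entry.
    """
    # The data ID keys must match those given in the first call.
    if state.setdefault("idKeyTuple", tuple(idKeyTuple)) != tuple(idKeyTuple):
        raise RuntimeError("idKeyTuple differs from the first call")

    dataDict = {}
    # idValList and sourceTableList are parallel: one table per data ID value tuple.
    for idValTuple, sourceTable in zip(idValList, sourceTableList):
        for source in sourceTable:
            metrics = tuple(source[key] for key in sourceKeyTuple)
            # Skip sources whose extracted values are all non-finite.
            if not any(math.isfinite(val) for val in metrics):
                continue
            dataDict[source["id"]] = tuple(idValTuple) + metrics

    # One cache entry per repository (per call).
    state.setdefault("tempDataList", []).append(dataDict)
    return len(dataDict)


# Usage with made-up data: two data IDs, one table each.
state = {}
tables = [
    [{"id": 1, "flux": 2.0}, {"id": 2, "flux": float("nan")}],
    [{"id": 3, "flux": 4.0}],
]
n = accumulate(("visit",), [(100,), (101,)], tables, ("flux",), state)
```

Source 2 is dropped because its only extracted value is NaN, so the call reports two sources.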

Definition at line 153 of file repositoryIterator.py.

def addSourceMetrics(self, repoInfo, idKeyTuple, idValList, sourceTableList):
    """Accumulate source measurements from a list of source tables.

    Once you have accumulated all source measurements, call finalize to process the data.

    @param[in] repoInfo: a RepositoryInfo instance
    @param[in] idKeyTuple: a tuple of data ID keys; must be the same for each call
    @param[in] idValList: a list of data ID value tuples;
        each tuple contains values in the order in idKeyTuple
    @param[in] sourceTableList: a list of source tables, one per entry in idValList

    @raise RuntimeError if idKeyTuple is different than it was for the first call.

    Accumulates the data in temporary cache self._tempDataList.

    @return number of sources
    """
    if self._repoKeyTuple is None:
        self._repoKeyTuple = repoInfo.keyTuple
        self._repoDTypeList = repoInfo.dtype

    dataDict = self._getSourceMetrics(idKeyTuple, idValList, sourceTableList)

    self._tempDataList.append(dataDict)
    self.repoInfoList.append(repoInfo)
    return len(dataDict)

◆ finalize()

def lsst.pipe.tasks.repositoryIterator.SourceData.finalize (   self)
Process the accumulated source measurements to create the final data products.

Only call this after you have added all source metrics using addSourceMetrics.

Reads temporary cache self._tempDataList and then deletes it.

@raise RuntimeError if no data has been accumulated (addSourceMetrics was never called)
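
The merge performed here can be sketched as follows, using a hand-made cache in place of self._tempDataList. The column names and values are hypothetical. Sorting the source IDs is a choice made in this sketch for reproducibility; the listing for this method iterates over a set instead. A source that is absent from a repository receives an all-zero record.

```python
import numpy as np

# Hypothetical cache: one dict per repository, source ID -> (visit, flux).
tempDataList = [
    {1: (100, 2.0), 2: (100, 3.0)},
    {2: (101, 3.5), 3: (101, 4.0)},
]
dataDType = [("visit", int), ("flux", float)]

# Union of source IDs over all repositories, in a fixed order.
srcIdList = sorted(set().union(*tempDataList))

# Zero-valued record used where a repository did not contain a source.
nullTuple = tuple(np.zeros(1, dtype=dataDType)[0])

# One row per repository, one column per source ID in the union.
sourceArr = np.array(
    [
        [(srcId,) + dataDict.get(srcId, nullTuple) for srcId in srcIdList]
        for dataDict in tempDataList
    ],
    dtype=[("sourceId", int)] + dataDType,
)

# Index along axis 1 of sourceArr for each source ID.
sourceIdDict = {srcId: i for i, srcId in enumerate(srcIdList)}
```

Because the fill value is zero and not NaN, a zero in a data column may mean "missing in this repository"; checking the data ID columns of the same record is one way to tell the two cases apart.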

Definition at line 180 of file repositoryIterator.py.

def finalize(self):
    """Process the accumulated source measurements to create the final data products.

    Only call this after you have added all source metrics using addSourceMetrics.

    Reads temporary cache self._tempDataList and then deletes it.
    """
    if len(self._tempDataList) == 0:
        raise RuntimeError("No data found")

    fullSrcIdSet = set()
    for dataIdDict in self._tempDataList:
        fullSrcIdSet.update(iter(dataIdDict.keys()))

    # source data
    sourceArrDType = [("sourceId", int)] + self._idKeyDTypeList + self._sourceDTypeList
    # data for missing sources (only for the data in the source data dict, so excludes srcId)
    nullSourceTuple = tuple(numpy.zeros(1, dtype=self._idKeyDTypeList + self._sourceDTypeList)[0])

    sourceData = [[(srcId,) + srcDataDict.get(srcId, nullSourceTuple) for srcId in fullSrcIdSet]
                  for srcDataDict in self._tempDataList]

    self.sourceArr = numpy.array(sourceData, dtype=sourceArrDType)
    del sourceData

    self.sourceIdDict = dict((srcId, i) for i, srcId in enumerate(fullSrcIdSet))

    # repository data
    repoData = [repoInfo.valTuple for repoInfo in self.repoInfoList]
    self.repoArr = numpy.array(repoData, dtype=self._repoDTypeList)

    self._tempDataList = None

Member Data Documentation

◆ datasetType

lsst.pipe.tasks.repositoryIterator.SourceData.datasetType

Definition at line 84 of file repositoryIterator.py.

◆ repoArr

lsst.pipe.tasks.repositoryIterator.SourceData.repoArr

Definition at line 209 of file repositoryIterator.py.

◆ repoInfoList

lsst.pipe.tasks.repositoryIterator.SourceData.repoInfoList

Definition at line 100 of file repositoryIterator.py.

◆ sourceArr

lsst.pipe.tasks.repositoryIterator.SourceData.sourceArr

Definition at line 202 of file repositoryIterator.py.

◆ sourceIdDict

lsst.pipe.tasks.repositoryIterator.SourceData.sourceIdDict

Definition at line 205 of file repositoryIterator.py.


The documentation for this class was generated from the following file: