LSSTApplications  10.0+286,10.0+36,10.0+46,10.0-2-g4f67435,10.1+152,10.1+37,11.0,11.0+1,11.0-1-g47edd16,11.0-1-g60db491,11.0-1-g7418c06,11.0-2-g04d2804,11.0-2-g68503cd,11.0-2-g818369d,11.0-2-gb8b8ce7
LSSTDataManagementBasePackage
lsst.pipe.tasks.repositoryIterator.SourceData Class Reference

Public Member Functions

def __init__
def addSourceMetrics
def finalize

Public Attributes

datasetType
repoInfoList
sourceArr
sourceIdDict
repoArr

Private Member Functions

def _getSourceMetrics

Private Attributes

_sourceKeyTuple
_idKeyTuple
_idKeyDTypeList
_sourceDTypeList
_repoKeyTuple
_repoDTypeList
_tempDataList

Detailed Description

Accumulate a set of measurements from a set of source tables

To use:
- specify the desired source measurements when constructing this object
- call addSourceMetrics for each repository you harvest data from
- call finalize to produce the final data

Data available after calling finalize:
- self.sourceArr: a numpy structured array of shape (num repositories, num sources)
    containing named columns for:
    - source ID
    - each data ID key
    - each item of data extracted from the source table
- self.sourceIdDict: a dict of (source ID: index of axis 1 of self.sourceArr)
- self.repoArr: a numpy structured array of shape (num repositories,)
    containing a named column for each repository key (see RepositoryIterator)

@note: sources that had non-finite data (e.g. NaN) for every value extracted are silently omitted
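
The layout of the final data products can be illustrated with a minimal sketch. It uses plain Python lists as a stand-in for the numpy structured array, and the field names ("visit", "flux"), the values, and the helper measurements_for are all made up for illustration; they are not part of SourceData.

```python
# Stand-in for self.sourceArr: one row per repository, one record per source.
# Each record is (sourceId, visit, flux), where "visit" plays the role of a
# data ID key and "flux" the role of one extracted measurement (both hypothetical).
source_rows = [
    [(101, 1, 5.0), (102, 1, 7.5)],  # repository 0
    [(101, 1, 5.2), (102, 0, 0.0)],  # repository 1: source 102 missing, zero-filled
]

# Stand-in for self.sourceIdDict: source ID -> index along axis 1.
source_id_dict = {101: 0, 102: 1}


def measurements_for(src_id, field_index):
    """Return one field of one source across all repositories (hypothetical helper)."""
    col = source_id_dict[src_id]
    return [repo_row[col][field_index] for repo_row in source_rows]


flux_101 = measurements_for(101, 2)
flux_102 = measurements_for(102, 2)
```

With the real structured array, the equivalent lookup would presumably be an expression such as sourceArr[:, sourceIdDict[srcId]]["flux"], assuming "flux" is one of the keys passed as sourceKeyTuple.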

Definition at line 53 of file repositoryIterator.py.

Constructor & Destructor Documentation

def lsst.pipe.tasks.repositoryIterator.SourceData.__init__(self, datasetType, sourceKeyTuple)
@param[in] datasetType: dataset type for source
@param[in] sourceKeyTuple: list of keys of data items to extract from the source tables

@raise RuntimeError if sourceKeyTuple is empty

Definition at line 73 of file repositoryIterator.py.

    def __init__(self, datasetType, sourceKeyTuple):
        """
        @param[in] datasetType: dataset type for source
        @param[in] sourceKeyTuple: list of keys of data items to extract from the source tables

        @raise RuntimeError if sourceKeyTuple is empty
        """
        if len(sourceKeyTuple) < 1:
            raise RuntimeError("Must specify at least one key in sourceKeyTuple")
        self.datasetType = datasetType
        self._sourceKeyTuple = tuple(sourceKeyTuple)

        self._idKeyTuple = None # tuple of data ID keys, in order; set by first call to _getSourceMetrics
        self._idKeyDTypeList = None # numpy dtype for data ID tuple, as a list of (key, type);
            # set by first call to _getSourceMetrics
        self._sourceDTypeList = None # numpy dtype for source data, as a list of (key, type);
            # set by first call to _getSourceMetrics
        self._repoKeyTuple = None # tuple of repo ID keys, in order; set by first call to addSourceMetrics
        self._repoDTypeList = None # numpy dtype for repoArr, as a list of (key, type);
            # set by first call to addSourceMetrics

        self._tempDataList = [] # list (one entry per repository)
            # of dict of source ID: tuple of data ID data concatenated with source metric data, where:
            # data ID data is in order self._idKeyTuple
            # source metric data is in order self._sourceKeyTuple
        self.repoInfoList = [] # list of repoInfo

Member Function Documentation

def lsst.pipe.tasks.repositoryIterator.SourceData._getSourceMetrics(self, idKeyTuple, idValList, sourceTableList)
private
Obtain the desired source measurements from a list of source tables

Extracts a set of source measurements (specified by sourceKeyTuple) from a list of source tables
(one per data ID) and returns them as a dict mapping each source ID to a tuple of data

@param[in] idKeyTuple: a tuple of data ID keys; must be the same for each call
@param[in] idValList: a list of data ID value tuples;
    each tuple contains values in the order in idKeyTuple
@param[in] sourceTableList: a list of source tables, one per entry in idValList

@return a dict of source id: data id tuple + source data tuple
    where source data tuple order matches sourceKeyTuple
    and data id tuple matches self._idKeyTuple (which is set from the first idKeyTuple)

@raise RuntimeError if idKeyTuple is different than it was for the first call.

GetRepositoryDataTask.run returns idKeyTuple and idValList; you can easily make
a subclass of GetRepositoryDataTask that also returns sourceTableList.

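Updates instance variables:
- self._idKeyTuple and self._idKeyDTypeList if not already set.
- self._sourceDTypeList if not already set (from the first non-empty source table).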

Definition at line 100 of file repositoryIterator.py.

    def _getSourceMetrics(self, idKeyTuple, idValList, sourceTableList):
        """Obtain the desired source measurements from a list of source tables

        Extracts a set of source measurements (specified by sourceKeyTuple) from a list of source tables
        (one per data ID) and saves them as a dict of source ID: list of data

        @param[in] idKeyTuple: a tuple of data ID keys; must be the same for each call
        @param[in] idValList: a list of data ID value tuples;
            each tuple contains values in the order in idKeyTuple
        @param[in] sourceTableList: a list of source tables, one per entry in idValList

        @return a dict of source id: data id tuple + source data tuple
            where source data tuple order matches sourceKeyTuple
            and data id tuple matches self._idKeyTuple (which is set from the first idKeyTuple)

        @raise RuntimeError if idKeyTuple is different than it was for the first call.

        GetRepositoryDataTask.run returns idKeyTuple and idValList; you can easily make
        a subclass of GetRepositoryDataTask that also returns sourceTableList.

        Updates instance variables:
        - self._idKeyTuple if not already set.
        """
        if self._idKeyTuple is None:
            self._idKeyTuple = tuple(idKeyTuple)
            self._idKeyDTypeList = _getDTypeList(keyTuple = self._idKeyTuple,
                valTuple = idValList[0])
        else:
            if self._idKeyTuple != tuple(idKeyTuple):
                raise RuntimeError("idKeyTuple = %s != %s = first idKeyTuple; must be the same each time" % \
                    (idKeyTuple, self._idKeyTuple))

        dataDict = {}
        for idTuple, sourceTable in itertools.izip(idValList, sourceTableList):
            if len(sourceTable) == 0:
                continue

            idList = sourceTable.get("id")
            dataList = [sourceTable.get(key) for key in self._sourceKeyTuple]

            if self._sourceDTypeList is None:
                self._sourceDTypeList = [(key, arr.dtype)
                    for key, arr in itertools.izip(self._sourceKeyTuple, dataList)]

            transposedDataList = zip(*dataList)
            del dataList

            dataDict.update((srcId, idTuple + tuple(data))
                for srcId, data in itertools.izip(idList, transposedDataList))
        return dataDict
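
The transpose-and-merge step of _getSourceMetrics can be sketched in isolation. This is a simplified illustration, not the library code: each source table is represented by a hypothetical dict of column lists, the function name build_data_dict is invented here, and Python 3 zip replaces itertools.izip.

```python
def build_data_dict(id_val_list, tables, source_keys):
    """Map each source ID to (data ID values) + (source measurements).

    Each "table" is a dict of column name -> list of values, a hypothetical
    stand-in for a real source table.
    """
    data_dict = {}
    for id_tuple, table in zip(id_val_list, tables):
        if len(table["id"]) == 0:
            continue  # skip empty tables, as the original does

        # One list per requested key (column-oriented data).
        columns = [table[key] for key in source_keys]

        # Transpose columns into one tuple of measurements per source.
        rows = zip(*columns)

        data_dict.update(
            (src_id, id_tuple + tuple(row))
            for src_id, row in zip(table["id"], rows)
        )
    return data_dict


# Made-up example data: two data IDs, the second with an empty table.
tables = [
    {"id": [1, 2], "flux": [10.0, 20.0], "size": [1.5, 2.5]},
    {"id": [], "flux": [], "size": []},
]
id_val_list = [(100, "r"), (101, "r")]
result = build_data_dict(id_val_list, tables, ("flux", "size"))
```

Because the data ID tuple is prepended to each row, every value in the resulting dict has the same length and field order, which is what lets finalize pack the values into a single structured array.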

def lsst.pipe.tasks.repositoryIterator.SourceData.addSourceMetrics(self, repoInfo, idKeyTuple, idValList, sourceTableList)
Accumulate source measurements from a list of source tables.

Once you have accumulated all source measurements, call finalize to process the data.

@param[in] repoInfo: a RepositoryInfo instance
@param[in] idKeyTuple: a tuple of data ID keys; must be the same for each call
@param[in] idValList: a list of data ID value tuples;
    each tuple contains values in the order in idKeyTuple
@param[in] sourceTableList: a list of source tables, one per entry in idValList

@raise RuntimeError if idKeyTuple is different than it was for the first call.

Accumulates the data in temporary cache self._tempDataList.

@return number of sources

Definition at line 151 of file repositoryIterator.py.

    def addSourceMetrics(self, repoInfo, idKeyTuple, idValList, sourceTableList):
        """Accumulate source measurements from a list of source tables.

        Once you have accumulated all source measurements, call finalize to process the data.

        @param[in] repoInfo: a RepositoryInfo instance
        @param[in] idKeyTuple: a tuple of data ID keys; must be the same for each call
        @param[in] idValList: a list of data ID value tuples;
            each tuple contains values in the order in idKeyTuple
        @param[in] sourceTableList: a list of source tables, one per entry in idValList

        @raise RuntimeError if idKeyTuple is different than it was for the first call.

        Accumulates the data in temporary cache self._tempDataList.

        @return number of sources
        """
        if self._repoKeyTuple is None:
            self._repoKeyTuple = repoInfo.keyTuple
            self._repoDTypeList = repoInfo.dtype

        dataDict = self._getSourceMetrics(idKeyTuple, idValList, sourceTableList)

        self._tempDataList.append(dataDict)
        self.repoInfoList.append(repoInfo)
        return len(dataDict)
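
The rule that idKeyTuple must be identical on every call follows a "fix on first use, verify afterwards" pattern. A minimal sketch of that pattern, using a hypothetical KeyGuard class that is not part of the package:

```python
class KeyGuard:
    """Remember the first key tuple seen and reject any later mismatch.

    Hypothetical helper mirroring the check performed on idKeyTuple.
    """

    def __init__(self):
        self._keys = None  # set by the first call to check

    def check(self, keys):
        keys = tuple(keys)  # normalize so lists and tuples compare equal
        if self._keys is None:
            self._keys = keys
        elif self._keys != keys:
            raise RuntimeError(
                "keys = %s != %s = first keys; must be the same each time"
                % (keys, self._keys)
            )


guard = KeyGuard()
guard.check(["visit", "ccd"])   # first call fixes the keys
guard.check(("visit", "ccd"))   # same keys in the same order: accepted

try:
    guard.check(("ccd", "visit"))  # different order counts as a mismatch
    mismatch_rejected = False
except RuntimeError:
    mismatch_rejected = True
```

Order matters in this comparison, because the data ID values in idValList are positional.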

def lsst.pipe.tasks.repositoryIterator.SourceData.finalize(self)
Process the accumulated source measurements to create the final data products.

Only call this after you have added all source metrics using addSourceMetrics.

Reads temporary cache self._tempDataList and then deletes it.

Definition at line 178 of file repositoryIterator.py.

    def finalize(self):
        """Process the accumulated source measurements to create the final data products.

        Only call this after you have added all source metrics using addSourceMetrics.

        Reads temporary cache self._tempDataList and then deletes it.
        """
        if len(self._tempDataList) == 0:
            raise RuntimeError("No data found")

        fullSrcIdSet = set()
        for dataIdDict in self._tempDataList:
            fullSrcIdSet.update(dataIdDict.iterkeys())

        # source data
        sourceArrDType = [("sourceId", int)] + self._idKeyDTypeList + self._sourceDTypeList
        # data for missing sources (only for the data in the source data dict, so excludes srcId)
        nullSourceTuple = tuple(numpy.zeros(1, dtype=self._idKeyDTypeList + self._sourceDTypeList)[0])

        sourceData = [[(srcId,) + srcDataDict.get(srcId, nullSourceTuple) for srcId in fullSrcIdSet]
            for srcDataDict in self._tempDataList]

        self.sourceArr = numpy.array(sourceData, dtype=sourceArrDType)
        del sourceData

        self.sourceIdDict = dict((srcId, i) for i, srcId in enumerate(fullSrcIdSet))

        # repository data
        repoData = [repoInfo.valTuple for repoInfo in self.repoInfoList]
        self.repoArr = numpy.array(repoData, dtype=self._repoDTypeList)

        self._tempDataList = None
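
The padding step in finalize, which gives every repository a record for every source ID seen anywhere, can be sketched without numpy. This is an illustration under simplifying assumptions: the function name pad_and_index is invented, the null record is passed in explicitly rather than built from a dtype, and the IDs are sorted for a reproducible order, whereas the original iterates over the set directly.

```python
def pad_and_index(temp_data_list, null_tuple):
    """Build per-repository rows over the union of source IDs.

    temp_data_list: one dict per repository, mapping source ID -> data tuple.
    null_tuple: data to use for a source that a repository did not measure.
    Returns (rows, id_index), where id_index maps source ID -> column index.
    """
    full_ids = set()
    for data_dict in temp_data_list:
        full_ids.update(data_dict.keys())

    ordered_ids = sorted(full_ids)  # deviation from the original, for determinism

    rows = [
        [(src_id,) + data_dict.get(src_id, null_tuple) for src_id in ordered_ids]
        for data_dict in temp_data_list
    ]
    id_index = {src_id: i for i, src_id in enumerate(ordered_ids)}
    return rows, id_index


# Made-up data: source 7 appears only in the first repository.
temp = [
    {5: (1, 10.0), 7: (1, 30.0)},
    {5: (2, 11.0)},
]
rows, id_index = pad_and_index(temp, null_tuple=(0, 0.0))
```

Note that a padded record still carries the real source ID in its first field; only the data ID and measurement fields are zero-filled, so a zero in those fields may mean "not measured in this repository" rather than a measured value of zero.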

Member Data Documentation

lsst.pipe.tasks.repositoryIterator.SourceData._idKeyDTypeList
private

Definition at line 86 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData._idKeyTuple
private

Definition at line 85 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData._repoDTypeList
private

Definition at line 91 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData._repoKeyTuple
private

Definition at line 90 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData._sourceDTypeList
private

Definition at line 88 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData._sourceKeyTuple
private

Definition at line 83 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData._tempDataList
private

Definition at line 94 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData.datasetType

Definition at line 82 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData.repoArr

Definition at line 207 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData.repoInfoList

Definition at line 98 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData.sourceArr

Definition at line 200 of file repositoryIterator.py.

lsst.pipe.tasks.repositoryIterator.SourceData.sourceIdDict

Definition at line 203 of file repositoryIterator.py.


The documentation for this class was generated from the following file: