LSSTApplications  11.0-24-g0a022a1,14.0+77,15.0,15.0+1
LSSTDataManagementBasePackage
lsst.datarel.datasetScanner.HfsScanner Class Reference
Inheritance diagram for lsst.datarel.datasetScanner.HfsScanner:
lsst.datarel.datasetScanner.DatasetScanner

Public Member Functions

def __init__ (self, template)
 
def walk (self, root, rules=None)
 

Private Attributes

 _formatKeys
 
 _pathComponents
 

Detailed Description

A hierarchical scanner for paths matching a template, optionally
also restricting visited paths to those matching a list of dataId rules.

Definition at line 211 of file datasetScanner.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.datarel.datasetScanner.HfsScanner.__init__ (self, template)
Build an HfsScanner for a given path template. The path template
should be a Python string with named format substitution
specifications, as used in mapper policy files. For example:

deepCoadd-results/%(filter)s/%(tract)d/%(patch)s/calexp-%(filter)s-%(tract)d-%(patch)s.fits

Note that a key may appear multiple times. If it does, the value of
every occurrence must be identical, and every occurrence must use the
same format type code. Only the d, i, u, c, r, and s type codes are
recognized; octal, binary, hexadecimal, and floating-point formats
are not supported.
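
As a rough illustration of what such a template describes, the sketch below expands the example template with Python's `%` operator. The `data_id` values and the `expand` helper are made up for this example and are not part of `datasetScanner.py`; the scanner itself works in the opposite direction, recovering the values from existing paths.

```python
# Template copied from the docstring above.
TEMPLATE = ("deepCoadd-results/%(filter)s/%(tract)d/%(patch)s/"
            "calexp-%(filter)s-%(tract)d-%(patch)s.fits")

# Hypothetical dataId; the values are invented for illustration.
data_id = {"filter": "HSC-I", "tract": 9813, "patch": "4,4"}


def expand(template, data_id):
    """Fill a named-format template with the values of a dataId."""
    return template % data_id


path = expand(TEMPLATE, data_id)
print(path)
# deepCoadd-results/HSC-I/9813/4,4/calexp-HSC-I-9813-4,4.fits
```

Each key appears twice in the template, and both occurrences receive the same value, which is the consistency requirement stated above.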

Definition at line 216 of file datasetScanner.py.

    def __init__(self, template):
        """Build an HfsScanner for a given path template. The path template
        should be a Python string with named format substitution
        specifications, as used in mapper policy files. For example:

        deepCoadd-results/%(filter)s/%(tract)d/%(patch)s/calexp-%(filter)s-%(tract)d-%(patch)s.fits

        Note that a key may appear multiple times. If it does,
        the value for each occurrence should be identical (the formatting
        specs must be identical). Octal, binary, hexadecimal, and floating
        point formats are not supported.
        """
        template = os.path.normpath(template)
        if (len(template) == 0 or
                template == os.curdir or
                template[0] == os.sep or
                template[-1] == os.sep):
            raise RuntimeError(
                'Path template is empty, absolute, or identifies a directory')
        self._formatKeys = {}
        self._pathComponents = []
        fmt = re.compile(r'%\((\w+)\).*?([diucrs])')

        # split path into components
        for component in template.split(os.sep):
            # search for all occurrences of a format spec
            simple = True
            last = 0
            regex = ''
            newKeys = []
            for m in fmt.finditer(component):
                simple = False
                spec = m.group(0)
                k = m.group(1)
                seenBefore = k in self._formatKeys
                # transform format spec into a regular expression
                regex += re.escape(component[last:m.start(0)])
                last = m.end(0)
                regex += '('
                if seenBefore:
                    regex += '?:'
                if m.group(2) in 'crs':
                    munge = _mungeStr
                    typ = str
                    regex += r'.+)'
                else:
                    munge = _mungeInt
                    typ = int
                    regex += r'[+-]?\d+)'
                if seenBefore:
                    # check consistency of formatting spec across key occurrences
                    if spec[-1] != self._formatKeys[k].spec[-1]:
                        raise RuntimeError(
                            'Path template contains inconsistent format type-codes '
                            'for the same key')
                else:
                    newKeys.append(k)
                    self._formatKeys[k] = _FormatKey(spec, typ, munge)
            regex += re.escape(component[last:])
            if simple:
                regex = component  # literal match
            else:
                regex = re.compile('^' + regex + '$')
            self._pathComponents.append(_PathComponent(newKeys, regex, simple))
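
The translation of one path component into a regular expression can be tried in isolation. The following is a simplified sketch modelled on the listing above, not the class's own code: `component_to_regex` and `seen_keys` are names chosen for this example, and the type-code consistency check and the munger bookkeeping are left out.

```python
import re

# Same pattern as in the listing: a named key followed by a type code.
FMT = re.compile(r'%\((\w+)\).*?([diucrs])')


def component_to_regex(component, seen_keys):
    """Return (compiled_regex, new_keys) for one path component.

    Keys already in seen_keys get a non-capturing group; keys seen for
    the first time get a capturing group and are added to seen_keys.
    """
    pattern = ''
    last = 0
    new_keys = []
    for m in FMT.finditer(component):
        key, code = m.group(1), m.group(2)
        # Literal text between format specs is matched verbatim.
        pattern += re.escape(component[last:m.start()])
        last = m.end()
        # String-like codes match anything; integer codes match digits.
        group = r'.+' if code in 'crs' else r'[+-]?\d+'
        if key in seen_keys:
            pattern += '(?:' + group + ')'
        else:
            pattern += '(' + group + ')'
            new_keys.append(key)
            seen_keys.add(key)
    pattern += re.escape(component[last:])
    return re.compile('^' + pattern + '$'), new_keys


seen = set()
dir_regex, dir_keys = component_to_regex('%(tract)d', seen)
file_regex, file_keys = component_to_regex(
    'calexp-%(filter)s-%(tract)d.fits', seen)
print(dir_keys, file_keys)
print(file_regex.match('calexp-HSC-I-9813.fits').group(1))
```

Because `tract` was already captured from the directory component, the file-name component only captures `filter`; this is why the position of each capturing group lines up with the list of new keys.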

Member Function Documentation

◆ walk()

def lsst.datarel.datasetScanner.HfsScanner.walk (self, root, rules=None)
Generator that descends the given root directory in top-down
fashion, matching paths corresponding to the template and satisfying
the given rule list. The generator yields tuples of the form
(path, dataId), where path is a dataset file name relative to root,
and dataId is a key-value dictionary identifying the file. If nothing
under root matches, the search is repeated in root/_parent for as long
as that directory exists.
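
The format of the rule list is not spelled out here. Judging from the source listing of walk(), each rule appears to be a dictionary mapping a key to a collection of accepted values, and a path is kept while at least one rule does not conflict with the keys extracted so far. A minimal sketch under that assumption (`filter_rules` and the sample values are hypothetical):

```python
def filter_rules(rules, data_id):
    """Return the rules that do not conflict with data_id.

    A rule conflicts when it names a key that is present in data_id
    but does not list the value data_id has for that key.
    """
    kept = []
    for rule in rules:
        if all(key not in rule or value in rule[key]
               for key, value in data_id.items()):
            kept.append(rule)
    return kept


# Hypothetical rules: either HSC-I data from tract 9813, or any HSC-R data.
rules = [
    {"filter": ["HSC-I"], "tract": [9813]},
    {"filter": ["HSC-R"]},
]

partial = {"filter": "HSC-I"}               # tract not yet known
full = {"filter": "HSC-I", "tract": 9999}   # tract conflicts with rule 1

print(filter_rules(rules, partial))
print(filter_rules(rules, full))
```

A partially filled dataId keeps every rule it could still satisfy, so whole directories can be skipped as soon as the surviving rule list becomes empty.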

Definition at line 281 of file datasetScanner.py.

    def walk(self, root, rules=None):
        """Generator that descends the given root directory in top-down
        fashion, matching paths corresponding to the template and satisfying
        the given rule list. The generator yields tuples of the form
        (path, dataId), where path is a dataset file name relative to root,
        and dataId is a key-value dictionary identifying the file.
        """
        oneFound = False
        while os.path.exists(root) and not oneFound:
            stack = [(0, root, rules, {})]
            while stack:
                depth, path, rules, dataId = stack.pop()
                if os.path.isfile(path):
                    continue
                pc = self._pathComponents[depth]
                if pc.simple:
                    # No need to list directory contents
                    entries = [pc.regex]
                    if not os.path.exists(os.path.join(path, pc.regex)):
                        continue
                else:
                    entries = os.listdir(path)
                depth += 1
                for e in entries:
                    subRules = rules
                    subDataId = dataId
                    if not pc.simple:
                        # make sure e matches path component regular expression
                        m = pc.regex.match(e)
                        if not m:
                            continue
                        # got a match - update dataId with new key values (if any)
                        try:
                            for i, k in enumerate(pc.keys):
                                subDataId = self._formatKeys[k].munge(k, m.group(i + 1), subDataId)
                        except Exception:
                            # Munger raises if value is invalid for key, so
                            # not really a match
                            continue
                        if subRules and pc.keys:
                            # have dataId rules and saw new keys; filter rule list
                            for k in subDataId:
                                newRules = []
                                for r in subRules:
                                    if k not in r or subDataId[k] in r[k]:
                                        newRules.append(r)
                                subRules = newRules
                            if not subRules:
                                continue  # no rules matched
                    # Have path matching template and at least one rule
                    p = os.path.join(path, e)
                    if depth < len(self._pathComponents):
                        # recurse
                        stack.append((depth, p, subRules, subDataId))
                    elif depth == len(self._pathComponents):
                        if os.path.isfile(p):
                            # found a matching file, yield it
                            yield os.path.relpath(p, root), subDataId
                            oneFound = True
            # end while stack
            root = os.path.join(root, "_parent")
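
The outer loop of the listing retries the scan in `root/_parent` when nothing was found under `root`. The order in which directories are tried can be sketched on its own; `candidate_roots` is a name invented for this example, and unlike walk() it does not stop after the first directory that yields a match.

```python
import os
import tempfile


def candidate_roots(root):
    """Yield root, root/_parent, root/_parent/_parent, ... while each exists."""
    while os.path.exists(root):
        yield root
        root = os.path.join(root, "_parent")


# Build a throwaway directory tree with two levels of _parent.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "_parent", "_parent"))
    roots = [os.path.relpath(r, top) for r in candidate_roots(top)]

print(roots)
```

A caller mimicking walk() would scan each candidate in turn and stop at the first one that produces at least one matching file.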

Member Data Documentation

◆ _formatKeys

lsst.datarel.datasetScanner.HfsScanner._formatKeys
private

Definition at line 235 of file datasetScanner.py.

◆ _pathComponents

lsst.datarel.datasetScanner.HfsScanner._pathComponents
private

Definition at line 236 of file datasetScanner.py.


The documentation for this class was generated from the following file: