LSST Applications 24.1.5,g02d81e74bb+fa3a7a026e,g180d380827+a53a32eff8,g2079a07aa2+86d27d4dc4,g2305ad1205+c0501b3732,g295015adf3+7d3e92f0ec,g2bbee38e9b+0e5473021a,g337abbeb29+0e5473021a,g33d1c0ed96+0e5473021a,g3a166c0a6a+0e5473021a,g3ddfee87b4+5dd1654d75,g48712c4677+3bf1020dcb,g487adcacf7+065c13d9cf,g50ff169b8f+96c6868917,g52b1c1532d+585e252eca,g591dd9f2cf+d7ac436cfb,g5a732f18d5+53520f316c,g64a986408d+fa3a7a026e,g858d7b2824+fa3a7a026e,g8a8a8dda67+585e252eca,g99cad8db69+a5a909b84f,g9ddcbc5298+9a081db1e4,ga1e77700b3+15fc3df1f7,ga8c6da7877+4cf350ccb2,gb0e22166c9+60f28cb32d,gba4ed39666+c2a2e4ac27,gbb8dafda3b+f991a0b59f,gc120e1dc64+9ccbfdb8be,gc28159a63d+0e5473021a,gcf0d15dbbd+5dd1654d75,gd96a1ce819+42fd0ee607,gdaeeff99f8+f9a426f77a,ge6526c86ff+0d71447b4b,ge79ae78c31+0e5473021a,gee10cc3b42+585e252eca,gff1a9f87cc+fa3a7a026e
LSST Data Management Base Package
lsst.meas.astrom.matcher_probabilistic.MatcherProbabilistic Class Reference

Public Member Functions

 __init__ (self, MatchProbabilisticConfig config)
 
 match (self, pd.DataFrame catalog_ref, pd.DataFrame catalog_target, np.array select_ref=None, np.array select_target=None, logging.Logger logger=None, int logging_n_rows=None, **kwargs)
 

Public Attributes

 config
 

Static Public Attributes

MatchProbabilisticConfig config
 

Detailed Description

A probabilistic, greedy catalog matcher.

Parameters
----------
config : `MatchProbabilisticConfig`
    A configuration instance.

Definition at line 414 of file matcher_probabilistic.py.

Constructor & Destructor Documentation

◆ __init__()

lsst.meas.astrom.matcher_probabilistic.MatcherProbabilistic.__init__ ( self,
MatchProbabilisticConfig config )

Definition at line 424 of file matcher_probabilistic.py.

427 ):
428 self.config = config
429

Member Function Documentation

◆ match()

lsst.meas.astrom.matcher_probabilistic.MatcherProbabilistic.match ( self,
pd.DataFrame catalog_ref,
pd.DataFrame catalog_target,
np.array select_ref = None,
np.array select_target = None,
logging.Logger logger = None,
int logging_n_rows = None,
** kwargs )
Match catalogs.

Parameters
----------
catalog_ref : `pandas.DataFrame`
    A reference catalog to match in order of a given column (i.e. greedily).
catalog_target : `pandas.DataFrame`
    A target catalog for matching sources from `catalog_ref`. Must contain measurements with errors.
select_ref : `numpy.array`, optional
    A boolean array of the same length as `catalog_ref` selecting the sources that can be matched.
select_target : `numpy.array`, optional
    A boolean array of the same length as `catalog_target` selecting the sources that can be matched.
logger : `logging.Logger`, optional
    A logger for progress and warning messages. If None, the module's default logger is used.
logging_n_rows : `int`, optional
    The number of sources to process between progress log messages. If None, no progress messages are logged.
kwargs
    Additional keyword arguments to pass to `format_catalogs`.

Returns
-------
catalog_out_ref : `pandas.DataFrame`
    A catalog of identical length to `catalog_ref`, containing match information for rows selected by
    `select_ref` (including the matching row index in `catalog_target`).
catalog_out_target : `pandas.DataFrame`
    A catalog of identical length to `catalog_target`, containing the indices of matching rows in
    `catalog_ref`.
exceptions : `dict` [`int`, `Exception`]
    A dictionary, keyed by `catalog_ref` row number, of the exception caught while matching that row.
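
The greedy selection described above can be illustrated with a minimal, pure-Python sketch. It is not the LSST implementation: the function name `greedy_chisq_match` and all of its arguments are hypothetical, the candidate lists stand in for the k-d tree query, and no maximum match distance is applied. It only shows the core idea: reference rows are visited in sorted order, and each takes the not-yet-matched candidate with the lowest chi-square per finite measurement.

```python
import math


def greedy_chisq_match(refs, targets, candidates, order_values,
                       ascending=False, n_finite_min=1):
    """Greedily match reference rows to target rows (illustrative sketch).

    refs: one tuple of measurements per reference row.
    targets: one (measurements, errors) pair per target row.
    candidates: per reference row, the indices of nearby target rows.
    order_values: sort key per reference row (e.g. a total flux).
    Returns a list with the matched target index (or None) per reference row.
    """
    sign = 1.0 if ascending else -1.0
    order = sorted(range(len(refs)), key=lambda i: sign * order_values[i])
    matched_targets = set()
    ref_match = [None] * len(refs)
    for i in order:
        best, best_score = None, math.inf
        for j in candidates[i]:
            if j in matched_targets:
                continue  # each target can be matched only once
            meas, errs = targets[j]
            # Treat a zero error as a non-finite chi value
            chi = [(m - r) / e if e != 0 else math.inf
                   for m, r, e in zip(meas, refs[i], errs)]
            finite = [c for c in chi if math.isfinite(c)]
            if len(finite) < max(n_finite_min, 1):
                continue  # too few usable measurements
            score = sum(c * c for c in finite) / len(finite)
            if score < best_score:
                best, best_score = j, score
        if best is not None:
            ref_match[i] = best
            matched_targets.add(best)
    return ref_match


refs = [(10.0, 5.0), (10.2, 5.1)]
targets = [((10.1, 5.0), (0.1, 0.1)), ((11.0, 6.0), (0.1, 0.1))]
candidates = [[0, 1], [0, 1]]
# Reference row 1 has the larger sort value, so it is matched first
print(greedy_chisq_match(refs, targets, candidates, order_values=[1.0, 2.0]))
# [1, 0]
```

In this example reference row 0 would have preferred target 0, but row 1 is processed first and claims it. This is the trade-off of a greedy match: the result depends on the sort order.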

Definition at line 430 of file matcher_probabilistic.py.

439 ):
440 """Match catalogs.
441
442 Parameters
443 ----------
444 catalog_ref : `pandas.DataFrame`
445 A reference catalog to match in order of a given column (i.e. greedily).
446 catalog_target : `pandas.DataFrame`
447 A target catalog for matching sources from `catalog_ref`. Must contain measurements with errors.
448 select_ref : `numpy.array`
449 A boolean array of the same length as `catalog_ref` selecting the sources that can be matched.
450 select_target : `numpy.array`
451 A boolean array of the same length as `catalog_target` selecting the sources that can be matched.
452 logger : `logging.Logger`
453 A Logger for logging.
454 logging_n_rows : `int`
455 The number of sources to match before printing a log message.
456 kwargs
457 Additional keyword arguments to pass to `format_catalogs`.
458
459 Returns
460 -------
461 catalog_out_ref : `pandas.DataFrame`
462 A catalog of identical length to `catalog_ref`, containing match information for rows selected by
463 `select_ref` (including the matching row index in `catalog_target`).
464 catalog_out_target : `pandas.DataFrame`
465 A catalog of identical length to `catalog_target`, containing the indices of matching rows in
466 `catalog_ref`.
467 exceptions : `dict` [`int`, `Exception`]
468 A dictionary, keyed by `catalog_ref` row number, of the exception caught while matching that row.
469 """
470 if logger is None:
471 logger = logger_default
472
473 config = self.config
474
475 # Transform any coordinates, if required
476 # Note: The returned objects contain the original catalogs, as well as
477 # transformed coordinates, and the selection of sources for matching.
478 # These might be identical to the arrays passed as kwargs, but that
479 # depends on config settings.
480 # For the rest of this function, the selection arrays will be used,
481 # but the indices of the original, unfiltered catalog will also be
482 # output, so some further indexing steps are needed.
483 ref, target = config.coord_format.format_catalogs(
484 catalog_ref=catalog_ref, catalog_target=catalog_target,
485 select_ref=select_ref, select_target=select_target,
486 **kwargs
487 )
488
489 # If no order is specified, take nansum of all flux columns for a 'total flux'
490 # Note: it won't actually be a total flux if bands overlap significantly
491 # (or it might define a filter with >100% efficiency)
492 # Also, this is done on the original dataframe as it's harder to accomplish
493 # just with a recarray
494 column_order = (
495 catalog_ref.loc[ref.extras.select, config.column_ref_order]
496 if config.column_ref_order is not None else
497 np.nansum(catalog_ref.loc[ref.extras.select, config.columns_ref_flux], axis=1)
498 )
499 order = np.argsort(column_order if config.order_ascending else -column_order)
500
501 n_ref_select = len(ref.extras.indices)
502
503 match_dist_max = config.match_dist_max
504 coords_spherical = config.coord_format.coords_spherical
505 if coords_spherical:
506 match_dist_max = np.radians(match_dist_max / 3600.)
507
508 # Convert ra/dec sky coordinates to spherical vectors for accurate distances
509 func_convert = _radec_to_xyz if coords_spherical else np.vstack
510 vec_ref, vec_target = (
511 func_convert(cat.coord1[cat.extras.select], cat.coord2[cat.extras.select])
512 for cat in (ref, target)
513 )
514
515 # Generate K-d tree to compute distances
516 logger.info('Generating cKDTree with match_n_max=%d', config.match_n_max)
517 tree_obj = cKDTree(vec_target)
518
519 scores, idxs_target_select = tree_obj.query(
520 vec_ref,
521 distance_upper_bound=match_dist_max,
522 k=config.match_n_max,
523 )
524
525 n_target_select = len(target.extras.indices)
526 n_matches = np.sum(idxs_target_select != n_target_select, axis=1)
527 n_matched_max = np.sum(n_matches == config.match_n_max)
528 if n_matched_max > 0:
529 logger.warning(
530 '%d/%d (%.2f%%) selected true objects have n_matches=n_match_max(%d)',
531 n_matched_max, n_ref_select, 100.*n_matched_max/n_ref_select, config.match_n_max
532 )
533
534 # Pre-allocate outputs
535 target_row_match = np.full(target.extras.n, np.nan, dtype=np.int64)
536 ref_candidate_match = np.zeros(ref.extras.n, dtype=bool)
537 ref_row_match = np.full(ref.extras.n, np.nan, dtype=np.int64)
538 ref_match_count = np.zeros(ref.extras.n, dtype=np.int32)
539 ref_match_meas_finite = np.zeros(ref.extras.n, dtype=np.int32)
540 ref_chisq = np.full(ref.extras.n, np.nan, dtype=float)
541
542 # Need the original reference row indices for output
543 idx_orig_ref, idx_orig_target = (np.argwhere(cat.extras.select) for cat in (ref, target))
544
545 # Retrieve required columns, including any converted ones (default to original column name)
546 columns_convert = config.coord_format.coords_ref_to_convert
547 if columns_convert is None:
548 columns_convert = {}
549 data_ref = ref.catalog[
550 [columns_convert.get(column, column) for column in config.columns_ref_meas]
551 ].iloc[ref.extras.indices[order]]
552 data_target = target.catalog[config.columns_target_meas][target.extras.select]
553 errors_target = target.catalog[config.columns_target_err][target.extras.select]
554
555 exceptions = {}
556 # The kdTree uses len(inputs) as a sentinel value for no match
557 matched_target = {n_target_select, }
558
559 t_begin = time.process_time()
560
561 logger.info('Matching n_indices=%d/%d', len(order), len(ref.catalog))
562 for index_n, index_row_select in enumerate(order):
563 index_row = idx_orig_ref[index_row_select]
564 ref_candidate_match[index_row] = True
565 found = idxs_target_select[index_row_select, :]
566 # Select match candidates from nearby sources not already matched
567 # Note: set lookup is apparently fast enough that this is a few percent faster than:
568 # found = [x for x in found[found != n_target_select] if x not in matched_target]
569 # ... at least for ~1M sources
570 found = [x for x in found if x not in matched_target]
571 n_found = len(found)
572 if n_found > 0:
573 # This is an ndarray of n_found rows x len(data_ref/target) columns
574 chi = (
575 (data_target.iloc[found].values - data_ref.iloc[index_n].values)
576 / errors_target.iloc[found].values
577 )
578 finite = np.isfinite(chi)
579 n_finite = np.sum(finite, axis=1)
580 # Require some number of finite chi_sq to match
581 chisq_good = n_finite >= config.match_n_finite_min
582 if np.any(chisq_good):
583 try:
584 chisq_sum = np.zeros(n_found, dtype=float)
585 chisq_sum[chisq_good] = np.nansum(chi[chisq_good, :] ** 2, axis=1)
586 idx_chisq_min = np.nanargmin(chisq_sum / n_finite)
587 ref_match_meas_finite[index_row] = n_finite[idx_chisq_min]
588 ref_match_count[index_row] = len(chisq_good)
589 ref_chisq[index_row] = chisq_sum[idx_chisq_min]
590 idx_match_select = found[idx_chisq_min]
591 row_target = target.extras.indices[idx_match_select]
592 ref_row_match[index_row] = row_target
593
594 target_row_match[row_target] = index_row
595 matched_target.add(idx_match_select)
596 except Exception as error:
597 # Can't foresee any exceptions, but they shouldn't prevent
598 # matching subsequent sources
599 exceptions[index_row] = error
600
601 if logging_n_rows and ((index_n + 1) % logging_n_rows == 0):
602 t_elapsed = time.process_time() - t_begin
603 logger.info(
604 'Processed %d/%d in %.2fs at sort value=%.3f',
605 index_n + 1, n_ref_select, t_elapsed, column_order[order[index_n]],
606 )
607
608 data_ref = {
609 'match_candidate': ref_candidate_match,
610 'match_row': ref_row_match,
611 'match_count': ref_match_count,
612 'match_chisq': ref_chisq,
613 'match_n_chisq_finite': ref_match_meas_finite,
614 }
615 data_target = {
616 'match_candidate': target.extras.select if target.extras.select is not None else (
617 np.ones(target.extras.n, dtype=bool)),
618 'match_row': target_row_match,
619 }
620
621 for (columns, out_original, out_matched, in_original, in_matched, matches) in (
622 (
623 self.config.columns_ref_copy,
624 data_ref,
625 data_target,
626 ref,
627 target,
628 target_row_match,
629 ),
630 (
631 self.config.columns_target_copy,
632 data_target,
633 data_ref,
634 target,
635 ref,
636 ref_row_match,
637 ),
638 ):
639 matched = matches >= 0
640 idx_matched = matches[matched]
641
642 for column in columns:
643 values = in_original.catalog[column]
644 out_original[column] = values
645 dtype = in_original.catalog[column].dtype
646
647 # Pandas object columns can have mixed types - check for that
648 if dtype == object:
649 types = list(set((type(x) for x in values)))
650 if len(types) != 1:
651 raise RuntimeError(f'Column {column} dtype={dtype} has multiple types={types}')
652 dtype = types[0]
653
654 value_fill = default_value(dtype)
655
656 # Without this, the dtype would be '<U1' for an empty Unicode string
657 if dtype == str:
658 dtype = f'<U{max(len(x) for x in values)}'
659
660 column_match = np.full(in_matched.extras.n, value_fill, dtype=dtype)
661 column_match[matched] = in_original.catalog[column][idx_matched]
662 out_matched[f'match_{column}'] = column_match
663
664 catalog_out_ref = pd.DataFrame(data_ref)
665 catalog_out_target = pd.DataFrame(data_target)
666
667 return catalog_out_ref, catalog_out_target, exceptions
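
For spherical coordinates, the listing above converts `match_dist_max` from arcseconds to radians and builds the k-d tree on vectors returned by `_radec_to_xyz`, whose body is not shown on this page. A Euclidean tree query on unit vectors measures chord length rather than angle, but the two are nearly identical at small separations. The sketch below illustrates this under stated assumptions: the helper names are hypothetical, and the degree-valued inputs are an assumption, not a statement about what `_radec_to_xyz` accepts.

```python
import math


def radec_to_xyz(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to a unit vector (hypothetical helper)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    cos_dec = math.cos(dec)
    return (cos_dec * math.cos(ra), cos_dec * math.sin(ra), math.sin(dec))


def arcsec_to_radians(arcsec):
    """Same conversion as np.radians(match_dist_max / 3600.) in the listing."""
    return math.radians(arcsec / 3600.0)


# Two points separated by exactly 1 arcsecond in declination
v1 = radec_to_xyz(150.0, 2.0)
v2 = radec_to_xyz(150.0, 2.0 + 1.0 / 3600.0)

chord = math.dist(v1, v2)        # Euclidean distance, as a k-d tree would see it
angle = arcsec_to_radians(1.0)   # true angular separation in radians

# chord = 2 * sin(angle / 2), so the difference is of order angle**3 / 24
print(abs(chord - angle) < 1e-12)
```

Because the difference grows as the cube of the angle, using an angular limit in radians as a chord-length bound is a very good approximation for arcsecond-scale match radii, and becomes progressively looser for separations of many degrees.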

Member Data Documentation

◆ config [1/2]

MatchProbabilisticConfig lsst.meas.astrom.matcher_probabilistic.MatcherProbabilistic.config
static

Definition at line 422 of file matcher_probabilistic.py.

◆ config [2/2]

lsst.meas.astrom.matcher_probabilistic.MatcherProbabilistic.config

Definition at line 428 of file matcher_probabilistic.py.


The documentation for this class was generated from the following file: