Inheritance diagram for lsst.pipe.tasks.diff_matched_tract_catalog.DiffMatchedTractCatalogTask:

Public Member Functions
	runQuantum (self, butlerQC, inputRefs, outputRefs)

pipeBase.Struct	run (self, pd.DataFrame catalog_ref, pd.DataFrame catalog_target, pd.DataFrame catalog_match_ref, pd.DataFrame catalog_match_target, afwGeom.SkyWcs wcs=None)

Static Public Attributes
	ConfigClass = DiffMatchedTractCatalogConfig

Static Protected Attributes
str	_DefaultName = "DiffMatchedTractCatalog"

Detailed Description

Load subsets of matched catalogs and output a merged catalog of matched sources.

Definition at line 557 of file diff_matched_tract_catalog.py.

Member Function Documentation

◆ run()

pipeBase.Struct lsst.pipe.tasks.diff_matched_tract_catalog.DiffMatchedTractCatalogTask.run	(		self,
		pd.DataFrame	catalog_ref,
		pd.DataFrame	catalog_target,
		pd.DataFrame	catalog_match_ref,
		pd.DataFrame	catalog_match_target,
		afwGeom.SkyWcs	wcs = None )

Load matched reference and target (measured) catalogs, measure summary statistics, and output
a combined matched catalog with columns from both inputs.

Parameters
----------
catalog_ref : `pandas.DataFrame`
    A reference catalog to diff objects/sources from.
catalog_target : `pandas.DataFrame`
    A target catalog to diff reference objects/sources to.
catalog_match_ref : `pandas.DataFrame`
    A catalog with match indices of target sources and selection flags
    for each reference source.
catalog_match_target : `pandas.DataFrame`
    A catalog with selection flags for each target source.
wcs : `lsst.afw.geom.SkyWcs`, optional
    A coordinate system to convert catalog positions to sky coordinates,
    if necessary.

Returns
-------
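retStruct : `lsst.pipe.base.Struct`
    A struct with a cat_matched attribute containing the combined matched
    catalog and a diff_matched attribute containing a table of counts and
    difference/chi statistics per reference magnitude bin.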
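
The role of the match indices can be illustrated with a minimal sketch. It assumes, as the listing below suggests, that `match_row` holds for each reference row the positional index of its matched target row, with a negative value meaning "no match". The toy catalogs and the `flux`/`ref_flux` column names are hypothetical, and the prefixing is simplified relative to `run()`, which renames columns in place and also keeps the reference index as a column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy catalogs; only 'match_row' semantics follow the listing.
cat_ref = pd.DataFrame({"ref_flux": [10.0, 20.0, 30.0, 40.0]})
cat_target = pd.DataFrame({"flux": [11.0, 29.0, 41.0]})

# match_row[i] is the target row matched to reference row i, or -1 if unmatched.
match_row = np.array([0, -1, 1, 2])

# Reference rows that have a match, and the target rows they point to.
matched_ref = match_row >= 0
matched_row = match_row[matched_ref]

# Boolean mask over target rows that are matched to some reference row.
matched_target = np.zeros(len(cat_target), dtype=bool)
matched_target[matched_row] = True

# Side-by-side table: target columns first, then prefixed reference columns.
cat_left = cat_target.iloc[matched_row].reset_index(drop=True)
cat_right = cat_ref[matched_ref].reset_index(drop=True).add_prefix("refcat_")
cat_matched = pd.concat((cat_left, cat_right), axis=1)
print(cat_matched)
```

Each row of the resulting table pairs one target source with its reference counterpart, so unmatched reference rows (here the second one) do not appear.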

Definition at line 584 of file diff_matched_tract_catalog.py.

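    def run(
        self,
        catalog_ref: pd.DataFrame,
        catalog_target: pd.DataFrame,
        catalog_match_ref: pd.DataFrame,
        catalog_match_target: pd.DataFrame,
        wcs: afwGeom.SkyWcs = None,
    ) -> pipeBase.Struct: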
        """Load matched reference and target (measured) catalogs, measure summary statistics, and output
        a combined matched catalog with columns from both inputs.
 
        Parameters
        ----------
        catalog_ref : `pandas.DataFrame`
            A reference catalog to diff objects/sources from.
        catalog_target : `pandas.DataFrame`
            A target catalog to diff reference objects/sources to.
        catalog_match_ref : `pandas.DataFrame`
            A catalog with match indices of target sources and selection flags
            for each reference source.
        catalog_match_target : `pandas.DataFrame`
            A catalog with selection flags for each target source.
        wcs : `lsst.afw.geom.SkyWcs`, optional
            A coordinate system to convert catalog positions to sky coordinates,
            if necessary.
 
        Returns
        -------
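        retStruct : `lsst.pipe.base.Struct`
            A struct with a cat_matched attribute containing the combined matched
            catalog and a diff_matched attribute containing a table of counts and
            difference/chi statistics per reference magnitude bin.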
        """
        config = self.config
 
        select_ref = catalog_match_ref['match_candidate'].values
        # Add additional selection criteria for target sources beyond those for matching
        # (not recommended, but can be done anyway)
        select_target = (catalog_match_target['match_candidate'].values
                         if 'match_candidate' in catalog_match_target.columns
                         else np.ones(len(catalog_match_target), dtype=bool))
        for column in config.columns_target_select_true:
            select_target &= catalog_target[column].values
        for column in config.columns_target_select_false:
            select_target &= ~catalog_target[column].values
 
        ref, target = config.coord_format.format_catalogs(
            catalog_ref=catalog_ref, catalog_target=catalog_target,
            select_ref=None, select_target=select_target, wcs=wcs, radec_to_xy_func=radec_to_xy,
            return_converted_columns=config.coord_format.coords_ref_to_convert is not None,
        )
        cat_ref = ref.catalog
        cat_target = target.catalog
        n_target = len(cat_target)
 
        match_row = catalog_match_ref['match_row'].values
        matched_ref = match_row >= 0
        matched_row = match_row[matched_ref]
        matched_target = np.zeros(n_target, dtype=bool)
        matched_target[matched_row] = True
 
        # Create a matched table, preserving the target catalog's named index (if it has one)
        cat_left = cat_target.iloc[matched_row]
        has_index_left = cat_left.index.name is not None
        cat_right = cat_ref[matched_ref].reset_index()
        cat_matched = pd.concat(objs=(cat_left.reset_index(drop=True), cat_right), axis=1, sort=False)
        if has_index_left:
            cat_matched.index = cat_left.index
        cat_matched.columns.values[len(cat_target.columns):] = [f'refcat_{col}' for col in cat_right.columns]
 
        # Add/compute distance columns
        coord1_target_err, coord2_target_err = config.columns_target_coord_err
        column_dist, column_dist_err = 'distance', 'distanceErr'
        dist = np.full(n_target, np.inf)
 
        dist[matched_row] = np.hypot(
            target.coord1[matched_row] - ref.coord1[matched_ref],
            target.coord2[matched_row] - ref.coord2[matched_ref],
        )
        dist_err = np.full(n_target, np.inf)
        dist_err[matched_row] = np.hypot(cat_target.iloc[matched_row][coord1_target_err].values,
                                         cat_target.iloc[matched_row][coord2_target_err].values)
        cat_target[column_dist], cat_target[column_dist_err] = dist, dist_err
 
        # Slightly smelly hack for when a column (like distance) is already relative to truth
        column_dummy = 'dummy'
        cat_ref[column_dummy] = np.zeros_like(ref.coord1)
 
        # Add a boolean column for whether a match is classified correctly
        extended_ref = cat_ref[config.column_ref_extended]
        if config.column_ref_extended_inverted:
            extended_ref = 1 - extended_ref
 
        extended_target = cat_target[config.column_target_extended].values >= config.extendedness_cut
 
        # Define difference/chi columns and statistics thereof
        suffixes = {MeasurementType.DIFF: 'diff', MeasurementType.CHI: 'chi'}
        # Skip diff for fluxes - covered by mags
        suffixes_flux = {MeasurementType.CHI: suffixes[MeasurementType.CHI]}
        # Skip chi for magnitudes, which have strange errors
        suffixes_mag = {MeasurementType.DIFF: suffixes[MeasurementType.DIFF]}
        stats = {stat.name_short(): stat() for stat in (Median, SigmaIQR, SigmaMAD)}
 
        for percentile in self.config.percentiles:
            stat = Percentile(percentile=float(Decimal(percentile)))
            stats[stat.name_short()] = stat
 
        # Get dict of column names
        columns, n_models = _get_columns(
            bands_columns=config.columns_flux,
            suffixes=suffixes,
            suffixes_flux=suffixes_flux,
            suffixes_mag=suffixes_mag,
            stats=stats,
            target=target,
            column_dist=column_dist,
        )
 
        # Setup numpy table
        n_bins = config.mag_num_bins
        data = np.zeros((n_bins,), dtype=[(key, value) for key, value in columns.items()])
        data['bin'] = np.arange(n_bins)
 
        # Setup bins
        bins_mag = np.linspace(start=config.mag_brightest_ref, stop=config.mag_faintest_ref,
                               num=n_bins + 1)
        data['mag_min'] = bins_mag[:-1]
        data['mag_max'] = bins_mag[1:]
        bins_mag = tuple((bins_mag[idx], bins_mag[idx + 1]) for idx in range(n_bins))
 
        # Define temporary columns for intermediate storage
        column_mag_temp = 'mag_temp'
        column_color_temp = 'color_temp'
        column_color_err_temp = 'colorErr_temp'
        flux_err_frac_prev = [None]*n_models
        mag_prev = [None]*n_models
 
        columns_target = {
            target.column_coord1: (
                ref.column_coord1, target.column_coord1, coord1_target_err, False,
            ),
            target.column_coord2: (
                ref.column_coord2, target.column_coord2, coord2_target_err, False,
            ),
            column_dist: (column_dummy, column_dist, column_dist_err, False),
        }
 
        # Cheat a little and do the first band last so that the color is
        # based on the last band
        band_fluxes = [(band, config_flux) for (band, config_flux) in config.columns_flux.items()]
        n_bands = len(band_fluxes)
        band_fluxes.append(band_fluxes[0])
        flux_err_frac_first = None
        mag_first = None
        mag_ref_first = None
 
        band_prev = None
        for idx_band, (band, config_flux) in enumerate(band_fluxes):
            if idx_band == n_bands:
                # These were already computed earlier
                mag_ref = mag_ref_first
                flux_err_frac = flux_err_frac_first
                mag_model = mag_first
            else:
                mag_ref = -2.5*np.log10(cat_ref[config_flux.column_ref_flux]) + config.mag_zeropoint_ref
                flux_err_frac = [None]*n_models
                mag_model = [None]*n_models
 
                if idx_band > 0:
                    cat_ref[column_color_temp] = cat_ref[column_mag_temp] - mag_ref
 
            cat_ref[column_mag_temp] = mag_ref
 
            select_ref_bins = [select_ref & (mag_ref > mag_lo) & (mag_ref < mag_hi)
                               for idx_bin, (mag_lo, mag_hi) in enumerate(bins_mag)]
 
            # Iterate over multiple models, compute their mags and colours (if there's a previous band)
            for idx_model in range(n_models):
                column_target_flux = config_flux.columns_target_flux[idx_model]
                column_target_flux_err = config_flux.columns_target_flux_err[idx_model]
 
                flux_target = cat_target[column_target_flux]
                mag_target = -2.5*np.log10(flux_target) + config.mag_zeropoint_target
                if config.mag_ceiling_target is not None:
                    mag_target[mag_target > config.mag_ceiling_target] = config.mag_ceiling_target
                mag_model[idx_model] = mag_target
 
                # These are needed for computing magnitude/color "errors" (which are a sketchy concept)
                flux_err_frac[idx_model] = cat_target[column_target_flux_err]/flux_target
 
                # Skip the rest if idx_band == 0: it will be picked up at idx_band == n_bands
                if idx_band > 0:
                    # Keep these mags tabulated for convenience
                    column_mag_temp_model = f'{column_mag_temp}{idx_model}'
                    cat_target[column_mag_temp_model] = mag_target
 
                    columns_target[f'flux_{column_target_flux}'] = (
                        config_flux.column_ref_flux,
                        column_target_flux,
                        column_target_flux_err,
                        True,
                    )
                    # Note: magnitude errors are generally problematic and not worth aggregating
                    columns_target[f'mag_{column_target_flux}'] = (
                        column_mag_temp, column_mag_temp_model, None, False,
                    )
 
                    # No need for colors if this is the last band and there are only two bands
                    # (because it would just be the negative of the first color)
                    skip_color = (idx_band == n_bands) and (n_bands <= 2)
                    if not skip_color:
                        column_color_temp_model = f'{column_color_temp}{idx_model}'
                        column_color_err_temp_model = f'{column_color_err_temp}{idx_model}'
 
                        # e.g. if order is ugrizy, first color will be u - g
                        cat_target[column_color_temp_model] = mag_prev[idx_model] - mag_model[idx_model]
 
                        # Sum (in quadrature, and admittedly sketchy for faint fluxes) magnitude errors
                        cat_target[column_color_err_temp_model] = 2.5/np.log(10)*np.hypot(
                            flux_err_frac[idx_model], flux_err_frac_prev[idx_model])
                        columns_target[f'color_{band_prev}_m_{band}_{column_target_flux}'] = (
                            column_color_temp,
                            column_color_temp_model,
                            column_color_err_temp_model,
                            False,
                        )
 
                    for idx_bin, (mag_lo, mag_hi) in enumerate(bins_mag):
                        row = data[idx_bin]
                        # Reference sources only need to be counted once
                        if idx_model == 0:
                            select_ref_bin = select_ref_bins[idx_bin]
                        select_target_bin = select_target & (mag_target > mag_lo) & (mag_target < mag_hi)
 
                        for sourcetype in SourceType:
                            sourcetype_info = sourcetype.value
                            is_extended = sourcetype_info.is_extended
                            # Counts filtered by match selection and magnitude bin
                            select_ref_sub = select_ref_bin.copy()
                            select_target_sub = select_target_bin.copy()
                            if is_extended is not None:
                                is_extended_ref = (extended_ref == is_extended)
                                select_ref_sub &= is_extended_ref
                                if idx_model == 0:
                                    n_ref_sub = np.count_nonzero(select_ref_sub)
                                    row[_get_column_name(band, sourcetype_info.label, 'n_ref',
                                                         MatchType.ALL.value)] = n_ref_sub
                                select_target_sub &= (extended_target == is_extended)
                                n_target_sub = np.count_nonzero(select_target_sub)
                                row[_get_column_name(band, sourcetype_info.label, 'n_target',
                                                     MatchType.ALL.value)] = n_target_sub
 
                            # Filter matches by magnitude bin and true class
                            match_row_bin = match_row.copy()
                            match_row_bin[~select_ref_sub] = -1
                            match_good = match_row_bin >= 0
 
                            n_match = np.count_nonzero(match_good)
 
                            # Same for counts of matched target sources (for e.g. purity)
 
                            if n_match > 0:
                                rows_matched = match_row_bin[match_good]
                                subset_target = cat_target.iloc[rows_matched]
                                if (is_extended is not None) and (idx_model == 0):
                                    right_type = extended_target[rows_matched] == is_extended
                                    n_total = len(right_type)
                                    n_right = np.count_nonzero(right_type)
                                    row[_get_column_name(band, sourcetype_info.label, 'n_ref',
                                                         MatchType.MATCH_RIGHT.value)] = n_right
                                    row[_get_column_name(
                                        band, sourcetype_info.label, 'n_ref', MatchType.MATCH_WRONG.value,
                                    )] = n_total - n_right
 
                                # compute stats for this bin, for all columns
                                for column, (column_ref, column_target, column_err_target, skip_diff) \
                                        in columns_target.items():
                                    values_ref = cat_ref[column_ref][match_good].values
                                    errors_target = (
                                        subset_target[column_err_target].values
                                        if column_err_target is not None
                                        else None
                                    )
                                    compute_stats(
                                        values_ref,
                                        subset_target[column_target].values,
                                        errors_target,
                                        row,
                                        stats,
                                        suffixes,
                                        prefix=f'{band}_{sourcetype_info.label}_{column}',
                                        skip_diff=skip_diff,
                                    )
 
                            # Count matched target sources with *measured* mags within bin
                            # Used for e.g. purity calculation
                            # Should be merged with above code if there's ever a need for
                            # measuring stats on this source selection
                            select_target_sub &= matched_target
 
                            if is_extended is not None and (np.count_nonzero(select_target_sub) > 0):
                                n_total = np.count_nonzero(select_target_sub)
                                right_type = np.zeros(n_target, dtype=bool)
                                right_type[match_row[matched_ref & is_extended_ref]] = True
                                right_type &= select_target_sub
                                n_right = np.count_nonzero(right_type)
                                row[_get_column_name(band, sourcetype_info.label, 'n_target',
                                                     MatchType.MATCH_RIGHT.value)] = n_right
                                row[_get_column_name(band, sourcetype_info.label, 'n_target',
                                                     MatchType.MATCH_WRONG.value)] = n_total - n_right
 
                    # delete the flux/color columns since they change with each band
                    for prefix in ('flux', 'mag'):
                        del columns_target[f'{prefix}_{column_target_flux}']
                    if not skip_color:
                        del columns_target[f'color_{band_prev}_m_{band}_{column_target_flux}']
 
            # keep values needed for colors
            flux_err_frac_prev = flux_err_frac
            mag_prev = mag_model
            band_prev = band
            if idx_band == 0:
                flux_err_frac_first = flux_err_frac
                mag_first = mag_model
                mag_ref_first = mag_ref
 
        retStruct = pipeBase.Struct(cat_matched=cat_matched, diff_matched=pd.DataFrame(data))
        return retStruct
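
The magnitude binning and the color "error" used in the listing above can be sketched in isolation. This is a minimal illustration, not the task's code: the bin limits, fluxes and flux errors are made-up values standing in for the corresponding config fields and catalog columns, and the zero point of 31.4 assumes fluxes in nanojansky on the AB system.

```python
import numpy as np

# Hypothetical stand-ins for config.mag_brightest_ref, config.mag_faintest_ref
# and config.mag_num_bins.
mag_brightest, mag_faintest, n_bins = 20.0, 24.0, 4
zeropoint = 31.4  # assumption: fluxes in nJy, AB magnitudes

# n_bins + 1 equally spaced edges, turned into (lo, hi) pairs as in run().
edges = np.linspace(start=mag_brightest, stop=mag_faintest, num=n_bins + 1)
bins_mag = tuple((edges[idx], edges[idx + 1]) for idx in range(n_bins))

# Made-up fluxes and errors for three sources in two bands.
flux_g = np.array([1.0e4, 2.5e3, 500.0])
flux_r = np.array([2.0e4, 5.0e3, 1.0e3])
flux_err_g = np.array([100.0, 50.0, 50.0])
flux_err_r = np.array([200.0, 50.0, 50.0])

mag_g = -2.5 * np.log10(flux_g) + zeropoint
mag_r = -2.5 * np.log10(flux_r) + zeropoint

# Both comparisons are strict, so a magnitude exactly on an edge lands in no bin.
counts_g = [np.count_nonzero((mag_g > lo) & (mag_g < hi)) for lo, hi in bins_mag]

# Color and its approximate error: fractional flux errors summed in quadrature
# and scaled by 2.5/ln(10), the first-order magnitude error propagation factor.
color = mag_g - mag_r
color_err = 2.5 / np.log(10) * np.hypot(flux_err_g / flux_g, flux_err_r / flux_r)
print(counts_g, color, color_err)
```

With these inputs the third source is fainter than the faintest edge and is therefore not counted in any bin, which is the same behavior the listing has for sources outside the configured magnitude range.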

◆ runQuantum()

lsst.pipe.tasks.diff_matched_tract_catalog.DiffMatchedTractCatalogTask.runQuantum	(	self,
		butlerQC,
		inputRefs,
		outputRefs )

Definition at line 563 of file diff_matched_tract_catalog.py.

    def runQuantum(self, butlerQC, inputRefs, outputRefs):
        inputs = butlerQC.get(inputRefs)
        skymap = inputs.pop("skymap")
 
        columns_match_target = ['match_row']
        if 'match_candidate' in inputs['columns_match_target']:
            columns_match_target.append('match_candidate')
 
        outputs = self.run(
            catalog_ref=inputs['cat_ref'].get(parameters={'columns': self.config.columns_in_ref}),
            catalog_target=inputs['cat_target'].get(parameters={'columns': self.config.columns_in_target}),
            catalog_match_ref=inputs['cat_match_ref'].get(
                parameters={'columns': ['match_candidate', 'match_row']},
            ),
            catalog_match_target=inputs['cat_match_target'].get(
                parameters={'columns': columns_match_target},
            ),
            wcs=skymap[butlerQC.quantum.dataId["tract"]].wcs,
        )
        butlerQC.put(outputs, outputRefs)
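
The handling of the optional `match_candidate` column spans both methods: `runQuantum` requests it only when the target match catalog provides it, and `run` falls back to selecting every target row when it is absent. A minimal sketch of that pairing follows; the helper names `columns_to_load` and `target_selection` are hypothetical, the data frames are toy inputs, and the `.copy()` is an addition here so the sketch does not modify its input.

```python
import numpy as np
import pandas as pd


def columns_to_load(available_columns):
    """Mirror the column choice made in runQuantum (hypothetical helper)."""
    columns = ["match_row"]
    if "match_candidate" in available_columns:
        columns.append("match_candidate")
    return columns


def target_selection(catalog_match_target):
    """Mirror the fallback in run(): select all rows if the flag is absent."""
    if "match_candidate" in catalog_match_target.columns:
        return catalog_match_target["match_candidate"].values.copy()
    return np.ones(len(catalog_match_target), dtype=bool)


# Toy match catalogs with and without the optional flag column.
with_flag = pd.DataFrame({"match_row": [0, -1], "match_candidate": [True, False]})
without_flag = pd.DataFrame({"match_row": [0, -1]})

print(columns_to_load(with_flag.columns), target_selection(with_flag))
print(columns_to_load(without_flag.columns), target_selection(without_flag))
```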
 

Member Data Documentation

◆ _DefaultName

str lsst.pipe.tasks.diff_matched_tract_catalog.DiffMatchedTractCatalogTask._DefaultName = "DiffMatchedTractCatalog"

staticprotected

Definition at line 561 of file diff_matched_tract_catalog.py.

◆ ConfigClass

lsst.pipe.tasks.diff_matched_tract_catalog.DiffMatchedTractCatalogTask.ConfigClass = DiffMatchedTractCatalogConfig

static

Definition at line 560 of file diff_matched_tract_catalog.py.

The documentation for this class was generated from the following file:

/j/snowflake/release/lsstsw/stack/lsst-scipipe-8.0.0/Linux64/pipe_tasks/g14a832a312+311607e4ab/python/lsst/pipe/tasks/diff_matched_tract_catalog.py
