Classes
class	SessionWrapper

Functions
pandas.DataFrame	pandas_dataframe_factory (List[str] colnames, List[Tuple] rows)

Tuple[List[str], List[Tuple]]	raw_data_factory (List[str] colnames, List[Tuple] rows)

Union[pandas.DataFrame, List]	select_concurrent (Session session, List[Tuple] statements, str execution_profile, int concurrency)

Any	literal (Any v)

str	quote_id (str columnName)

Variables
bool	CASSANDRA_IMPORTED = True

Function Documentation

◆ literal()

Any lsst.dax.apdb.cassandra_utils.literal ( Any v )

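Transform an object into a value suitable for use as a query parameter.
`None`, `bytes`, and `str` values are returned unchanged. `datetime` values
are converted to integer milliseconds since 1970-01-01, truncated to whole
seconds. Non-finite numeric values (NaN, infinity) are replaced with `None`.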

Definition at line 245 of file cassandra_utils.py.

def literal(v: Any) -> Any:
    """Transform object into a value for the query."""
    if v is None:
        pass
    elif isinstance(v, datetime):
        v = int((v - datetime(1970, 1, 1)) / timedelta(seconds=1)) * 1000
    elif isinstance(v, (bytes, str)):
        pass
    else:
        try:
            if not np.isfinite(v):
                v = None
        except TypeError:
            pass
    return v
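
The conversion rules above can be illustrated with a minimal sketch. It is not the module's implementation: `literal_sketch` is a hypothetical name, and `math.isfinite` stands in for `np.isfinite` so that the example needs only the standard library. Behavior may differ for NumPy-specific types.

```python
import math
from datetime import datetime, timedelta
from typing import Any


def literal_sketch(v: Any) -> Any:
    """Hypothetical stand-in for ``literal`` (math.isfinite replaces np.isfinite)."""
    if v is None or isinstance(v, (bytes, str)):
        # None, bytes and str pass through unchanged.
        return v
    if isinstance(v, datetime):
        # Whole seconds since 1970-01-01, expressed in milliseconds.
        return int((v - datetime(1970, 1, 1)) / timedelta(seconds=1)) * 1000
    try:
        if not math.isfinite(v):
            # NaN and infinities are replaced with None.
            return None
    except TypeError:
        # Not a number; return it unchanged.
        pass
    return v


# The half second is dropped before scaling to milliseconds.
print(literal_sketch(datetime(2020, 1, 1, 0, 0, 0, 500000)))  # 1577836800000
print(literal_sketch(float("nan")))  # None
print(literal_sketch("text"))  # text
```

Note that the sketch, like the listing above, subtracts a naive `datetime`, so it assumes naive input values.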
 
 

◆ pandas_dataframe_factory()

pandas.DataFrame lsst.dax.apdb.cassandra_utils.pandas_dataframe_factory	(	List[str]	colnames,
		List[Tuple]	rows
	)

Special non-standard row factory that creates a pandas DataFrame from a
Cassandra result set.

Parameters
----------
colnames : `list` [ `str` ]
    Names of the columns.
rows : `list` of `tuple`
    Result rows.

Returns
-------
catalog : `pandas.DataFrame`
    DataFrame with the result set.

Notes
-----
When using this function as a row factory for Cassandra, the resulting
DataFrame should be accessed in a non-standard way using the
`ResultSet._current_rows` attribute.

Definition at line 89 of file cassandra_utils.py.

def pandas_dataframe_factory(colnames: List[str], rows: List[Tuple]) -> pandas.DataFrame:
    """Special non-standard row factory that creates pandas DataFrame from
    Cassandra result set.
 
    Parameters
    ----------
    colnames : `list` [ `str` ]
        Names of the columns.
    rows : `list` of `tuple`
        Result rows.
 
    Returns
    -------
    catalog : `pandas.DataFrame`
        DataFrame with the result set.
 
    Notes
    -----
    When using this method as row factory for Cassandra, the resulting
    DataFrame should be accessed in a non-standard way using
    `ResultSet._current_rows` attribute.
    """
    return pandas.DataFrame.from_records(rows, columns=colnames)
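
As a quick illustration of what the factory produces, the sketch below calls it directly on hand-written data instead of a Cassandra result set. The column names and row values are hypothetical.

```python
from typing import List, Tuple

import pandas


def pandas_dataframe_factory(colnames: List[str], rows: List[Tuple]) -> pandas.DataFrame:
    # Same one-line body as the listing above.
    return pandas.DataFrame.from_records(rows, columns=colnames)


# Hypothetical column names and rows, standing in for a query result.
colnames = ["diaObjectId", "ra", "dec"]
rows = [(1, 10.5, -3.25), (2, 11.0, -3.5)]

catalog = pandas_dataframe_factory(colnames, rows)
print(catalog.shape)  # (2, 3)
print(list(catalog.columns))  # ['diaObjectId', 'ra', 'dec']
```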
 
 

◆ quote_id()

str lsst.dax.apdb.cassandra_utils.quote_id ( str columnName )

Quote a column name for use in a query when needed. Names that are entirely
lower-case are returned unquoted; any other name is wrapped in double quotes.

Definition at line 262 of file cassandra_utils.py.

def quote_id(columnName: str) -> str:
    """Smart quoting for column names. Lower-case names are not quoted."""
    if not columnName.islower():
        columnName = '"' + columnName + '"'
    return columnName
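
A short usage sketch follows, with hypothetical column names. One consequence of relying on `str.islower()` is that a name with no cased characters at all (for example, digits only) is also quoted, because `islower()` returns `False` for it.

```python
def quote_id(columnName: str) -> str:
    # Same logic as the listing above.
    if not columnName.islower():
        columnName = '"' + columnName + '"'
    return columnName


print(quote_id("ra"))  # ra
print(quote_id("pixel_id"))  # pixel_id
print(quote_id("diaObjectId"))  # "diaObjectId"
print(quote_id("123"))  # "123"
```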

◆ raw_data_factory()

Tuple[List[str], List[Tuple]] lsst.dax.apdb.cassandra_utils.raw_data_factory	(	List[str]	colnames,
		List[Tuple]	rows
	)

Special non-standard row factory that returns a 2-element tuple containing
the unmodified data: a list of column names and a list of rows.

Parameters
----------
colnames : `list` [ `str` ]
    Names of the columns.
rows : `list` of `tuple`
    Result rows.

Returns
-------
colnames : `list` [ `str` ]
    Names of the columns.
rows : `list` of `tuple`
    Result rows.

Notes
-----
When using this function as a row factory for Cassandra, the resulting
2-element tuple should be accessed in a non-standard way using the
`ResultSet._current_rows` attribute. This factory is used to build
pandas DataFrames in the `select_concurrent` function.

Definition at line 116 of file cassandra_utils.py.

def raw_data_factory(colnames: List[str], rows: List[Tuple]) -> Tuple[List[str], List[Tuple]]:
    """Special non-standard row factory that makes 2-element tuple containing
    unmodified data: list of column names and list of rows.
 
    Parameters
    ----------
    colnames : `list` [ `str` ]
        Names of the columns.
    rows : `list` of `tuple`
        Result rows.
 
    Returns
    -------
    colnames : `list` [ `str` ]
        Names of the columns.
    rows : `list` of `tuple`
        Result rows
 
    Notes
    -----
    When using this method as row factory for Cassandra, the resulting
    2-element tuple should be accessed in a non-standard way using
    `ResultSet._current_rows` attribute. This factory is used to build
    pandas DataFrames in `select_concurrent` method.
    """
    return (colnames, rows)
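
The access pattern described in the notes can be sketched without a Cassandra connection. `FakeResultSet` is a hypothetical stand-in for the driver's result set and exists only to show where the 2-element tuple ends up; the column names and rows are invented.

```python
from typing import List, Tuple


def raw_data_factory(colnames: List[str], rows: List[Tuple]) -> Tuple[List[str], List[Tuple]]:
    # Same body as the listing above: return the data untouched.
    return (colnames, rows)


class FakeResultSet:
    """Hypothetical stand-in for the driver's ResultSet (assumption)."""

    def __init__(self, colnames: List[str], rows: List[Tuple]) -> None:
        # The row factory output is stored as-is and read back by the caller.
        self._current_rows = raw_data_factory(colnames, rows)


result = FakeResultSet(["id", "flux"], [(1, 0.5), (2, 0.7)])

# Unpack the 2-element tuple instead of iterating over rows.
colnames, rows = result._current_rows
print(colnames)  # ['id', 'flux']
print(len(rows))  # 2
```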
 
 

◆ select_concurrent()

Union[pandas.DataFrame, List] lsst.dax.apdb.cassandra_utils.select_concurrent	(	Session	session,
		List[Tuple]	statements,
		str	execution_profile,
		int	concurrency
	)

Execute a set of queries concurrently and merge their results into
a single result.

Parameters
----------
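session : `Session`
    Cassandra session used to execute the statements.
statements : `list` [ `tuple` ]
    List of statements and their parameters, passed directly to
    ``execute_concurrent()``.
execution_profile : `str`
    Execution profile name.
concurrency : `int`
    Maximum number of statements to execute concurrently, passed
    directly to ``execute_concurrent()``.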

Returns
-------
result : `pandas.DataFrame` or `list`
    Combined result of all statements. The type depends on the row
    factory defined in the execution profile. If the row factory is
    `pandas_dataframe_factory` or `raw_data_factory`, a single pandas
    DataFrame is built from the combined results. Otherwise a list of
    rows is returned, and the type of each row is determined by the row
    factory.

Notes
-----
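This function re-raises any exception returned by one of the provided
statements. With `raw_data_factory` it also raises `ValueError` if the
statements return different sets of columns.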

Definition at line 146 of file cassandra_utils.py.

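def select_concurrent(
    session: Session,
    statements: List[Tuple],
    execution_profile: str,
    concurrency: int,
) -> Union[pandas.DataFrame, List]: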
    """Execute bunch of queries concurrently and merge their results into
    a single result.
 
    Parameters
    ----------
    statements : `list` [ `tuple` ]
        List of statements and their parameters, passed directly to
        ``execute_concurrent()``.
    execution_profile : `str`
        Execution profile name.
 
    Returns
    -------
    result
        Combined result of multiple statements, type of the result depends on
        specific row factory defined in execution profile. If row factory is
        one of `pandas_dataframe_factory` or `raw_data_factory` then pandas
        DataFrame is created from a combined result. Otherwise a list of
        rows is returned, type of each row is determined by the row factory.
 
    Notes
    -----
    This method can raise any exception that is raised by one of the provided
    statements.
    """
    session_wrap = SessionWrapper(session, execution_profile)
    results = execute_concurrent(
        session_wrap,
        statements,
        results_generator=True,
        raise_on_first_error=False,
        concurrency=concurrency,
    )
 
    ep = session.get_execution_profile(execution_profile)
    if ep.row_factory is raw_data_factory:
 
        # Collect rows into a single list and build Dataframe out of that
        _LOG.debug("making pandas data frame out of rows/columns")
        columns: Any = None
        rows = []
        for success, result in results:
            if success:
                result = result._current_rows
                if columns is None:
                    columns = result[0]
                elif columns != result[0]:
                    _LOG.error(
                        "different columns returned by queries: %s and %s",
                        columns,
                        result[0],
                    )
                    raise ValueError(
                        f"different columns returned by queries: {columns} and {result[0]}"
                    )
                rows += result[1]
            else:
                _LOG.error("error returned by query: %s", result)
                raise result
        catalog = pandas_dataframe_factory(columns, rows)
        _LOG.debug("pandas catalog shape: %s", catalog.shape)
        return catalog
 
    elif ep.row_factory is pandas_dataframe_factory:
 
        # Merge multiple DataFrames into one
        _LOG.debug("making pandas data frame out of set of data frames")
        dataframes = []
        for success, result in results:
            if success:
                dataframes.append(result._current_rows)
            else:
                _LOG.error("error returned by query: %s", result)
                raise result
        # concatenate all frames
        if len(dataframes) == 1:
            catalog = dataframes[0]
        else:
            catalog = pandas.concat(dataframes)
        _LOG.debug("pandas catalog shape: %s", catalog.shape)
        return catalog
 
    else:
 
        # Just concatenate all rows into a single collection.
        rows = []
        for success, result in results:
            if success:
                rows.extend(result)
            else:
                _LOG.error("error returned by query: %s", result)
                raise result
        _LOG.debug("number of rows: %s", len(rows))
        return rows
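
The `raw_data_factory` branch of the listing above can be sketched in isolation to show how per-query results are merged. This is an illustration under stated assumptions, not the module's code: `merge_raw_results` and `FakeResultSet` are hypothetical names, the `(success, result)` pairs imitate what the listing iterates over, and the final DataFrame construction is left out so the example needs only the standard library.

```python
from typing import Any, Iterable, List, Optional, Tuple


class FakeResultSet:
    """Hypothetical stand-in for a result set built with raw_data_factory."""

    def __init__(self, colnames: List[str], rows: List[Tuple]) -> None:
        self._current_rows = (colnames, rows)


def merge_raw_results(
    results: Iterable[Tuple[bool, Any]],
) -> Tuple[Optional[List[str]], List[Tuple]]:
    """Merge (success, result) pairs into one column list and one row list."""
    columns: Optional[List[str]] = None
    rows: List[Tuple] = []
    for success, result in results:
        if not success:
            # A failed statement yields its exception as the result.
            raise result
        colnames, chunk = result._current_rows
        if columns is None:
            columns = colnames
        elif columns != colnames:
            raise ValueError(
                f"different columns returned by queries: {columns} and {colnames}"
            )
        rows += chunk
    return columns, rows


results = [
    (True, FakeResultSet(["id", "flux"], [(1, 0.5)])),
    (True, FakeResultSet(["id", "flux"], [(2, 0.7), (3, 0.9)])),
]
columns, rows = merge_raw_results(results)
print(columns)  # ['id', 'flux']
print(rows)  # [(1, 0.5), (2, 0.7), (3, 0.9)]
```

Collecting plain rows first and building one DataFrame at the end, as the listing does for `raw_data_factory`, means only a single DataFrame is constructed regardless of how many statements were executed.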
 
 

Variable Documentation

◆ CASSANDRA_IMPORTED

bool lsst.dax.apdb.cassandra_utils.CASSANDRA_IMPORTED = True

Definition at line 44 of file cassandra_utils.py.
