Inherits lsst.dax.apdb.apdbSchema.ApdbSchema.

Public Member Functions
def	__init__ (self, cassandra.cluster.Session session, str schema_file, Optional[str] extra_schema_file=None, str prefix="", str packing="none", bool time_partition_tables=False)

str	tableName (self, ApdbTables table_name)

Mapping[str, ColumnDef]	getColumnMap (self, ApdbTables table_name)

List[str]	partitionColumns (self, ApdbTables table_name)

List[str]	clusteringColumns (self, ApdbTables table_name)

None	makeSchema (self, bool drop=False, Optional[Tuple[int, int]] part_range=None)

List[ColumnDef]	packedColumns (self, ApdbTables table_name)

Public Attributes
	tableSchemas

Detailed Description

Class for management of APDB schema.

Parameters
----------
session : `cassandra.cluster.Session`
    Cassandra session object
schema_file : `str`
    Name of the YAML schema file.
extra_schema_file : `str`, optional
    Name of the YAML schema file with extra column definitions.
prefix : `str`, optional
    Prefix to add to all schema elements.
packing : `str`
    Type of packing to apply to columns. The string "none" disables
    packing, any other value enables it.
time_partition_tables : `bool`
    If True then schema will have a separate table for each time partition.

Definition at line 39 of file apdbCassandraSchema.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.dax.apdb.apdbCassandraSchema.ApdbCassandraSchema.__init__	(		self,
		cassandra.cluster.Session	session,
		str	schema_file,
		Optional[str]	extra_schema_file = `None`,
		str	prefix = `""`,
		str	packing = `"none"`,
		bool	time_partition_tables = `False`
	)

Definition at line 71 of file apdbCassandraSchema.py.

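     def __init__(self, session: cassandra.cluster.Session, schema_file: str,
                  extra_schema_file: Optional[str] = None, prefix: str = "",
                  packing: str = "none", time_partition_tables: bool = False):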
  
         super().__init__(schema_file, extra_schema_file)
  
         self._session = session
         self._prefix = prefix
         self._packing = packing
  
         # add columns and index for partitioning.
         self._ignore_tables = []
         for table, tableDef in self.tableSchemas.items():
             columns = []
             if table is ApdbTables.DiaObjectLast:
                 # DiaObjectLast does not need temporal partitioning
                 columns = ["apdb_part"]
             elif table in (ApdbTables.DiaObject, ApdbTables.DiaSource, ApdbTables.DiaForcedSource):
                 # these three tables can use either pure spatial or combined
                 # spatial and temporal partitioning
                 if time_partition_tables:
                     columns = ["apdb_part"]
                 else:
                     columns = ["apdb_part", "apdb_time_part"]
             else:
                 # TODO: Do not know yet how other tables can be partitioned
                 self._ignore_tables.append(table)
  
             # add columns to the column list
             columnDefs = [ColumnDef(name=name,
                                     type="BIGINT",
                                     nullable=False,
                                     default=None,
                                     description="",
                                     unit=None,
                                     ucd=None) for name in columns]
             tableDef.columns = columnDefs + tableDef.columns
  
             # make an index
             index = IndexDef(name=f"Part_{tableDef.name}", type=IndexType.PARTITION, columns=columns)
             tableDef.indices.append(index)
  
         self._packed_columns = {}
         if self._packing != "none":
             for table, tableDef in self.tableSchemas.items():
                 index_columns = set(itertools.chain.from_iterable(
                     index.columns for index in tableDef.indices
                 ))
                 columnsDefs = [column for column in tableDef.columns if column.name not in index_columns]
                 self._packed_columns[table] = columnsDefs
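
The column selection in the constructor can be read in isolation. The following is a minimal sketch, not the library code: `Table` is a hypothetical stand-in for `ApdbTables`, `Other` represents any table the constructor adds to the ignore list, and `partition_columns` is a helper name invented for this illustration.

```python
from enum import Enum, auto
from typing import List, Optional


class Table(Enum):
    """Hypothetical stand-in for ApdbTables, for illustration only."""
    DiaObject = auto()
    DiaObjectLast = auto()
    DiaSource = auto()
    DiaForcedSource = auto()
    Other = auto()  # any table without a known partitioning scheme


def partition_columns(table: Table, time_partition_tables: bool) -> Optional[List[str]]:
    """Return partitioning columns for a table, or None if it is ignored."""
    if table is Table.DiaObjectLast:
        # Only spatial partitioning is needed for the "last" table.
        return ["apdb_part"]
    if table in (Table.DiaObject, Table.DiaSource, Table.DiaForcedSource):
        if time_partition_tables:
            # One table per time partition, so no time column in the key.
            return ["apdb_part"]
        return ["apdb_part", "apdb_time_part"]
    return None
```

With `time_partition_tables=True` the time partition is encoded in the table name rather than in a column, which is why `apdb_time_part` is omitted in that branch.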
  

Member Function Documentation

◆ clusteringColumns()

List[str] lsst.dax.apdb.apdbCassandraSchema.ApdbCassandraSchema.clusteringColumns	(		self,
		ApdbTables	table_name
	)

Return a list of columns used for clustering.

Parameters
----------
table_name : `ApdbTables`
    Table name in APDB schema

Returns
-------
columns : `list` of `str`
    Names of columns used for clustering.

Definition at line 163 of file apdbCassandraSchema.py.

     def clusteringColumns(self, table_name: ApdbTables) -> List[str]:
         """Return a list of columns used for clustering.
  
         Parameters
         ----------
         table_name : `ApdbTables`
             Table name in APDB schema
  
         Returns
         -------
         columns : `list` of `str`
             Names of columns used for clustering.
         """
         table_schema = self.tableSchemas[table_name]
         for index in table_schema.indices:
             if index.type is IndexType.PRIMARY:
                 return index.columns
         return []
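
Both `clusteringColumns` and `partitionColumns` use the same lookup: scan the table's indices and return the columns of the first index of the wanted type. The sketch below shows that pattern with hypothetical stand-ins (`IndexKind` for `IndexType`, `Index` for `IndexDef`). The index and column names in the sample data are illustrative only.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class IndexKind(Enum):
    """Hypothetical stand-in for IndexType."""
    PRIMARY = auto()
    PARTITION = auto()


@dataclass
class Index:
    """Hypothetical stand-in for IndexDef."""
    name: str
    type: IndexKind
    columns: List[str] = field(default_factory=list)


def columns_of(indices: List[Index], kind: IndexKind) -> List[str]:
    """Return columns of the first index of the given kind, else an empty list."""
    for index in indices:
        if index.type is kind:
            return index.columns
    return []


# Illustrative data, not taken from a real schema file.
indices = [
    Index("PK_DiaSource", IndexKind.PRIMARY, ["diaSourceId"]),
    Index("Part_DiaSource", IndexKind.PARTITION, ["apdb_part", "apdb_time_part"]),
]
clustering = columns_of(indices, IndexKind.PRIMARY)
partitioning = columns_of(indices, IndexKind.PARTITION)
```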
  

◆ getColumnMap()

Mapping[str, ColumnDef] lsst.dax.apdb.apdbCassandraSchema.ApdbCassandraSchema.getColumnMap	(		self,
		ApdbTables	table_name
	)

Return mapping of column names to column definitions.

Parameters
----------
table_name : `ApdbTables`
    One of known APDB table names.

Returns
-------
column_map : `dict`
    Mapping of column names to `ColumnDef` instances.

Definition at line 126 of file apdbCassandraSchema.py.

     def getColumnMap(self, table_name: ApdbTables) -> Mapping[str, ColumnDef]:
         """Return mapping of column names to column definitions.
  
         Parameters
         ----------
         table_name : `ApdbTables`
             One of known APDB table names.
  
         Returns
         -------
         column_map : `dict`
             Mapping of column names to `ColumnDef` instances.
         """
         table = self.tableSchemas[table_name]
         cmap = {column.name: column for column in table.columns}
         return cmap
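
As a small illustration of the returned structure, the sketch below builds the same kind of name-to-definition mapping from a list of columns. `Column` is a hypothetical, reduced stand-in for `ColumnDef` and the sample columns are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Column:
    """Hypothetical stand-in for ColumnDef with only two fields."""
    name: str
    type: str


def column_map(columns: List[Column]) -> Dict[str, Column]:
    """Map column names to their definitions, keeping schema order."""
    return {column.name: column for column in columns}


# Illustrative columns, not taken from a real schema file.
columns = [Column("apdb_part", "BIGINT"), Column("ra", "DOUBLE")]
cmap = column_map(columns)
```

Because Python dictionaries preserve insertion order, iterating over the mapping yields columns in the order in which they appear in the table definition, with the added partitioning columns first.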
  

◆ makeSchema()

None lsst.dax.apdb.apdbCassandraSchema.ApdbCassandraSchema.makeSchema	(		self,
		bool	drop = `False`,
		Optional[Tuple[int, int]]	part_range = `None`
	)

Create or re-create all tables.

Parameters
----------
drop : `bool`
    If True then drop tables before creating new ones.
part_range : `tuple` [ `int` ] or `None`
    Start and end partition number for time partitions, end is not
    inclusive. Used to create per-partition DiaObject, DiaSource, and
    DiaForcedSource tables. If `None` then per-partition tables are
    not created.

Definition at line 182 of file apdbCassandraSchema.py.

     def makeSchema(self, drop: bool = False, part_range: Optional[Tuple[int, int]] = None) -> None:
         """Create or re-create all tables.
  
         Parameters
         ----------
         drop : `bool`
             If True then drop tables before creating new ones.
         part_range : `tuple` [ `int` ] or `None`
             Start and end partition number for time partitions, end is not
             inclusive. Used to create per-partition DiaObject, DiaSource, and
             DiaForcedSource tables. If `None` then per-partition tables are
             not created.
         """
  
         for table in self.tableSchemas:
             if table in self._ignore_tables:
                 _LOG.debug("Skipping schema for table %s", table)
                 continue
             _LOG.debug("Making table %s", table)
  
             fullTable = table.table_name(self._prefix)
  
             table_list = [fullTable]
             if part_range is not None:
                 if table in (ApdbTables.DiaSource, ApdbTables.DiaForcedSource, ApdbTables.DiaObject):
                     partitions = range(*part_range)
                     table_list = [f"{fullTable}_{part}" for part in partitions]
  
             if drop:
                 queries = [f'DROP TABLE IF EXISTS "{table_name}"' for table_name in table_list]
                 futures = [self._session.execute_async(query, timeout=None) for query in queries]
                 for future in futures:
                     _LOG.debug("wait for query: %s", future.query)
                     future.result()
                     _LOG.debug("query finished: %s", future.query)
  
             queries = []
             for table_name in table_list:
                 if_not_exists = "" if drop else "IF NOT EXISTS"
                 columns = ", ".join(self._tableColumns(table))
                 query = f'CREATE TABLE {if_not_exists} "{table_name}" ({columns})'
                 _LOG.debug("query: %s", query)
                 queries.append(query)
             futures = [self._session.execute_async(query, timeout=None) for query in queries]
             for future in futures:
                 _LOG.debug("wait for query: %s", future.query)
                 future.result()
                 _LOG.debug("query finished: %s", future.query)
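
The statement generation in `makeSchema` can be sketched without a Cassandra session. The function below only builds the CQL strings for one table; `build_queries` and its `time_partitioned` flag are names invented for this illustration, and the column definition passed in the usage line is a placeholder, since the output of `_tableColumns` is not shown on this page.

```python
from typing import List, Optional, Tuple


def build_queries(full_table: str, column_defs: List[str], drop: bool = False,
                  part_range: Optional[Tuple[int, int]] = None,
                  time_partitioned: bool = True) -> List[str]:
    """Build DROP/CREATE statements for one table, mirroring makeSchema."""
    table_list = [full_table]
    if part_range is not None and time_partitioned:
        # End of the range is exclusive, as with the built-in range().
        table_list = [f"{full_table}_{part}" for part in range(*part_range)]

    queries = []
    if drop:
        queries += [f'DROP TABLE IF EXISTS "{name}"' for name in table_list]

    # After an explicit drop the table cannot exist, so the guard is omitted.
    if_not_exists = "" if drop else "IF NOT EXISTS"
    columns = ", ".join(column_defs)
    queries += [f'CREATE TABLE {if_not_exists} "{name}" ({columns})'
                for name in table_list]
    return queries


# Placeholder column definition, assumed for the example.
queries = build_queries("DiaSource", ['"apdb_part" BIGINT'], part_range=(10, 12))
```

Here `part_range=(10, 12)` yields tables `DiaSource_10` and `DiaSource_11`. In the real method the statements are submitted with `execute_async` and the results are awaited before the method returns.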
  

◆ packedColumns()

List[ColumnDef] lsst.dax.apdb.apdbCassandraSchema.ApdbCassandraSchema.packedColumns	(		self,
		ApdbTables	table_name
	)

Return list of columns that are packed into BLOB.

Parameters
----------
table_name : `ApdbTables`
    Name of the table.

Returns
-------
columns : `list` [ `ColumnDef` ]
    List of column definitions. Empty list is returned if packing is
    not configured.

Definition at line 288 of file apdbCassandraSchema.py.

     def packedColumns(self, table_name: ApdbTables) -> List[ColumnDef]:
         """Return list of columns that are packed into BLOB.
  
         Parameters
         ----------
         table_name : `ApdbTables`
             Name of the table.
  
         Returns
         -------
         columns : `list` [ `ColumnDef` ]
             List of column definitions. Empty list is returned if packing is
             not configured.
         """
         return self._packed_columns.get(table_name, [])
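
The packed column list is computed in the constructor as every column that does not appear in any index of the table. A minimal sketch of that selection follows, using plain strings in place of `ColumnDef` and `IndexDef` objects; `unindexed_columns` and the sample names are assumptions made for the example.

```python
import itertools
from typing import Dict, List


def unindexed_columns(columns: List[str], indices: Dict[str, List[str]]) -> List[str]:
    """Return columns that are not part of any index, in schema order."""
    index_columns = set(itertools.chain.from_iterable(indices.values()))
    return [name for name in columns if name not in index_columns]


# Illustrative table layout, not taken from a real schema file.
packed = unindexed_columns(
    ["apdb_part", "diaSourceId", "ra", "decl"],
    {"Part_DiaSource": ["apdb_part"], "PK_DiaSource": ["diaSourceId"]},
)
```

Key columns have to stay as regular columns so that Cassandra can partition and sort on them, which is why only the remaining columns are candidates for packing.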

◆ partitionColumns()

List[str] lsst.dax.apdb.apdbCassandraSchema.ApdbCassandraSchema.partitionColumns	(		self,
		ApdbTables	table_name
	)

Return a list of columns used for table partitioning.

Parameters
----------
table_name : `ApdbTables`
    Table name in APDB schema

Returns
-------
columns : `list` of `str`
    Names of columns used for partitioning.

Definition at line 143 of file apdbCassandraSchema.py.

     def partitionColumns(self, table_name: ApdbTables) -> List[str]:
         """Return a list of columns used for table partitioning.
  
         Parameters
         ----------
         table_name : `ApdbTables`
             Table name in APDB schema
  
         Returns
         -------
         columns : `list` of `str`
             Names of columns used for partitioning.
         """
         table_schema = self.tableSchemas[table_name]
         for index in table_schema.indices:
             if index.type is IndexType.PARTITION:
                 # there can be only one partitioning index (possibly with several columns)
                 return index.columns
         return []
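
In CQL, a composite partition key is written as a parenthesised group at the start of the `PRIMARY KEY` clause, followed by the clustering columns. The sketch below shows how the results of `partitionColumns` and `clusteringColumns` could be combined into such a clause. How the class assembles its own table definition is not shown on this page, so `primary_key_clause` is a hypothetical helper and not the library's implementation.

```python
from typing import List


def primary_key_clause(partition: List[str], clustering: List[str]) -> str:
    """Format a CQL PRIMARY KEY clause from partition and clustering columns."""
    part = ", ".join(f'"{name}"' for name in partition)
    # The partition key group comes first, then each clustering column.
    parts = [f"({part})"] + [f'"{name}"' for name in clustering]
    return f"PRIMARY KEY ({', '.join(parts)})"


# Illustrative column names.
clause = primary_key_clause(["apdb_part", "apdb_time_part"], ["diaSourceId"])
```

For the values above the clause reads `PRIMARY KEY (("apdb_part", "apdb_time_part"), "diaSourceId")`.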
  

◆ tableName()

str lsst.dax.apdb.apdbCassandraSchema.ApdbCassandraSchema.tableName	(		self,
		ApdbTables	table_name
	)

Return Cassandra table name for APDB table.

Definition at line 121 of file apdbCassandraSchema.py.

     def tableName(self, table_name: ApdbTables) -> str:
         """Return Cassandra table name for APDB table.
         """
         return table_name.table_name(self._prefix)
  

Member Data Documentation

◆ tableSchemas

lsst.dax.apdb.apdbSchema.ApdbSchema.tableSchemas

inherited

Definition at line 178 of file apdbSchema.py.

The documentation for this class was generated from the following file:

/j/snowflake/release/lsstsw/stack/lsst-scipipe-0.7.0/Linux64/dax_apdb/22.0.1-5-g75bb458+99c117b92f/python/lsst/dax/apdb/apdbCassandraSchema.py
