LSST Applications  21.0.0-172-gfb10e10a+18fedfabac,22.0.0+297cba6710,22.0.0+80564b0ff1,22.0.0+8d77f4f51a,22.0.0+a28f4c53b1,22.0.0+dcf3732eb2,22.0.1-1-g7d6de66+2a20fdde0d,22.0.1-1-g8e32f31+297cba6710,22.0.1-1-geca5380+7fa3b7d9b6,22.0.1-12-g44dc1dc+2a20fdde0d,22.0.1-15-g6a90155+515f58c32b,22.0.1-16-g9282f48+790f5f2caa,22.0.1-2-g92698f7+dcf3732eb2,22.0.1-2-ga9b0f51+7fa3b7d9b6,22.0.1-2-gd1925c9+bf4f0e694f,22.0.1-24-g1ad7a390+a9625a72a8,22.0.1-25-g5bf6245+3ad8ecd50b,22.0.1-25-gb120d7b+8b5510f75f,22.0.1-27-g97737f7+2a20fdde0d,22.0.1-32-gf62ce7b1+aa4237961e,22.0.1-4-g0b3f228+2a20fdde0d,22.0.1-4-g243d05b+871c1b8305,22.0.1-4-g3a563be+32dcf1063f,22.0.1-4-g44f2e3d+9e4ab0f4fa,22.0.1-42-gca6935d93+ba5e5ca3eb,22.0.1-5-g15c806e+85460ae5f3,22.0.1-5-g58711c4+611d128589,22.0.1-5-g75bb458+99c117b92f,22.0.1-6-g1c63a23+7fa3b7d9b6,22.0.1-6-g50866e6+84ff5a128b,22.0.1-6-g8d3140d+720564cf76,22.0.1-6-gd805d02+cc5644f571,22.0.1-8-ge5750ce+85460ae5f3,master-g6e05de7fdc+babf819c66,master-g99da0e417a+8d77f4f51a,w.2021.48
LSST Data Management Base Package
Public Member Functions | Public Attributes | Static Public Attributes | List of all members
lsst.dax.apdb.apdbSql.ApdbSql Class Reference
Inheritance diagram for lsst.dax.apdb.apdbSql.ApdbSql:
lsst.dax.apdb.apdb.Apdb

Public Member Functions

def __init__ (self, ApdbSqlConfig config)
 
Dict[str, int] tableRowCount (self)
 
Optional[TableDef] tableDef (self, ApdbTables table)
 
None makeSchema (self, bool drop=False)
 
pandas.DataFrame getDiaObjects (self, Region region)
 
Optional[pandas.DataFrame] getDiaSources (self, Region region, Optional[Iterable[int]] object_ids, dafBase.DateTime visit_time)
 
Optional[pandas.DataFrame] getDiaForcedSources (self, Region region, Optional[Iterable[int]] object_ids, dafBase.DateTime visit_time)
 
None store (self, dafBase.DateTime visit_time, pandas.DataFrame objects, Optional[pandas.DataFrame] sources=None, Optional[pandas.DataFrame] forced_sources=None)
 
None dailyJob (self)
 
int countUnassociatedObjects (self)
 
ConfigurableField makeField (cls, str doc)
 

Public Attributes

 config
 
 pixelator
 

Static Public Attributes

 ConfigClass = ApdbSqlConfig
 

Detailed Description

Implementation of APDB interface based on SQL database.

The implementation is configured via the standard ``pex_config`` mechanism
using the `ApdbSqlConfig` configuration class. For examples of different
configurations check the ``config/`` folder.

Parameters
----------
config : `ApdbSqlConfig`
    Configuration object.

Definition at line 197 of file apdbSql.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.dax.apdb.apdbSql.ApdbSql.__init__ (   self,
ApdbSqlConfig  config 
)

Definition at line 212 of file apdbSql.py.

212  def __init__(self, config: ApdbSqlConfig):
213 
214  self.config = config
215 
216  _LOG.debug("APDB Configuration:")
217  _LOG.debug(" dia_object_index: %s", self.config.dia_object_index)
218  _LOG.debug(" read_sources_months: %s", self.config.read_sources_months)
219  _LOG.debug(" read_forced_sources_months: %s", self.config.read_forced_sources_months)
220  _LOG.debug(" dia_object_columns: %s", self.config.dia_object_columns)
221  _LOG.debug(" object_last_replace: %s", self.config.object_last_replace)
222  _LOG.debug(" schema_file: %s", self.config.schema_file)
223  _LOG.debug(" extra_schema_file: %s", self.config.extra_schema_file)
224  _LOG.debug(" schema prefix: %s", self.config.prefix)
225 
226  # engine is reused between multiple processes, make sure that we don't
227  # share connections by disabling pool (by using NullPool class)
228  kw = dict(echo=self.config.sql_echo)
229  conn_args: Dict[str, Any] = dict()
230  if not self.config.connection_pool:
231  kw.update(poolclass=NullPool)
232  if self.config.isolation_level is not None:
233  kw.update(isolation_level=self.config.isolation_level)
234  elif self.config.db_url.startswith("sqlite"):
235  # Use READ_UNCOMMITTED as default value for sqlite.
236  kw.update(isolation_level="READ_UNCOMMITTED")
237  if self.config.connection_timeout is not None:
238  if self.config.db_url.startswith("sqlite"):
239  conn_args.update(timeout=self.config.connection_timeout)
240  elif self.config.db_url.startswith(("postgresql", "mysql")):
241  conn_args.update(connect_timeout=self.config.connection_timeout)
242  kw.update(connect_args=conn_args)
243  self._engine = sqlalchemy.create_engine(self.config.db_url, **kw)
244 
245  self._schema = ApdbSqlSchema(engine=self._engine,
246  dia_object_index=self.config.dia_object_index,
247  schema_file=self.config.schema_file,
248  extra_schema_file=self.config.extra_schema_file,
249  prefix=self.config.prefix,
250  htm_index_column=self.config.htm_index_column)
251 
252  self.pixelator = HtmPixelization(self.config.htm_level)
253 

Member Function Documentation

◆ countUnassociatedObjects()

int lsst.dax.apdb.apdbSql.ApdbSql.countUnassociatedObjects (   self)
Return the number of DiaObjects that have only one DiaSource
associated with them.

Used as part of ap_verify metrics.

Returns
-------
count : `int`
    Number of DiaObjects with exactly one associated DiaSource.

Notes
-----
This method can be very inefficient or slow in some implementations.

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 435 of file apdbSql.py.

435  def countUnassociatedObjects(self) -> int:
436  # docstring is inherited from a base class
437 
438  # Retrieve the DiaObject table.
439  table: sqlalchemy.schema.Table = self._schema.objects
440 
441  # Construct the sql statement.
442  stmt = sql.select([func.count()]).select_from(table).where(table.c.nDiaSources == 1)
443  stmt = stmt.where(table.c.validityEnd == None) # noqa: E711
444 
445  # Return the count.
446  with self._engine.begin() as conn:
447  count = conn.scalar(stmt)
448 
449  return count
450 

◆ dailyJob()

None lsst.dax.apdb.apdbSql.ApdbSql.dailyJob (   self)
Implement daily activities like cleanup/vacuum.

What should be done during daily activities is determined by
specific implementation.

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 422 of file apdbSql.py.

422  def dailyJob(self) -> None:
423  # docstring is inherited from a base class
424 
425  if self._engine.name == 'postgresql':
426 
427  # do VACUUM on all tables
428  _LOG.info("Running VACUUM on all tables")
429  connection = self._engine.raw_connection()
430  ISOLATION_LEVEL_AUTOCOMMIT = 0
431  connection.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
432  cursor = connection.cursor()
433  cursor.execute("VACUUM ANALYSE")
434 

◆ getDiaForcedSources()

Optional[pandas.DataFrame] lsst.dax.apdb.apdbSql.ApdbSql.getDiaForcedSources (   self,
Region  region,
Optional[Iterable[int]]  object_ids,
dafBase.DateTime  visit_time 
)
Return catalog of DiaForcedSource instances from a given region.

Parameters
----------
region : `lsst.sphgeom.Region`
    Region to search for DIAForcedSources.
object_ids : iterable [ `int` ], optional
    List of DiaObject IDs to further constrain the set of returned
    sources. If list is empty then empty catalog is returned with a
    correct schema.
visit_time : `lsst.daf.base.DateTime`
    Time of the current visit.

Returns
-------
catalog : `pandas.DataFrame`, or `None`
    Catalog containing DiaForcedSource records. `None` is returned if
    the ``read_forced_sources_months`` configuration parameter is set to 0.

Raises
------
NotImplementedError
    Raised if ``object_ids`` is `None`.

Notes
-----
Even though base class allows `None` to be passed for ``object_ids``,
this class requires ``object_ids`` to be not-`None`.
`NotImplementedError` is raised if `None` is passed.

This method returns a DiaForcedSource catalog for a region with additional
filtering based on DiaObject IDs. Only a subset of the DiaForcedSource
history is returned, limited by the ``read_forced_sources_months`` config
parameter, w.r.t. ``visit_time``. If ``object_ids`` is empty then an empty
catalog is always returned with a correct schema (columns/types).

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 343 of file apdbSql.py.

345  visit_time: dafBase.DateTime) -> Optional[pandas.DataFrame]:
346  """Return catalog of DiaForcedSource instances from a given region.
347 
348  Parameters
349  ----------
350  region : `lsst.sphgeom.Region`
351  Region to search for DIASources.
352  object_ids : iterable [ `int` ], optional
353  List of DiaObject IDs to further constrain the set of returned
354  sources. If list is empty then empty catalog is returned with a
355  correct schema.
356  visit_time : `lsst.daf.base.DateTime`
357  Time of the current visit.
358 
359  Returns
360  -------
361  catalog : `pandas.DataFrame`, or `None`
362  Catalog containing DiaSource records. `None` is returned if
363  ``read_sources_months`` configuration parameter is set to 0.
364 
365  Raises
366  ------
367  NotImplementedError
368  Raised if ``object_ids`` is `None`.
369 
370  Notes
371  -----
372  Even though base class allows `None` to be passed for ``object_ids``,
373  this class requires ``object_ids`` to be not-`None`.
374  `NotImplementedError` is raised if `None` is passed.
375 
376  This method returns DiaForcedSource catalog for a region with additional
377  filtering based on DiaObject IDs. Only a subset of DiaSource history
378  is returned limited by ``read_forced_sources_months`` config parameter,
379  w.r.t. ``visit_time``. If ``object_ids`` is empty then an empty catalog
380  is always returned with a correct schema (columns/types).
381  """
382 
383  if self.config.read_forced_sources_months == 0:
384  _LOG.debug("Skip DiaForceSources fetching")
385  return None
386 
387  if object_ids is None:
388  # This implementation does not support region-based selection.
389  raise NotImplementedError("Region-based selection is not supported")
390 
391  # TODO: DateTime.MJD must be consistent with code in ap_association,
392  # alternatively we can fill midPointTai ourselves in store()
393  midPointTai_start = _make_midPointTai_start(visit_time, self.config.read_forced_sources_months)
394  _LOG.debug("midPointTai_start = %.6f", midPointTai_start)
395 
396  table: sqlalchemy.schema.Table = self._schema.forcedSources
397  with Timer('DiaForcedSource select', self.config.timer):
398  sources = self._getSourcesByIDs(table, list(object_ids), midPointTai_start)
399 
400  _LOG.debug("found %s DiaForcedSources", len(sources))
401  return sources
402 
Class for handling dates/times, including MJD, UTC, and TAI.
Definition: DateTime.h:64
daf::base::PropertyList * list
Definition: fits.cc:913

◆ getDiaObjects()

pandas.DataFrame lsst.dax.apdb.apdbSql.ApdbSql.getDiaObjects (   self,
Region  region 
)
Returns catalog of DiaObject instances from a given region.

This method returns only the last version of each DiaObject. Some
records in a returned catalog may be outside the specified region; it
is up to a client to ignore those records or clean up the catalog before
further use.

Parameters
----------
region : `lsst.sphgeom.Region`
    Region to search for DIAObjects.

Returns
-------
catalog : `pandas.DataFrame`
    Catalog containing DiaObject records for a region that may be a
    superset of the specified region.

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 285 of file apdbSql.py.

285  def getDiaObjects(self, region: Region) -> pandas.DataFrame:
286  # docstring is inherited from a base class
287 
288  # decide what columns we need
289  table: sqlalchemy.schema.Table
290  if self.config.dia_object_index == 'last_object_table':
291  table = self._schema.objects_last
292  else:
293  table = self._schema.objects
294  if not self.config.dia_object_columns:
295  query = table.select()
296  else:
297  columns = [table.c[col] for col in self.config.dia_object_columns]
298  query = sql.select(columns)
299 
300  # build selection
301  htm_index_column = table.columns[self.config.htm_index_column]
302  exprlist = []
303  pixel_ranges = self._htm_indices(region)
304  for low, upper in pixel_ranges:
305  upper -= 1
306  if low == upper:
307  exprlist.append(htm_index_column == low)
308  else:
309  exprlist.append(sql.expression.between(htm_index_column, low, upper))
310  query = query.where(sql.expression.or_(*exprlist))
311 
312  # select latest version of objects
313  if self.config.dia_object_index != 'last_object_table':
314  query = query.where(table.c.validityEnd == None) # noqa: E711
315 
316  _LOG.debug("query: %s", query)
317 
318  if self.config.explain:
319  # run the same query with explain
320  self._explain(query, self._engine)
321 
322  # execute select
323  with Timer('DiaObject select', self.config.timer):
324  with self._engine.begin() as conn:
325  objects = pandas.read_sql_query(query, conn)
326  _LOG.debug("found %s DiaObjects", len(objects))
327  return objects
328 

◆ getDiaSources()

Optional[pandas.DataFrame] lsst.dax.apdb.apdbSql.ApdbSql.getDiaSources (   self,
Region  region,
Optional[Iterable[int]]  object_ids,
dafBase.DateTime  visit_time 
)
Return catalog of DiaSource instances from a given region.

Parameters
----------
region : `lsst.sphgeom.Region`
    Region to search for DIASources.
object_ids : iterable [ `int` ], optional
    List of DiaObject IDs to further constrain the set of returned
    sources. If `None` then returned sources are not constrained. If
    list is empty then empty catalog is returned with a correct
    schema.
visit_time : `lsst.daf.base.DateTime`
    Time of the current visit.

Returns
-------
catalog : `pandas.DataFrame`, or `None`
    Catalog containing DiaSource records. `None` is returned if
    ``read_sources_months`` configuration parameter is set to 0.

Notes
-----
This method returns DiaSource catalog for a region with additional
filtering based on DiaObject IDs. Only a subset of DiaSource history
is returned limited by ``read_sources_months`` config parameter, w.r.t.
``visit_time``. If ``object_ids`` is empty then an empty catalog is
always returned with the correct schema (columns/types). If
``object_ids`` is `None` then no filtering is performed and some of the
returned records may be outside the specified region.

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 329 of file apdbSql.py.

331  visit_time: dafBase.DateTime) -> Optional[pandas.DataFrame]:
332  # docstring is inherited from a base class
333  if self.config.read_sources_months == 0:
334  _LOG.debug("Skip DiaSources fetching")
335  return None
336 
337  if object_ids is None:
338  # region-based select
339  return self._getDiaSourcesInRegion(region, visit_time)
340  else:
341  return self._getDiaSourcesByIDs(list(object_ids), visit_time)
342 

◆ makeField()

ConfigurableField lsst.dax.apdb.apdb.Apdb.makeField (   cls,
str  doc 
)
inherited
Make a `~lsst.pex.config.ConfigurableField` for Apdb.

Parameters
----------
doc : `str`
    Help text for the field.

Returns
-------
configurableField : `lsst.pex.config.ConfigurableField`
    A `~lsst.pex.config.ConfigurableField` for Apdb.

Definition at line 268 of file apdb.py.

268  def makeField(cls, doc: str) -> ConfigurableField:
269  """Make a `~lsst.pex.config.ConfigurableField` for Apdb.
270 
271  Parameters
272  ----------
273  doc : `str`
274  Help text for the field.
275 
276  Returns
277  -------
278  configurableField : `lsst.pex.config.ConfigurableField`
279  A `~lsst.pex.config.ConfigurableField` for Apdb.
280  """
281  return ConfigurableField(doc=doc, target=cls)

◆ makeSchema()

None lsst.dax.apdb.apdbSql.ApdbSql.makeSchema (   self,
bool   drop = False 
)
Create or re-create whole database schema.

Parameters
----------
drop : `bool`
    If True then drop all tables before creating new ones.

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 281 of file apdbSql.py.

281  def makeSchema(self, drop: bool = False) -> None:
282  # docstring is inherited from a base class
283  self._schema.makeSchema(drop=drop)
284 

◆ store()

None lsst.dax.apdb.apdbSql.ApdbSql.store (   self,
dafBase.DateTime  visit_time,
pandas.DataFrame  objects,
Optional[pandas.DataFrame]   sources = None,
Optional[pandas.DataFrame]   forced_sources = None 
)
Store all three types of catalogs in the database.

Parameters
----------
visit_time : `lsst.daf.base.DateTime`
    Time of the visit.
objects : `pandas.DataFrame`
    Catalog with DiaObject records.
sources : `pandas.DataFrame`, optional
    Catalog with DiaSource records.
forced_sources : `pandas.DataFrame`, optional
    Catalog with DiaForcedSource records.

Notes
-----
This method takes DataFrame catalogs; their schema must be
compatible with the schema of the APDB table:

  - column names must correspond to database table columns
  - types and units of the columns must match database definitions,
    no unit conversion is performed presently
  - columns that have default values in database schema can be
    omitted from catalog
  - this method knows how to fill interval-related columns of DiaObject
    (validityStart, validityEnd), so they do not need to appear in a
    catalog
  - source catalogs have ``diaObjectId`` column associating sources
    with objects

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 403 of file apdbSql.py.

407  forced_sources: Optional[pandas.DataFrame] = None) -> None:
408  # docstring is inherited from a base class
409 
410  # fill pixelId column for DiaObjects
411  objects = self._add_obj_htm_index(objects)
412  self._storeDiaObjects(objects, visit_time)
413 
414  if sources is not None:
415  # copy pixelId column from DiaObjects to DiaSources
416  sources = self._add_src_htm_index(sources, objects)
417  self._storeDiaSources(sources)
418 
419  if forced_sources is not None:
420  self._storeDiaForcedSources(forced_sources)
421 

◆ tableDef()

Optional[TableDef] lsst.dax.apdb.apdbSql.ApdbSql.tableDef (   self,
ApdbTables  table 
)
Return table schema definition for a given table.

Parameters
----------
table : `ApdbTables`
    One of the known APDB tables.

Returns
-------
tableSchema : `TableDef` or `None`
    Table schema description, `None` is returned if table is not
    defined by this implementation.

Reimplemented from lsst.dax.apdb.apdb.Apdb.

Definition at line 277 of file apdbSql.py.

277  def tableDef(self, table: ApdbTables) -> Optional[TableDef]:
278  # docstring is inherited from a base class
279  return self._schema.tableSchemas.get(table)
280 

◆ tableRowCount()

Dict[str, int] lsst.dax.apdb.apdbSql.ApdbSql.tableRowCount (   self)
Returns dictionary with the table names and row counts.

Used by ``ap_proto`` to keep track of the size of the database tables.
Depending on database technology this could be an expensive operation.

Returns
-------
row_counts : `dict`
    Dict where key is a table name and value is a row count.

Definition at line 254 of file apdbSql.py.

254  def tableRowCount(self) -> Dict[str, int]:
255  """Returns dictionary with the table names and row counts.
256 
257  Used by ``ap_proto`` to keep track of the size of the database tables.
258  Depending on database technology this could be expensive operation.
259 
260  Returns
261  -------
262  row_counts : `dict`
263  Dict where key is a table name and value is a row count.
264  """
265  res = {}
266  tables: List[sqlalchemy.schema.Table] = [
267  self._schema.objects, self._schema.sources, self._schema.forcedSources]
268  if self.config.dia_object_index == 'last_object_table':
269  tables.append(self._schema.objects_last)
270  for table in tables:
271  stmt = sql.select([func.count()]).select_from(table)
272  count = self._engine.scalar(stmt)
273  res[table.name] = count
274 
275  return res
276 

Member Data Documentation

◆ config

lsst.dax.apdb.apdbSql.ApdbSql.config

Definition at line 214 of file apdbSql.py.

◆ ConfigClass

lsst.dax.apdb.apdbSql.ApdbSql.ConfigClass = ApdbSqlConfig
static

Definition at line 210 of file apdbSql.py.

◆ pixelator

lsst.dax.apdb.apdbSql.ApdbSql.pixelator

Definition at line 252 of file apdbSql.py.


The documentation for this class was generated from the following file: