Inheritance diagram for lsst.pipe.tasks.parquetTable.ParquetTable:

Public Member Functions
	__init__ (self, filename=None, dataFrame=None)

	write (self, filename)

	pandasMd (self)

	columnIndex (self)

	columns (self)

	toDataFrame (self, columns=None)

Public Attributes
	filename

	columns

Protected Member Functions
	_getColumnIndex (self)

	_getColumns (self)

	_sanitizeColumns (self, columns)

Protected Attributes
	_pf

	_df

	_pandasMd

	_columns

	_columnIndex

Detailed Description

Thin wrapper around pyarrow's `ParquetFile` object.

Call `toDataFrame` method to get a `pandas.DataFrame` object,
optionally passing specific columns.

The main purpose of having this wrapper rather than directly
using `pyarrow.ParquetFile` is to make it nicer to load
selected subsets of columns, especially from dataframes with multi-level
column indices.

Instantiate with either a path to a Parquet file or a `pandas.DataFrame`.
If both are given, the file is used and the DataFrame is ignored.

Parameters
----------
filename : str, optional
    Path to Parquet file.
dataFrame : `pandas.DataFrame`, optional
    DataFrame to wrap in place of a Parquet file.

Definition at line 41 of file parquetTable.py.

Constructor & Destructor Documentation

◆ __init__()

lsst.pipe.tasks.parquetTable.ParquetTable.__init__	(	self,
		filename = `None`,
		dataFrame = `None`
	)

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 61 of file parquetTable.py.

    def __init__(self, filename=None, dataFrame=None):
        self.filename = filename
        if filename is not None:
            self._pf = pyarrow.parquet.ParquetFile(filename)
            self._df = None
            self._pandasMd = None
        elif dataFrame is not None:
            self._df = dataFrame
            self._pf = None
        else:
            raise ValueError("Either filename or dataFrame must be passed.")
 
        self._columns = None
        self._columnIndex = None
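
The argument handling in the constructor can be sketched in isolation. The class below is a hypothetical stand-in (`BackendChooser` and `describe` are not part of `lsst.pipe.tasks`); it copies only the branch order from the listing above to show that a filename takes precedence over a DataFrame and that passing neither raises `ValueError`.

```python
class BackendChooser:
    """Hypothetical stand-in that mirrors only the branch order of __init__."""

    def __init__(self, filename=None, dataFrame=None):
        if filename is not None:
            # A filename wins even when a dataFrame is also supplied.
            self.backend = "file"
        elif dataFrame is not None:
            self.backend = "dataframe"
        else:
            raise ValueError("Either filename or dataFrame must be passed.")


def describe(**kwargs):
    """Return the chosen backend, or the error text if neither argument is given."""
    try:
        return BackendChooser(**kwargs).backend
    except ValueError as exc:
        return f"error: {exc}"


print(describe(filename="table.parq"))
print(describe(dataFrame=object()))
print(describe(filename="table.parq", dataFrame=object()))
print(describe())
```

Because the file branch is tested first, a caller who passes both arguments gets no warning that the DataFrame was dropped.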
 

Member Function Documentation

◆ _getColumnIndex()

lsst.pipe.tasks.parquetTable.ParquetTable._getColumnIndex ( self )

protected

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 105 of file parquetTable.py.

    def _getColumnIndex(self):
        if self._df is not None:
            return self._df.columns
        else:
            return pd.Index(self.columns)
 

◆ _getColumns()

lsst.pipe.tasks.parquetTable.ParquetTable._getColumns ( self )

protected

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 124 of file parquetTable.py.

    def _getColumns(self):
        if self._df is not None:
            return self._sanitizeColumns(self._df.columns)
        else:
            return self._pf.metadata.schema.names
 

◆ _sanitizeColumns()

lsst.pipe.tasks.parquetTable.ParquetTable._sanitizeColumns	(	self,
		columns
	)

protected

Definition at line 130 of file parquetTable.py.

    def _sanitizeColumns(self, columns):
        return [c for c in columns if c in self.columnIndex]
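
As a rough illustration of the filter above, the sketch below applies the same list comprehension to plain Python lists. The function name `sanitize_columns` and the column names are made up for the example; the real method checks membership against `self.columnIndex` rather than a list argument.

```python
def sanitize_columns(requested, available):
    """Keep requested names found in `available`, preserving request order."""
    return [c for c in requested if c in available]


# Hypothetical column names, chosen only for illustration.
available = ["coord_ra", "coord_dec", "flux"]
requested = ["flux", "not_a_column", "coord_ra"]

kept = sanitize_columns(requested, available)
print(kept)
```

Names that are not present are dropped silently instead of raising an error, so a misspelled column name simply disappears from the result.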
 

◆ columnIndex()

lsst.pipe.tasks.parquetTable.ParquetTable.columnIndex ( self )

Columns as a pandas Index

Definition at line 98 of file parquetTable.py.

    def columnIndex(self):
        """Columns as a pandas Index
        """
        if self._columnIndex is None:
            self._columnIndex = self._getColumnIndex()
        return self._columnIndex
 

◆ columns()

lsst.pipe.tasks.parquetTable.ParquetTable.columns ( self )

List of column names (or column index if df is set)

This may either be a list of column names, or a
pandas.Index object describing the column index, depending
on whether the ParquetTable object is wrapping a ParquetFile
or a DataFrame.

Definition at line 112 of file parquetTable.py.

    def columns(self):
        """List of column names (or column index if df is set)
 
        This may either be a list of column names, or a
        pandas.Index object describing the column index, depending
        on whether the ParquetTable object is wrapping a ParquetFile
        or a DataFrame.
        """
        if self._columns is None:
            self._columns = self._getColumns()
        return self._columns
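
The compute-once caching used here (and in `columnIndex` and `pandasMd`) can be shown with a small stand-in class. This is a sketch under assumptions: `LazyColumns` and its `loads` counter are hypothetical, and `columns` is written as a property, which the listing does not show but which calls such as `pd.Index(self.columns)` elsewhere on this page suggest.

```python
class LazyColumns:
    """Hypothetical stand-in showing the cache-on-first-access pattern."""

    def __init__(self, names):
        self._names = list(names)
        self._columns = None  # filled in on first access
        self.loads = 0        # counts how often the loader actually runs

    def _getColumns(self):
        self.loads += 1
        return list(self._names)

    @property
    def columns(self):
        # Only call the loader while the cache is still empty.
        if self._columns is None:
            self._columns = self._getColumns()
        return self._columns


table = LazyColumns(["a", "b"])
first = table.columns
second = table.columns
print(first, table.loads)
```

Both accesses return the same cached list object, so mutating the returned list would also change what later callers see.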
 

◆ pandasMd()

lsst.pipe.tasks.parquetTable.ParquetTable.pandasMd ( self )

Definition at line 90 of file parquetTable.py.

    def pandasMd(self):
        if self._pf is None:
            raise AttributeError("This property is only accessible if ._pf is set.")
        if self._pandasMd is None:
            self._pandasMd = json.loads(self._pf.metadata.metadata[b"pandas"])
        return self._pandasMd
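
The decoding step can be tried without a Parquet file. In the sketch below, `file_metadata` is a hypothetical dictionary standing in for the file-level key/value metadata, and the JSON payload is a heavily simplified assumption about what pandas stores under the `b"pandas"` key; only the `json.loads` call mirrors the listing above.

```python
import json

# Hypothetical stand-in for the key/value metadata of a Parquet file.
# Keys and values are bytes; the payload is a simplified example.
file_metadata = {
    b"pandas": json.dumps({
        "index_columns": ["id"],
        "columns": [{"name": "flux", "pandas_type": "float64"}],
    }).encode("utf-8"),
}

# json.loads accepts UTF-8 encoded bytes directly.
pandas_md = json.loads(file_metadata[b"pandas"])
column_names = [col["name"] for col in pandas_md["columns"]]
print(pandas_md["index_columns"], column_names)
```

A file written without pandas metadata would have no `b"pandas"` key, in which case the lookup raises `KeyError`.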
 

◆ toDataFrame()

lsst.pipe.tasks.parquetTable.ParquetTable.toDataFrame	(	self,
		columns = `None`
	)

Get table (or specified columns) as a pandas DataFrame

Parameters
----------
columns : list, optional
    Desired columns.  If `None`, then all columns will be
    returned.

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 133 of file parquetTable.py.

    def toDataFrame(self, columns=None):
        """Get table (or specified columns) as a pandas DataFrame
 
        Parameters
        ----------
        columns : list, optional
            Desired columns.  If `None`, then all columns will be
            returned.
        """
        if self._pf is None:
            if columns is None:
                return self._df
            else:
                return self._df[columns]
 
        if columns is None:
            return self._pf.read().to_pandas()
 
        df = self._pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
        return df
 
 

◆ write()

lsst.pipe.tasks.parquetTable.ParquetTable.write	(	self,
		filename
	)

Write pandas dataframe to parquet

Parameters
----------
filename : str
    Path to which to write.

Definition at line 76 of file parquetTable.py.

    def write(self, filename):
        """Write pandas dataframe to parquet
 
        Parameters
        ----------
        filename : str
            Path to which to write.
        """
        if self._df is None:
            raise ValueError("df property must be defined to write.")
        table = pyarrow.Table.from_pandas(self._df)
        pyarrow.parquet.write_table(table, filename)
 

Member Data Documentation

◆ _columnIndex

lsst.pipe.tasks.parquetTable.ParquetTable._columnIndex

protected

Definition at line 74 of file parquetTable.py.

◆ _columns

lsst.pipe.tasks.parquetTable.ParquetTable._columns

protected

Definition at line 73 of file parquetTable.py.

◆ _df

lsst.pipe.tasks.parquetTable.ParquetTable._df

protected

Definition at line 65 of file parquetTable.py.

◆ _pandasMd

lsst.pipe.tasks.parquetTable.ParquetTable._pandasMd

protected

Definition at line 66 of file parquetTable.py.

◆ _pf

lsst.pipe.tasks.parquetTable.ParquetTable._pf

protected

Definition at line 64 of file parquetTable.py.

◆ columns

lsst.pipe.tasks.parquetTable.ParquetTable.columns

Definition at line 109 of file parquetTable.py.

◆ filename

lsst.pipe.tasks.parquetTable.ParquetTable.filename

Definition at line 62 of file parquetTable.py.

The documentation for this class was generated from the following file:

/j/snowflake/release/lsstsw/stack/lsst-scipipe-8.0.0/Linux64/pipe_tasks/g8a2af25fa3+33d8adeb5f/python/lsst/pipe/tasks/parquetTable.py
