Inheritance diagram for lsst.pipe.tasks.parquetTable.ParquetTable:

Public Member Functions
def	__init__ (self, filename=None, dataFrame=None)

def	write (self, filename)

def	pandasMd (self)

def	columnIndex (self)

def	columns (self)

def	toDataFrame (self, columns=None)

Public Attributes
	filename

Detailed Description

Thin wrapper around pyarrow's `ParquetFile` object.

Call `toDataFrame` method to get a `pandas.DataFrame` object,
optionally passing specific columns.

The main purpose of this wrapper, compared with using
`pyarrow.parquet.ParquetFile` directly, is to make it easier to load
selected subsets of columns, especially from dataframes with multi-level
column indices.

Instantiate with either a path to a Parquet file or a pandas DataFrame.
If both are given, `filename` takes precedence and `dataFrame` is ignored.

Parameters
----------
filename : str, optional
    Path to Parquet file.
dataFrame : pandas.DataFrame, optional
    DataFrame to wrap instead of a file.

Definition at line 34 of file parquetTable.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.pipe.tasks.parquetTable.ParquetTable.__init__	(	self,
		filename = `None`,
		dataFrame = `None`
	)

Definition at line 54 of file parquetTable.py.

     def __init__(self, filename=None, dataFrame=None):
         self.filename = filename
         if filename is not None:
             self._pf = pyarrow.parquet.ParquetFile(filename)
             self._df = None
             self._pandasMd = None
         elif dataFrame is not None:
             self._df = dataFrame
             self._pf = None
         else:
             raise ValueError("Either filename or dataFrame must be passed.")
  
         self._columns = None
         self._columnIndex = None
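
The argument handling above can be tried without the LSST stack. The following is a minimal sketch, not the class itself: `TableSource` and its `mode` attribute are hypothetical stand-ins that reproduce only the branch order of `__init__`. It shows that `filename` wins when both arguments are supplied and that passing neither raises `ValueError`.

```python
class TableSource:
    """Hypothetical stand-in mirroring the argument checks in __init__."""

    def __init__(self, filename=None, dataFrame=None):
        self.filename = filename
        if filename is not None:
            self.mode = "file"        # the real class opens a ParquetFile here
        elif dataFrame is not None:
            self.mode = "dataframe"   # the real class stores the DataFrame here
        else:
            raise ValueError("Either filename or dataFrame must be passed.")


# filename takes precedence when both arguments are supplied
both = TableSource(filename="table.parq", dataFrame=object())
print(both.mode)

# Passing neither argument is rejected
try:
    TableSource()
except ValueError as err:
    print(err)
```

Because the `filename` branch is checked first, a DataFrame passed alongside a path is silently dropped, so callers should supply only one of the two.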
  

Member Function Documentation

◆ columnIndex()

def lsst.pipe.tasks.parquetTable.ParquetTable.columnIndex ( self )

Columns as a pandas Index

Definition at line 91 of file parquetTable.py.

     def columnIndex(self):
         """Columns as a pandas Index
         """
         if self._columnIndex is None:
             self._columnIndex = self._getColumnIndex()
         return self._columnIndex
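
`columnIndex` and `columns` both compute their value on first access and reuse it afterwards. A minimal sketch of that compute-once pattern follows; `LazyColumns`, its `calls` counter, and the tuple returned by the placeholder `_getColumnIndex` are assumptions for illustration, since the real helper is not shown on this page.

```python
class LazyColumns:
    """Hypothetical stand-in showing the compute-once pattern."""

    def __init__(self, names):
        self._names = names
        self._columnIndex = None
        self.calls = 0  # counts how often the lookup actually runs

    def _getColumnIndex(self):
        # Placeholder for the real lookup, which this page does not show.
        self.calls += 1
        return tuple(self._names)

    def columnIndex(self):
        # Compute on first access, then return the cached object.
        if self._columnIndex is None:
            self._columnIndex = self._getColumnIndex()
        return self._columnIndex


lazy = LazyColumns(["ra", "dec"])
first = lazy.columnIndex()
second = lazy.columnIndex()
print(first is second, lazy.calls)
```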
  

◆ columns()

def lsst.pipe.tasks.parquetTable.ParquetTable.columns ( self )

List of column names (or column index if df is set)

This may either be a list of column names, or a
pandas.Index object describing the column index, depending
on whether the ParquetTable object is wrapping a ParquetFile
or a DataFrame.

Definition at line 105 of file parquetTable.py.

     def columns(self):
         """List of column names (or column index if df is set)
  
         This may either be a list of column names, or a
         pandas.Index object describing the column index, depending
         on whether the ParquetTable object is wrapping a ParquetFile
         or a DataFrame.
         """
         if self._columns is None:
             self._columns = self._getColumns()
         return self._columns
  

◆ pandasMd()

def lsst.pipe.tasks.parquetTable.ParquetTable.pandasMd ( self )

Definition at line 83 of file parquetTable.py.

     def pandasMd(self):
         if self._pf is None:
             raise AttributeError("This property is only accessible if ._pf is set.")
         if self._pandasMd is None:
             self._pandasMd = json.loads(self._pf.metadata.metadata[b"pandas"])
         return self._pandasMd
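
The lookup in `pandasMd` reads the JSON document that pandas stores under the `b"pandas"` key of the Parquet file metadata. The sketch below imitates that with a plain dictionary, so it needs no Parquet file; the `file_metadata` dictionary and the single `ra` column are made-up illustrative values, not output from a real table.

```python
import json

# Illustrative stand-in for the file-level key/value metadata,
# which maps bytes keys to bytes values.
pandas_blob = {
    "index_columns": [],
    "column_indexes": [{"name": None}],
    "columns": [{"name": "ra", "pandas_type": "float64"}],
}
file_metadata = {b"pandas": json.dumps(pandas_blob).encode("utf-8")}

# Same lookup and decode step as in pandasMd.
pandas_md = json.loads(file_metadata[b"pandas"])
column_names = [col["name"] for col in pandas_md["columns"]]
print(column_names)
```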
  

◆ toDataFrame()

def lsst.pipe.tasks.parquetTable.ParquetTable.toDataFrame	(	self,
		columns = `None`
	)

Get table (or specified columns) as a pandas DataFrame

Parameters
----------
columns : list, optional
    Desired columns.  If `None`, then all columns will be
    returned.

Definition at line 126 of file parquetTable.py.

     def toDataFrame(self, columns=None):
         """Get table (or specified columns) as a pandas DataFrame
  
         Parameters
         ----------
         columns : list, optional
             Desired columns.  If `None`, then all columns will be
             returned.
         """
         if self._pf is None:
             if columns is None:
                 return self._df
             else:
                 return self._df[columns]
  
         if columns is None:
             return self._pf.read().to_pandas()
  
         df = self._pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
         return df
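
When the table wraps an in-memory DataFrame, `toDataFrame(columns)` reduces to `self._df[columns]`. The sketch below shows what that selection looks like for a two-level column index, assuming pandas is installed; the band and quantity names are invented for illustration.

```python
import pandas as pd

# Two-level column index; the level values are made up.
columns = pd.MultiIndex.from_product(
    [["g", "r"], ["flux", "fluxErr"]], names=["band", "quantity"]
)
df = pd.DataFrame(
    [[1.0, 0.1, 2.0, 0.2], [3.0, 0.3, 4.0, 0.4]], columns=columns
)

# Equivalent of the in-memory branch: self._df[columns]
wanted = [("g", "flux"), ("r", "flux")]
subset = df[wanted]
print(subset)
```

Each requested column is given as a tuple with one entry per level of the column index.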
  
  

◆ write()

def lsst.pipe.tasks.parquetTable.ParquetTable.write	(	self,
		filename
	)

Write pandas dataframe to parquet

Parameters
----------
filename : str
    Path to which to write.

Definition at line 69 of file parquetTable.py.

     def write(self, filename):
         """Write pandas dataframe to parquet
  
         Parameters
         ----------
         filename : str
             Path to which to write.
         """
         if self._df is None:
             raise ValueError("df property must be defined to write.")
         table = pyarrow.Table.from_pandas(self._df)
         pyarrow.parquet.write_table(table, filename)
  

Member Data Documentation

◆ filename

lsst.pipe.tasks.parquetTable.ParquetTable.filename

Definition at line 55 of file parquetTable.py.

The documentation for this class was generated from the following file:

/j/snowflake/release/lsstsw/stack/lsst-scipipe-0.7.0/Linux64/pipe_tasks/21.0.0-172-gfb10e10a+18fedfabac/python/lsst/pipe/tasks/parquetTable.py
