LSST Applications  21.0.0-172-gfb10e10a+18fedfabac,22.0.0+297cba6710,22.0.0+80564b0ff1,22.0.0+8d77f4f51a,22.0.0+a28f4c53b1,22.0.0+dcf3732eb2,22.0.1-1-g7d6de66+2a20fdde0d,22.0.1-1-g8e32f31+297cba6710,22.0.1-1-geca5380+7fa3b7d9b6,22.0.1-12-g44dc1dc+2a20fdde0d,22.0.1-15-g6a90155+515f58c32b,22.0.1-16-g9282f48+790f5f2caa,22.0.1-2-g92698f7+dcf3732eb2,22.0.1-2-ga9b0f51+7fa3b7d9b6,22.0.1-2-gd1925c9+bf4f0e694f,22.0.1-24-g1ad7a390+a9625a72a8,22.0.1-25-g5bf6245+3ad8ecd50b,22.0.1-25-gb120d7b+8b5510f75f,22.0.1-27-g97737f7+2a20fdde0d,22.0.1-32-gf62ce7b1+aa4237961e,22.0.1-4-g0b3f228+2a20fdde0d,22.0.1-4-g243d05b+871c1b8305,22.0.1-4-g3a563be+32dcf1063f,22.0.1-4-g44f2e3d+9e4ab0f4fa,22.0.1-42-gca6935d93+ba5e5ca3eb,22.0.1-5-g15c806e+85460ae5f3,22.0.1-5-g58711c4+611d128589,22.0.1-5-g75bb458+99c117b92f,22.0.1-6-g1c63a23+7fa3b7d9b6,22.0.1-6-g50866e6+84ff5a128b,22.0.1-6-g8d3140d+720564cf76,22.0.1-6-gd805d02+cc5644f571,22.0.1-8-ge5750ce+85460ae5f3,master-g6e05de7fdc+babf819c66,master-g99da0e417a+8d77f4f51a,w.2021.48
LSST Data Management Base Package
lsst.pipe.tasks.parquetTable.ParquetTable Class Reference
Inherited by lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Public Member Functions

def __init__ (self, filename=None, dataFrame=None)
 
def write (self, filename)
 
def pandasMd (self)
 
def columnIndex (self)
 
def columns (self)
 
def toDataFrame (self, columns=None)
 

Public Attributes

 filename
 

Detailed Description

Thin wrapper around pyarrow's `ParquetFile` object.

Call `toDataFrame` method to get a `pandas.DataFrame` object,
optionally passing specific columns.

The main purpose of having this wrapper rather than directly
using `pyarrow.ParquetFile` is to make it nicer to load
selected subsets of columns, especially from dataframes with multi-level
column indices.

Instantiated with either a path to a Parquet file or a `pandas.DataFrame`.

Parameters
----------
filename : str, optional
    Path to Parquet file.
dataFrame : pandas.DataFrame, optional
    DataFrame to wrap instead of reading from a file.

Definition at line 34 of file parquetTable.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.pipe.tasks.parquetTable.ParquetTable.__init__ (   self,
  filename = None,
  dataFrame = None 
)

Definition at line 54 of file parquetTable.py.

    def __init__(self, filename=None, dataFrame=None):
        self.filename = filename
        if filename is not None:
            self._pf = pyarrow.parquet.ParquetFile(filename)
            self._df = None
            self._pandasMd = None
        elif dataFrame is not None:
            self._df = dataFrame
            self._pf = None
        else:
            raise ValueError("Either filename or dataFrame must be passed.")

        self._columns = None
        self._columnIndex = None
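
The branch order in the constructor has a consequence worth spelling out: `filename` is checked first, so a `dataFrame` passed alongside a non-None `filename` is ignored. The stand-in below mirrors only that selection logic in plain Python. `select_source` is a hypothetical helper written for illustration and is not part of parquetTable.py.

```python
def select_source(filename=None, dataFrame=None):
    """Hypothetical helper mirroring the branch order of ParquetTable.__init__.

    Returns which input would back the table: "file" or "dataFrame".
    """
    if filename is not None:
        # A filename wins even when a dataFrame is also supplied.
        return "file"
    elif dataFrame is not None:
        return "dataFrame"
    else:
        raise ValueError("Either filename or dataFrame must be passed.")


# Any non-None object stands in for a real pandas.DataFrame here.
fake_frame = object()

print(select_source(filename="table.parq"))
print(select_source(dataFrame=fake_frame))
print(select_source(filename="table.parq", dataFrame=fake_frame))
```

Callers who want the in-memory DataFrame to be used should therefore leave `filename` unset.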

Member Function Documentation

◆ columnIndex()

def lsst.pipe.tasks.parquetTable.ParquetTable.columnIndex (   self)
Columns as a pandas Index

Definition at line 91 of file parquetTable.py.

    def columnIndex(self):
        """Columns as a pandas Index
        """
        if self._columnIndex is None:
            self._columnIndex = self._getColumnIndex()
        return self._columnIndex

◆ columns()

def lsst.pipe.tasks.parquetTable.ParquetTable.columns (   self)
List of column names (or column index if df is set)

This may either be a list of column names, or a
pandas.Index object describing the column index, depending
on whether the ParquetTable object is wrapping a ParquetFile
or a DataFrame.

Definition at line 105 of file parquetTable.py.

    def columns(self):
        """List of column names (or column index if df is set)

        This may either be a list of column names, or a
        pandas.Index object describing the column index, depending
        on whether the ParquetTable object is wrapping a ParquetFile
        or a DataFrame.
        """
        if self._columns is None:
            self._columns = self._getColumns()
        return self._columns
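
Both `columnIndex` and `columns` follow the same compute-once pattern: the cached attribute starts as `None`, is filled on first access, and is returned unchanged afterwards. The following is a minimal sketch of that pattern using a stand-in class. `LazyColumns` and its `lookups` counter are hypothetical, the body of `_getColumns` is a placeholder rather than the real lookup, and exposing the accessor through `@property` is an assumption suggested by the wording of the error message in `pandasMd`.

```python
class LazyColumns:
    """Illustrative stand-in for the caching pattern; not the real ParquetTable."""

    def __init__(self, names):
        self._names = list(names)
        self._columns = None  # cache starts empty, as in ParquetTable.__init__
        self.lookups = 0      # counts how often the underlying lookup runs

    def _getColumns(self):
        # Placeholder for the real lookup against a file or DataFrame.
        self.lookups += 1
        return list(self._names)

    @property
    def columns(self):
        # Compute on first access only, then reuse the cached value.
        if self._columns is None:
            self._columns = self._getColumns()
        return self._columns


table = LazyColumns(["id", "flux", "flux_err"])
first = table.columns
second = table.columns
print(first, table.lookups)
```

Because the cached object itself is returned, repeated accesses yield the same list rather than a fresh copy.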

◆ pandasMd()

def lsst.pipe.tasks.parquetTable.ParquetTable.pandasMd (   self)
Pandas metadata stored in the Parquet file, decoded from JSON and cached after the first access. Only available when the table wraps a Parquet file; otherwise an AttributeError is raised.

Definition at line 83 of file parquetTable.py.

    def pandasMd(self):
        if self._pf is None:
            raise AttributeError("This property is only accessible if ._pf is set.")
        if self._pandasMd is None:
            self._pandasMd = json.loads(self._pf.metadata.metadata[b"pandas"])
        return self._pandasMd
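
The decoding step in `pandasMd` can be tried without a Parquet file. The sketch below assumes a hand-built dictionary in place of `self._pf.metadata.metadata`, with a byte-string key and a JSON-encoded byte-string value; the metadata content is a deliberately reduced, made-up example and not the full structure that pandas writes.

```python
import json

# Hand-built stand-in for the file-level key/value metadata.
# Keys and values are bytes, which is why the lookup uses b"pandas".
file_metadata = {
    b"pandas": json.dumps(
        {
            "index_columns": ["id"],
            "column_indexes": [{"name": None}],
            "columns": [{"name": "id"}, {"name": "flux"}, {"name": "flux_err"}],
        }
    ).encode("utf-8")
}

# json.loads accepts bytes directly, so no explicit decode is needed.
pandas_md = json.loads(file_metadata[b"pandas"])

# Pull the column names out of the decoded metadata.
column_names = [col["name"] for col in pandas_md["columns"]]
print(column_names)
```

A file written without pandas metadata would have no `b"pandas"` key, in which case the lookup raises `KeyError`.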

◆ toDataFrame()

def lsst.pipe.tasks.parquetTable.ParquetTable.toDataFrame (   self,
  columns = None 
)
Get table (or specified columns) as a pandas DataFrame

Parameters
----------
columns : list, optional
    Desired columns.  If `None`, then all columns will be
    returned.

Definition at line 126 of file parquetTable.py.

    def toDataFrame(self, columns=None):
        """Get table (or specified columns) as a pandas DataFrame

        Parameters
        ----------
        columns : list, optional
            Desired columns.  If `None`, then all columns will be
            returned.
        """
        if self._pf is None:
            if columns is None:
                return self._df
            else:
                return self._df[columns]

        if columns is None:
            return self._pf.read().to_pandas()

        df = self._pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
        return df

◆ write()

def lsst.pipe.tasks.parquetTable.ParquetTable.write (   self,
  filename 
)
Write pandas dataframe to parquet

Parameters
----------
filename : str
    Path to which to write.

Definition at line 69 of file parquetTable.py.

    def write(self, filename):
        """Write pandas dataframe to parquet

        Parameters
        ----------
        filename : str
            Path to which to write.
        """
        if self._df is None:
            raise ValueError("df property must be defined to write.")
        table = pyarrow.Table.from_pandas(self._df)
        pyarrow.parquet.write_table(table, filename)

Member Data Documentation

◆ filename

lsst.pipe.tasks.parquetTable.ParquetTable.filename

Definition at line 55 of file parquetTable.py.


The documentation for this class was generated from the following file: