LSST Applications g011c388f00+1570843bc3,g0265f82a02+1c31ac625c,g08a116f7bc+9b5a8b4fc4,g113e161629+00d7be2b8d,g16a3bce237+1c31ac625c,g2079a07aa2+6a65a43b64,g2bbee38e9b+1c31ac625c,g337abbeb29+1c31ac625c,g3ddfee87b4+15db52e637,g50ff169b8f+f00c948d2c,g52b1c1532d+81bc2a20b4,g858d7b2824+00d7be2b8d,g88964a4962+8a1a53efdf,g8a2af25fa3+33d8adeb5f,g8a8a8dda67+81bc2a20b4,g99855d9996+dcceda0d40,g9ddcbc5298+fe33e4d80d,ga1e77700b3+ec8c1568a5,ga8c6da7877+b6c80215ae,gae46bcf261+1c31ac625c,gb700894bec+3b32ebc22c,gb8350603e9+de515223a7,gba4ed39666+d9abe90c32,gbeb006f7da+e2f003f3e5,gc86a011abf+00d7be2b8d,gcf0d15dbbd+15db52e637,gd162630629+0ff1f5d43c,gdaeeff99f8+6ceac51f81,ge79ae78c31+1c31ac625c,gee10cc3b42+81bc2a20b4,gf041782ebf+0cc2057818,gf11f55472b+99ee6e9747,gf1cff7945b+00d7be2b8d,gf748b16de2+9283e76039,gf9db590de0+15db52e637,v26.0.1.rc2
LSST Data Management Base Package
lsst.pipe.tasks.parquetTable.ParquetTable Class Reference
Inherited by lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Public Member Functions

 __init__ (self, filename=None, dataFrame=None)
 
 write (self, filename)
 
 pandasMd (self)
 
 columnIndex (self)
 
 columns (self)
 
 toDataFrame (self, columns=None)
 

Public Attributes

 filename
 
 columns
 

Protected Member Functions

 _getColumnIndex (self)
 
 _getColumns (self)
 
 _sanitizeColumns (self, columns)
 

Protected Attributes

 _pf
 
 _df
 
 _pandasMd
 
 _columns
 
 _columnIndex
 

Detailed Description

Thin wrapper to pyarrow's ParquetFile object

Call `toDataFrame` method to get a `pandas.DataFrame` object,
optionally passing specific columns.

The main purpose of having this wrapper rather than directly
using `pyarrow.ParquetFile` is to make it nicer to load
selected subsets of columns, especially from dataframes with multi-level
column indices.

Instantiated with either a path to a Parquet file or a DataFrame.
If both are given, `filename` takes precedence.

Parameters
----------
filename : str, optional
    Path to Parquet file.
dataFrame : pandas.DataFrame, optional
    DataFrame to wrap.

Definition at line 41 of file parquetTable.py.

Constructor & Destructor Documentation

◆ __init__()

lsst.pipe.tasks.parquetTable.ParquetTable.__init__ (   self,
  filename = None,
  dataFrame = None 
)

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 61 of file parquetTable.py.

def __init__(self, filename=None, dataFrame=None):
    self.filename = filename
    if filename is not None:
        self._pf = pyarrow.parquet.ParquetFile(filename)
        self._df = None
        self._pandasMd = None
    elif dataFrame is not None:
        self._df = dataFrame
        self._pf = None
    else:
        raise ValueError("Either filename or dataFrame must be passed.")

    self._columns = None
    self._columnIndex = None
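The branching in the constructor means that `filename` wins when both arguments are supplied, and the DataFrame is then ignored. The following is a minimal sketch of that logic only. `TableSource` and `backend_for` are hypothetical names that are not part of `lsst.pipe.tasks`, and a plain string stands in for the real `pyarrow.parquet.ParquetFile` handle.

```python
class TableSource:
    """Hypothetical stand-in that mirrors only the constructor's branching."""

    def __init__(self, filename=None, dataFrame=None):
        self.filename = filename
        if filename is not None:
            # The real class opens pyarrow.parquet.ParquetFile(filename) here.
            self.backend = "file"
            self.df = None
        elif dataFrame is not None:
            self.backend = "dataframe"
            self.df = dataFrame
        else:
            raise ValueError("Either filename or dataFrame must be passed.")


def backend_for(filename=None, dataFrame=None):
    """Return which backend the constructor logic would select."""
    return TableSource(filename=filename, dataFrame=dataFrame).backend
```

Calling `backend_for()` with no arguments raises `ValueError`, matching the final `else` branch of the listing.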

Member Function Documentation

◆ _getColumnIndex()

lsst.pipe.tasks.parquetTable.ParquetTable._getColumnIndex (   self)
protected

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 105 of file parquetTable.py.

def _getColumnIndex(self):
    if self._df is not None:
        return self._df.columns
    else:
        return pd.Index(self.columns)

◆ _getColumns()

lsst.pipe.tasks.parquetTable.ParquetTable._getColumns (   self)
protected

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 124 of file parquetTable.py.

def _getColumns(self):
    if self._df is not None:
        return self._sanitizeColumns(self._df.columns)
    else:
        return self._pf.metadata.schema.names

◆ _sanitizeColumns()

lsst.pipe.tasks.parquetTable.ParquetTable._sanitizeColumns (   self,
  columns 
)
protected

Definition at line 130 of file parquetTable.py.

def _sanitizeColumns(self, columns):
    return [c for c in columns if c in self.columnIndex]
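The comprehension in `_sanitizeColumns` keeps only the requested names that are present in the column index, in the order they were requested. Unknown names are dropped without raising an error. A small sketch of the same filter as a free function follows. `sanitize_columns` is a hypothetical name, and a plain list stands in for the `pandas.Index` held by the real class.

```python
def sanitize_columns(columns, column_index):
    """Keep only the requested columns found in column_index, in request order."""
    return [c for c in columns if c in column_index]
```

For example, requesting `["b", "z", "a"]` against an index of `["a", "b", "c"]` drops `"z"` and keeps the request order.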

◆ columnIndex()

lsst.pipe.tasks.parquetTable.ParquetTable.columnIndex (   self)
Columns as a pandas Index

Definition at line 98 of file parquetTable.py.

def columnIndex(self):
    """Columns as a pandas Index
    """
    if self._columnIndex is None:
        self._columnIndex = self._getColumnIndex()
    return self._columnIndex

◆ columns()

lsst.pipe.tasks.parquetTable.ParquetTable.columns (   self)
List of column names (or column index if df is set)

This may either be a list of column names, or a
pandas.Index object describing the column index, depending
on whether the ParquetTable object is wrapping a ParquetFile
or a DataFrame.

Definition at line 112 of file parquetTable.py.

def columns(self):
    """List of column names (or column index if df is set)

    This may either be a list of column names, or a
    pandas.Index object describing the column index, depending
    on whether the ParquetTable object is wrapping a ParquetFile
    or a DataFrame.
    """
    if self._columns is None:
        self._columns = self._getColumns()
    return self._columns
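Both `columns` and `columnIndex` compute their value on first access and cache it in a protected attribute. The other listings read `self.columns` and `self.columnIndex` without calling them, which suggests they are exposed as properties, and the sketch below assumes so. `LazyColumns` and its `loads` counter are hypothetical and exist only to make the caching visible.

```python
class LazyColumns:
    """Hypothetical stand-in showing the compute-once caching used by columns."""

    def __init__(self, names):
        self._names = list(names)
        self._columns = None
        self.loads = 0  # counts how often the underlying lookup runs

    def _getColumns(self):
        # Stands in for reading the schema from the file or the DataFrame.
        self.loads += 1
        return list(self._names)

    @property
    def columns(self):
        # Compute on first access, then return the cached object.
        if self._columns is None:
            self._columns = self._getColumns()
        return self._columns
```

Because the cached object is returned as-is, mutating the result in place would also change what later accesses see.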

◆ pandasMd()

lsst.pipe.tasks.parquetTable.ParquetTable.pandasMd (   self)

Definition at line 90 of file parquetTable.py.

def pandasMd(self):
    if self._pf is None:
        raise AttributeError("This property is only accessible if ._pf is set.")
    if self._pandasMd is None:
        self._pandasMd = json.loads(self._pf.metadata.metadata[b"pandas"])
    return self._pandasMd
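`pandasMd` decodes the JSON blob that pandas-aware writers store under the `b"pandas"` key of the Parquet file's key-value metadata. The sketch below shows only that decoding step on a hand-made mapping. `decode_pandas_metadata` is a hypothetical helper and the JSON content is invented for illustration; a real file's metadata carries more fields.

```python
import json


def decode_pandas_metadata(key_value_metadata):
    """Decode the JSON blob stored under the b"pandas" key."""
    # json.loads accepts bytes as well as str.
    return json.loads(key_value_metadata[b"pandas"])


# Hypothetical mapping with bytes keys and bytes values, shaped like the
# key-value metadata that pyarrow exposes for a Parquet file.
example = {b"pandas": b'{"index_columns": ["id"], "columns": [{"name": "flux"}]}'}
md = decode_pandas_metadata(example)
```

A file written without pandas metadata has no `b"pandas"` key, so the lookup would raise `KeyError` in that case.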

◆ toDataFrame()

lsst.pipe.tasks.parquetTable.ParquetTable.toDataFrame (   self,
  columns = None 
)
Get table (or specified columns) as a pandas DataFrame

Parameters
----------
columns : list, optional
    Desired columns.  If `None`, then all columns will be
    returned.

Reimplemented in lsst.pipe.tasks.parquetTable.MultilevelParquetTable.

Definition at line 133 of file parquetTable.py.

def toDataFrame(self, columns=None):
    """Get table (or specified columns) as a pandas DataFrame

    Parameters
    ----------
    columns : list, optional
        Desired columns.  If `None`, then all columns will be
        returned.
    """
    if self._pf is None:
        if columns is None:
            return self._df
        else:
            return self._df[columns]

    if columns is None:
        return self._pf.read().to_pandas()

    df = self._pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
    return df

◆ write()

lsst.pipe.tasks.parquetTable.ParquetTable.write (   self,
  filename 
)
Write pandas dataframe to parquet

Parameters
----------
filename : str
    Path to which to write.

Definition at line 76 of file parquetTable.py.

def write(self, filename):
    """Write pandas dataframe to parquet

    Parameters
    ----------
    filename : str
        Path to which to write.
    """
    if self._df is None:
        raise ValueError("df property must be defined to write.")
    table = pyarrow.Table.from_pandas(self._df)
    pyarrow.parquet.write_table(table, filename)

Member Data Documentation

◆ _columnIndex

lsst.pipe.tasks.parquetTable.ParquetTable._columnIndex
protected

Definition at line 74 of file parquetTable.py.

◆ _columns

lsst.pipe.tasks.parquetTable.ParquetTable._columns
protected

Definition at line 73 of file parquetTable.py.

◆ _df

lsst.pipe.tasks.parquetTable.ParquetTable._df
protected

Definition at line 65 of file parquetTable.py.

◆ _pandasMd

lsst.pipe.tasks.parquetTable.ParquetTable._pandasMd
protected

Definition at line 66 of file parquetTable.py.

◆ _pf

lsst.pipe.tasks.parquetTable.ParquetTable._pf
protected

Definition at line 64 of file parquetTable.py.

◆ columns

lsst.pipe.tasks.parquetTable.ParquetTable.columns

Definition at line 109 of file parquetTable.py.

◆ filename

lsst.pipe.tasks.parquetTable.ParquetTable.filename

Definition at line 62 of file parquetTable.py.


The documentation for this class was generated from the file parquetTable.py.