description: Apache Parquet Dataset.

Module: expression_impl.parquet

Apache Parquet Dataset.

Example usage:

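  from struct2tensor import path
  from struct2tensor.expression_impl import parquet, project

  # `filenames` (Parquet file paths) and `batch_size` are supplied by the caller.
  exp = parquet.create_expression_from_parquet_file(filenames)
  docid_project_exp = project.project(exp, [path.Path(["DocId"])])
  pqds = parquet.calculate_parquet_values([docid_project_exp], exp,
                                          filenames, batch_size)

  # Each iteration yields the prensors for the requested expressions.
  for prensors in pqds:
    doc_id_prensor = prensors[0]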
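
The loop in the example suggests that each iteration produces one result per requested expression for a batch of up to `batch_size` rows. The pure-Python sketch below only illustrates that iteration shape; it is an analogy, not struct2tensor's implementation. The name `iterate_batches`, the in-memory `columns` dictionary, and the shorter final batch are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Sequence


def iterate_batches(columns: Dict[str, Sequence],
                    requested: List[str],
                    batch_size: int) -> Iterator[List[list]]:
    """Yield one list per batch, holding one column slice per requested name.

    Hypothetical stand-in for iterating a batched columnar dataset: plain
    lists play the role that prensors play in the real API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # All columns are assumed to have the same number of rows.
    num_rows = len(next(iter(columns.values()))) if columns else 0
    for start in range(0, num_rows, batch_size):
        stop = start + batch_size
        # One entry per requested column, in the order requested.
        yield [list(columns[name][start:stop]) for name in requested]


# Toy data standing in for a file with two columns.
columns = {
    "DocId": [10, 11, 12, 13, 14],
    "Title": ["a", "b", "c", "d", "e"],
}

batches = list(iterate_batches(columns, ["DocId"], batch_size=2))
for batch in batches:
    doc_ids = batch[0]  # Mirrors `prensors[0]` in the example above.
    print(doc_ids)
```

In this sketch, requesting a single column means every yielded batch has exactly one element, which is why index `0` selects the document IDs.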

Classes

class ParquetDataset: A dataset that reads columns from a Parquet file and returns a prensor.

Functions

calculate_parquet_values(...): Calculates expressions and returns a Parquet dataset.

create_expression_from_parquet_file(...): Creates a placeholder expression from a Parquet file.