description: Apache Parquet Dataset.
Apache Parquet Dataset.
exp = create_expression_from_parquet_file(filenames)
docid_project_exp = project.project(exp, [path.Path(["DocId"])])
pqds = parquet_dataset.calculate_parquet_values([docid_project_exp], exp,
                                                filenames, batch_size)
for prensors in pqds:
  doc_id_prensor = prensors[0]
class ParquetDataset: A dataset which reads columns from a parquet file and returns a prensor.
calculate_parquet_values(...): Calculates expressions and returns a parquet dataset.
create_expression_from_parquet_file(...): Creates a placeholder expression from a parquet file.