Rewrite for Polar support by arnavk23 · Pull Request #647 · pytorch-tabular/pytorch_tabular

arnavk23 · 2026-03-02T09:41:25Z

This implementation provides the architectural foundation requested in #402, with initial support for Polars (including larger-than-memory datasets). Future work can add Spark and other backends following the same pattern.

- Create abstract DataBackend interface for supporting multiple dataframe libraries - Implement PandasBackend for backward compatibility with existing code - Implement PolarsBackend for faster processing and larger-than-memory datasets - Add automatic backend detection via get_backend() function - Support both eager and lazy evaluation modes in Polars This addresses issue pytorch-tabular#402 by providing the foundation for supporting Polars, Spark, and other dataframe frameworks in TabularDatamodule.

- Create TabularDatamoduleV2 class with automatic backend detection - Add Polars as optional dependency in pyproject.toml - Update __init__.py to export new classes and backend utilities - Add comprehensive example showing Polars usage patterns - Create detailed documentation for multi-backend support - Support both eager (DataFrame) and lazy (LazyFrame) Polars modes - Include sampling utility for transformer fitting on large datasets - Maintain full backward compatibility with existing pandas-based code This implementation provides: - 2-5x faster CSV reading with Polars - Better memory efficiency through Arrow memory format - Support for larger-than-memory datasets via LazyFrames - Foundation for future Spark and Dask backend support Addresses issue pytorch-tabular#402 - Re-write DataModule for larger-than-memory support

- Test PandasBackend operations (shape, columns, transforms, etc.) - Test PolarsBackend with both eager and lazy DataFrames - Test TabularDatamoduleV2 with pandas and polars backends - Test sampling utility for large dataset transform fitting - Add pytest skipif for optional polars dependency Tests verify: - Automatic backend detection - Backend operation correctness - Lazy loading capability flags - Integration with TabularDatamoduleV2

arnavk23 · 2026-03-02T09:44:30Z

@fkiraly @manujosephv whenever you are free, please review this pr. Thanks!

manujosephv · 2026-03-08T01:09:41Z

Thanks a lot @arnavk23 for the PR. I just skimmed through it and had a question. This PR doesnt enable lazy mode in Polars, right? just introduces a decoupled backend and adds Polars as an option?

arnavk23 · 2026-03-09T12:17:59Z

@manujosephv This PR does not implement end-to-end lazy training in Polars. It mainly:

introduces a decoupled backend abstraction
adds Polars as a supported backend option
accepts pl.LazyFrame inputs, but currently materializes them (collect() / to_pandas()) before the existing training pipeline runs.

manujosephv · 2026-03-09T16:43:28Z

@fkiraly Tagging you here since you are leading major dev in here..

arnavk23 added 3 commits March 2, 2026 15:03

arnavk23 changed the title ~~Polar implementation~~ Rewrite for Polar support Mar 2, 2026

arnavk23 added 2 commits March 2, 2026 15:12

Merge branch 'main' into feature/multi-backend-datamodule-402

1f7049b

Update multi_backend_example.py

0a678e6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Rewrite for Polar support#647

Rewrite for Polar support#647
arnavk23 wants to merge 5 commits intopytorch-tabular:mainfrom
arnavk23:feature/multi-backend-datamodule-402

arnavk23 commented Mar 2, 2026

Uh oh!

arnavk23 commented Mar 2, 2026 •

edited

Loading

Uh oh!

manujosephv commented Mar 8, 2026

Uh oh!

arnavk23 commented Mar 9, 2026

Uh oh!

manujosephv commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

arnavk23 commented Mar 2, 2026

Uh oh!

arnavk23 commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

manujosephv commented Mar 8, 2026

Uh oh!

arnavk23 commented Mar 9, 2026

Uh oh!

manujosephv commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

arnavk23 commented Mar 2, 2026 •

edited

Loading