Releases: pola-rs/polars

Python Polars 1.40.1

22 Apr 19:16
344a0ea

🚀 Performance improvements

  • Skip validity mask processing in __array_ufunc__ when no inputs have nulls (#27358)

✨ Enhancements

  • Cargo deny (#27363)
  • Add maintain_order parameter to merge_sorted (#27263)

🐞 Bug fixes

  • Honor having predicate in GroupBy iter (#27370)
  • Use the physical dtype for NumUnorderedImplodeReducer arrow ListArray (#27375)
  • Address bug in reduce_balanced for certain input length lists affecting pl.concat (#27352)
  • Ensure list.sample() allows fraction > 1 when with_replacement=True (#27350)
  • Ensure append() errors when upcast=False (#27346)
  • Always rechunk sorts, prune sorts even in eager execution (#27356)
  • Fix typing for DataFrame.__init__ and Series.__init__ so they don't require all optional dependencies to be installed (#27348)
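
The list.sample() fix above rests on a general property of sampling: when drawing with replacement, the sample may be larger than the population, so a fraction above 1 is meaningful. A minimal standard-library sketch of that property follows. It is not Polars code, and `sample_fraction` is a hypothetical helper introduced only for illustration.

```python
import random


def sample_fraction(values, fraction, with_replacement=False, seed=None):
    """Hypothetical helper: draw int(len(values) * fraction) items."""
    rng = random.Random(seed)
    n = int(len(values) * fraction)
    if with_replacement:
        # Items may repeat, so n is allowed to exceed len(values).
        return rng.choices(values, k=n)
    # Without replacement, n > len(values) cannot be satisfied;
    # random.sample raises ValueError in that case.
    return rng.sample(values, n)


population = [1, 2, 3, 4]
oversampled = sample_fraction(population, 1.5, with_replacement=True, seed=0)
```

Calling `sample_fraction(population, 1.5)` without replacement raises `ValueError`, which is the behaviour one would expect to remain an error.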

📖 Documentation

  • Split out openlineage docs into guide and configuration (#27371)
  • Add explanation on the observatory sqlite db file (#27354)

🛠️ Other improvements

  • Disable mypy type checking for pyarrow calls (#27377)
  • Disable debug symbols in macos coverage tests (#27361)
  • Cargo deny (#27363)

Thank you to all our contributors for making this release possible!
@EndPositive, @Kevin-Patyk, @MarcoGorelli, @carnarez, @dsprenkels, @gab23r, @jonathanchang31, @kdn36, @mzjp2 and @ritchie46

Python Polars 1.40.0

18 Apr 05:26
bf6a425

🏆 Highlights

  • Add streaming support for grouped AsOf join (#27293)

⚠️ Deprecations

  • Deprecate support for dataframe interchange protocol (#27214)

🚀 Performance improvements

  • Create IR slice from expr slice pushdown (#27200)
  • Add streaming support for grouped AsOf join (#27293)
  • Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
  • Lower basic over() to streaming primitives (#27303)
  • Lower drop_{nulls,nans} in streaming group_by aggregations (#27296)
  • Lower entropy to streaming reductions (#27174)
  • Add native streaming interpolate (#27185)
  • Streaming strptime with format=None (#27056)
  • Lower skew / kurtosis to streaming aggregations (#27176)
  • Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
  • Optimize drop_nulls().{first,last}() to {first,last}(ignore_nulls=True) (#27187)
  • Always process pyarrow scan in batches (#27183)
  • Make cut output Enum and mark as elementwise (#27173)
  • Remove unused expression sorts (#27075)
  • Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
  • Take into account size per row in join sampling (#27098)
  • Streaming is_first_distinct and unique(maintain_order=True) (#27052)
  • Streaming cov and corr (#27008)
  • Add sorted unique node to streaming engine (#26990)
  • Ensure Expr.append is lowered in streaming engine (#27022)
  • Collapse consecutive Sort nodes (#26965)
  • Drop maintain_order=True requirement in sink_delta (#27007)
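
The drop_nulls().{first,last}() rewrite listed above can be understood with a plain-Python analogy: filtering out every null and then taking one element does more work than scanning until the first non-null value. The sketch below only illustrates that equivalence. The function names are hypothetical and `None` stands in for a null.

```python
def first_via_drop_nulls(values):
    """Build a filtered copy, then take its first element."""
    non_null = [v for v in values if v is not None]
    return non_null[0] if non_null else None


def first_ignore_nulls(values):
    """Single pass that stops at the first non-null value."""
    return next((v for v in values if v is not None), None)


def last_ignore_nulls(values):
    """Same idea, scanning from the end."""
    return next((v for v in reversed(values) if v is not None), None)


data = [None, None, 7, None, 9, None]
same_first = first_via_drop_nulls(data) == first_ignore_nulls(data)
```

Both forms return the same value; the second avoids materializing the filtered intermediate.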

✨ Enhancements

  • Add ignore_nulls to {list,arr}.{any,all} (#27186)
  • Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
  • Add is_unique to list/array dtypes (#27290)
  • Streaming pyarrow datasets sources (#27230)
  • Add pl.merge_sorted operating on multiple frames (#27014)
  • Allow group_by() without key exprs (#27141)
  • Change default scan/read_lines column name from "lines" to "line" (#27122)
  • Make unnest() effective on all columns by default (#27029)
  • Collapse consecutive Sort nodes (#26965)
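
As a conceptual illustration of merging several already-sorted inputs (the idea behind a multi-frame merge_sorted), the sketch below uses the standard library's `heapq.merge` on lists of `(key, source)` tuples. It does not call Polars, and the data and the tuple layout are assumptions made for the example.

```python
import heapq

# Three inputs, each already sorted by the first tuple element (the key).
frames = [
    [(1, "a"), (4, "a"), (9, "a")],
    [(2, "b"), (4, "b")],
    [(0, "c"), (10, "c")],
]

# k-way merge: repeatedly take the smallest head among all inputs.
merged = list(heapq.merge(*frames, key=lambda row: row[0]))
keys = [key for key, _ in merged]
```

The merge never re-sorts the data; it relies on each input being sorted already, which is also the precondition for a sorted merge in general.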

🐞 Bug fixes

  • Update groups to correct length for Implode (#27282)
  • Fix scan_csv missing_columns='insert' overwriting existing data with NULLs (#27297)
  • Raise on non-numeric inputs in pl.int_ranges (#27294)
  • Fix always-true filter conversion to Iceberg filter (#27119)
  • Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
  • Fix pivot dropping data for null on values (#27273)
  • Resolve multiple files deadlock in CSV async reader (#27073)
  • Widen decimal precision on sum aggregation (#27270)
  • Correct lf.remote type (#27261)
  • Default LazyFrame.map_batches to no optimizations (#27262)
  • Extend StructEval schema context in StackOptimizer (#27243)
  • Preserve nulls when casting from all-null Series to Struct (#27241)
  • Fix scan_delta filter on empty dataframe (#27244)
  • Prevent DataFrame creation panic on list[struct] with heterogenous types (#27217)
  • Named aggregation __structify was being ignored (#27148)
  • Skip null group entries when collecting AsOf-by groups (#27215)
  • Fix panic with empty order_by in over expression (#27088)
  • Write field ID from sink_parquet (#27196)
  • Fix statistics for Null columns in Parquet (#27021)
  • Do not prune sort nodes containing slice with dyn predicate (#27140)
  • Correct grouped Binary arg_min/arg_max and String single-element arg indices (#27172)
  • Resolve multiple files deadlock in NDJSON async reader (#27204)
  • Overflow panic in interpolate nearest (#27205)
  • Use checked arithmetic in int96_to_i64_ns to prevent overflow panic (#27129)
  • Don't trigger csv fast count if predicate is pushed down (#27190)
  • Support all integer dtypes for Series index assignment (#27188)
  • Streaming sort by-expressions were lowered incorrectly (#27158)
  • Replace multiprocessing.dummy.Pool with ThreadPoolExecutor (#27175)
  • Reset IO metrics instead of consuming (#27156)
  • Output SVG if output_path ends with '.svg' in show_graph (#27144)
  • Skip extension types for min/max in describe (#27120)
  • Address a potential overflow in from_epoch scaling (#27118)
  • Fix incorrect IO metrics on multi-phase streaming execution (#27123)
  • Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
  • Make the files used in docs available locally (#27121)
  • Apply scalar bound in clip when the Series bound contains nulls (#27087)
  • Ignore ddof parameter in rolling_corr and deprecate (#27104)
  • Preserve casts for horizontal ops with untyped literals (#27011)
  • Reject invalid input to sql_expr (#27084)
  • Ensure SQL COUNT(<lit>) expressions return the correct value (#27085)
  • Regression in replace_strict for enums (#27066)
  • Make test_group_by_arg_max_boolean_26978 non-flaky for max_by ties (#27048)
  • Null count for aggregated list inside count aggregation (#27032)
  • Panic in streaming MergeSortedNode (#27024)
  • Prevent panic in transpose() with mixed List and non-List columns (#27038)
  • Set sorted flag for Boolean and Time (#27035)
  • Missing src/ subdirectory to CI Python docs step (#27025)
  • Resolve stack overflow on merge_sorted and union (#27018)
  • Make pl.DataFrame.fill_null work on columns with Null dtype (#27020)
  • Fix repeated word typos in comments (#26917)
  • Covariance with constant is zero, not NaN (#27015)
  • Don't remove set_sorted in projection pushdown (#27006)
  • Infer nulls when df is created from empty-struct (#26991)
  • Correct suggestion in multi-expr filter error (#27003)
  • Implement agg_arg_min/agg_arg_max for boolean data type (#26997)
  • Ensure sample() respects the global set seed (#26992)
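
The entry "Covariance with constant is zero, not NaN" follows directly from the definition of covariance: every deviation of a constant from its own mean is zero, so the sum of products is zero. A minimal sketch of that arithmetic in plain Python is shown below. The helper names are hypothetical and this is not how Polars computes it. Correlation, by contrast, divides by the standard deviation of the constant and is therefore undefined.

```python
import math


def sample_cov(xs, ys, ddof=1):
    """Sample covariance: sum of products of deviations over (n - ddof)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)


def pearson_corr(xs, ys):
    """Pearson correlation; NaN when either input has zero variance."""
    std_x = math.sqrt(sample_cov(xs, xs))
    std_y = math.sqrt(sample_cov(ys, ys))
    if std_x == 0 or std_y == 0:
        return math.nan
    return sample_cov(xs, ys) / (std_x * std_y)


xs = [1.0, 2.0, 3.0, 4.0]
constant = [5.0] * len(xs)
cov_with_constant = sample_cov(xs, constant)
corr_with_constant = pearson_corr(xs, constant)
```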

📖 Documentation

  • Add documentation for openlineage on-premises (#27334)
  • Release page (#27335)
  • Update uv pip install polars-on-premises cmd (#27330)
  • Fix outdated LazyGroupBy.map_groups docstring (#27292)
  • Add deny_anonymous_users to scheduler config (#27287)
  • Slurm documentation (#27259)
  • Add link to concepts in index.md (#27077)
  • Add docs entry for merge_sorted (#27224)
  • Fix typo (#27212)
  • Make the files used in docs available locally (#27121)
  • Put first-time contribution requirements in its own linkable section (#27113)
  • Add missing docstrings for Expr.struct.__getitem__ and Series.__setitem__ (#27092)
  • Normalise Series docstring whitespace indents (#27082)
  • Change Polars Cloud API to 0.6.0 (#27005)
  • Improve write_parquet docstring for use_pyarrow (#26988)

📦 Build system

  • Really do not install pyiceberg-core 0.9.0 (#27017)

🛠️ Other improvements

  • Add regression test for instantiating polars DataFrame from pandas Timestamp (#27332)
  • Bump Python Polars version (#27315)
  • Resolve bad instantiations in test_iceberg (#27314)
  • Sink DSL and callback for Iceberg (#27258)
  • Wait for morsel consumption in merge_sorted streaming node (#27288)
  • Use more precise internal typing (pt. iii) (#27232)
  • Mark scan_ipc cache arguments as deprecated (#27216)
  • Consolidate reordered compare functions (#27229)
  • Fix test_dtype_concat_3735 not actually iterating through numeric dtypes (#27178)
  • Remove dead code in test_scan_lines (#27213)
  • Move/genericize _balanced_reduce to Python utils (#27100)
  • Remove unused attributes (#27191)
  • Avoid unnecessary recompilation due to changing env vars (#27166)
  • Update nightly Rust compiler version (#27145)
  • Simplify pyarrow scan and process in batches (#26982)
  • Make internal typing more precise (part ii) (#27117)
  • Add None & Dataframe to FrameInitTypes (#27126)
  • Remove unused expression sorts (#27075)
  • Improve internal typing ahead of using ty / pyrefly (#27050)
  • Add explicit ResourceWarning coverage (#27083)
  • Add sinked paths callback (#26995)
  • Pin maturin due to compile time regression (#27062)
  • Missing src/ subdirectory to CI Python docs step (#27025)
  • Really do not install pyiceberg-core 0.9.0 (#27017)
  • Naming for named scopes (#26999)
  • Enable hypothesis tests when POLARS_AUTO_NEW_STREAMING=1 (#26818)
  • Fix CI by excluding missing wheel version of pyiceberg (#27001)
  • Remove indirection in calling python scans (#26981)
  • Polars versions (#26980)
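
Several entries in this release mention a balanced reduce helper. The release notes do not describe it, so the following is only a sketch of what a balanced (pairwise, tree-shaped) reduction usually means, under the assumption that the combining function is associative. The name `balanced_reduce` and its signature are hypothetical. The odd-length case is handled by carrying the unpaired last item to the next round.

```python
from operator import add


def balanced_reduce(func, items):
    """Combine neighbours pairwise, round by round, until one item is left."""
    items = list(items)
    if not items:
        raise ValueError("balanced_reduce() of empty sequence")
    while len(items) > 1:
        # Pair up (0, 1), (2, 3), ... preserving left-to-right order.
        paired = [func(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            # Odd length: the last item has no partner this round.
            paired.append(items[-1])
        items = paired
    return items[0]


letters = ["a", "b", "c", "d", "e"]
joined = balanced_reduce(add, letters)
```

Compared with a left fold, the tree shape keeps intermediate results similar in size, which tends to matter when each combine step copies data.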

Thank you to all our contributors for making this release possible!
@0xRozier, @EndPositive, @HCYT, @Kevin-Patyk, @MarcoGorelli, @NeejWeej, @RedZapdos123, @TNieuwdorp, @abhidotsh, @alexander-beedie, @andyjessen, @azimafroozeh, @borchero, @carnarez, @coastalwhite, @debnathshoham, @dpinol, @dsprenkels, @dydev012, @farouk-01, @gab23r, @gautamvarmadatla, @joaquinhuigomez, @kdn36, @nameexhaustion, @orlp, @ritchie46, @wence-, @xenzh, @yangsong97 and @yonatan-genai

Python Polars 1.39.3

20 Mar 11:16

  • No changes

Thank you to all our contributors for making this release possible!
@ritchie46

Python Polars 1.39.2

17 Mar 17:19

  • No changes

Thank you to all our contributors for making this release possible!
@nameexhaustion and @ritchie46

Python Polars 1.39.1

17 Mar 09:40

🐞 Bug fixes

  • Handle empty rolling windows in streaming engine (#26903)

📖 Documentation

  • Add documentation for on_columns for LazyFrame pivot (#26859)

🛠️ Other improvements

  • Bump build deps used in ARM64 Windows release pipeline (#26892)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @RenzoMXD, @TNieuwdorp, @dsprenkels, @gautamvarmadatla, @nameexhaustion, @nicholaslegrand102 and @ritchie46

Python Polars 1.39.0

12 Mar 14:25
2bce04a

🚀 Performance improvements

  • Lower arg_{min,max} to streaming engine (#26845)
  • Additional IR slice pushdown after filter pushdown (#26815)
  • Streaming first/last on Enum through physical (#26783)
  • Fast filter for scalar predicates (#26745)
  • Allow SimpleProjection in streaming engine to rename (#26709)
  • Streaming cloud download for scan_csv (#26637)
  • Drop columns only needed for predicates after the predicate is applied (#26703)
  • Run projection pushdown after predicate pushdown (#26688)
  • Comparison literal downcasting (#26663)
  • Add dynamic predicates for TopK (#26495)
  • Increase minimum default parquet row group prefetch to 8 (#26632)
  • Partial predicate conversion to PyArrow (#26567)
  • Streaming cloud download for scan_ndjson / scan_lines (#26563)
  • Grab GIL fewer times during Object join materialization (#26587)
  • Improve CSV and NDJSON cloud sink performance (#26545)
  • Tune cloud writer performance (#26518)
  • Allow parallel InMemorySinks in streaming engine (#26501)
  • Add streaming AsOf join node (#26398)
  • Don't always rechunk on gather of nested types (#26478)

✨ Enhancements

  • Support Expr for holidays in business day calculations (#26193)
  • Parameter for pivot to always include value column name (#26730)
  • Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
  • Extend Expr.reinterpret to all numeric types of the same size (#26401)
  • Add missing_columns parameter to scan_csv (#26787)
  • Clear no-op scan projections (#26858)
  • Support nested datatypes for {min,max}_by (#26849)
  • Support SQL ARRAY init from typed literals (#26622)
  • Accept table identifier string in scan_iceberg() (#26826)
  • Add a convenience make fresh command to the Makefile (#26809)
  • Expose "use_zip64" Workbook option for write_excel (#26699)
  • Add unstable LazyFrame.sink_iceberg (#26799)
  • Add maintain order argument on implode (#26782)
  • Speed up casting primitive to bool by at least 2x (#26823)
  • Support ASCII format table input to pl.from_repr (#26806)
  • Enable rowgroup skipping for float columns (#26805)
  • Add expression context to errors (#26716)
  • Add Decimal support for product reduction (#26725)
  • Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter (#26669)
  • Re-work behavior of arrow_schema parameter on sink_parquet (#26621)
  • Add contains_dtype() method for Schema (#26661)
  • Implement truncate as a "to_zero" rounding mode (#26677)
  • More generic streaming GroupBy lowering (#26696)
  • Create an Alignment TypeAlias (#26668)
  • Add basic MemoryManager to track buffered dataframes for out-of-core support later (#26443)
  • Add truncate Expression for numeric values (#26666)
  • Better error messages for hex literal conversion issues in the SQL interface (#26657)
  • Add SQL support for LPAD and RPAD string functions (#26631)
  • Support SQL "FROM-first" SELECT query syntax (#26598)
  • Improve base_type typing (#26602)
  • Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers (#26075)
  • Expose unstable assert_schema_equal in py-polars (#24869)
  • Allow parsing of compact ISO 8601 strings (#24629)
  • Add optional "label" param to DataFrame corr (#26588)
  • Streaming cloud download for scan_ndjson / scan_lines (#26563)
  • Configuration to cast integers to floats in cast_options for scan_parquet (#26492)
  • Add escaping to quotes and newlines when reading JSON object into string (#26578)
  • Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes (#26425)
  • Support sas_token in Azure credential provider (#26565)
  • Relax SQL requirement for derived tables and subqueries to have aliases (#26543)
  • Add polars-config and pl.Config.reload_env_vars() (#26524)
  • Record path for object store error raised from sinks (#26541)
  • Use CRC64NVME for checksum in aws sinks (#26522)
  • Add get() for binary Series (#26514)
  • Add streaming AsOf join node (#26398)
  • Add primitive filter -> agg lowering in streaming GroupBy (#26459)
  • Support for the SQL FETCH clause (#26449)
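
For readers unfamiliar with the term, "compact" ISO 8601 usually refers to the basic format, which omits the `-` and `:` separators. The sketch below shows the two spellings of the same timestamp using only the standard library. Which compact variants Polars accepts is not stated in these notes, so the format strings here are an assumption for illustration.

```python
from datetime import datetime

# Extended format: separators between date and time components.
extended = datetime.strptime("2024-01-31T10:15:00", "%Y-%m-%dT%H:%M:%S")

# Basic ("compact") format: same fields, no separators.
compact = datetime.strptime("20240131T101500", "%Y%m%dT%H%M%S")

same_instant = extended == compact
```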

🐞 Bug fixes

  • Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine (#26878)
  • Fix corrupted slashes when sinking to partitioned S3 from Windows (#26889)
  • Remove outdated warning about List columns in unique() (#26295) (#26890)
  • Restore pyarrow predicate conversion for is_in (#26811)
  • Release GIL before df.to_ndarray() to avoid deadlock (#26832)
  • Fix panic on CSV count_rows with FORCE_ASYNC (#26883)
  • Add scalar comparisons for UInt128 series (#26886)
  • Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat (#26877)
  • Fix streaming zip-broadcast node not raising shape mismatch on empty recv from ready port (#26871)
  • Fix incorrect output of list.eval with scalar expr, fix panic on list.agg with nulls (#26868)
  • Allow list argument in group_by().map_groups() (#26707)
  • Support for ADBC drivers instantiated with dbc in DataFrame.write_database (#26157)
  • Incorrect arg_sort with descending+limit (#26839)
  • Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
  • Return ComputeError instead of panicking in map_groups UDF (#26665)
  • Issue PerformanceWarning in LazyFrame.__contains__ (#26734)
  • Correct type hint for map_columns function parameter (#26487)
  • Apply thousands_separator to count/null_count in describe() for non-numeric columns (#26486)
  • Ensure proper handling of timedelta when multiplying with a Series (#26830)
  • Correct type hint for function parameter in DataFrame.map_columns (#26372)
  • Segfault in JoinExec on deep plan (#26796)
  • Fix unary expressions on literal in over context (#26827)
  • Fix {min,max}_by in streaming engine for Boolean full {min,max} value column (#26848)
  • Fix debug panic on clip with nan bound (#26854)
  • Support grouped {arg_,}_{min,max} for Categoricals (#26856)
  • Throw an error if a string is passed to LazyFrame.pivot on_columns (#26852)
  • Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types (#26820)
  • Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface (#26835)
  • Prevent infinite recursion in streaming group_by fallback (#26801)
  • Use RowEncodingContext::Struct when determining D::Struct encoded item len (#26817)
  • Incorrectly applied CSE on different map_batches functions (#26822)
  • Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING (#26792)
  • Prevent predicate pushdown across Sort with baked-in slice (#26804)
  • Restore compatibility with pd.Timedelta (#26785)
  • Fix panic on lazy sink_parquet created in pipe_with_schema (#26784)
  • Support {column_name} and {index} placeholders in pl.format string (#26771)
  • Do not use merge-join if nulls_last is unknown (#26778)
  • Normalize float zeros in Parquet column statistics (#26776)
  • Fix out-of-bounds for positive offset in windowed rolling (#26724)
  • Raise error when .get() is out-of-bounds in group by context (#26752)
  • Boolean bitwise_xor aggregation inverted when column contains nulls (#26749)
  • Parameter nulls_last was ignored in over (#26718)
  • Allow missing time in inexact strptime (#26714)
  • Respect nulls_last in sort_by within group_by().agg() slow path (#26681)
  • Return NaN when using corr() with a literal and expr (#26697)
  • Allow strict horizontal concat with empty df (#26345)
  • Fix PoisonError panic caused by reentrant usage of file cache (#26627)
  • Return null for int values exceeding 128-bit range with strict=False (#26674)
  • Incorrect boolean min/max with nulls (#26671)
  • Slice-slice pushdown for n_rows (#26673)
  • Resolve panic in Enum struct slicing (#26643)
  • Fix CSPE for group_by.map_groups (#26640)
  • Remove non-existent parameter from SQLContext typing overloads (#26658)
  • Address pl.from_epoch losing fractional seconds (#26419)
  • Fix to_pandas() on empty enum Series not preserving enum dictionary (#26610)
  • Rounding behaviour for f32 values with "HalfAwayFromZero" mode (#26624)
  • Updated Sum Type Hint (#26629)
  • Don't allow namespace registration to override standard methods or properties (#26450)
  • Correct arg_(min|max) for scalar columns (#26609)
  • Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
  • Respect SQL semantics for cumulative functions mapped via OVER clause (#26570)
  • Fix incorrect multiplexer output ordering on source token stop request (#26561)
  • Fix PyIceberg filter on boolean column (#26550)
  • Set dictionary_page_offset when dictionary encoding is used and point data_page_offset to the first data page (#26542)
  • Move query parameters to request body when retrieving Unity Catalog temporary credentials (#26539)
  • Ensure read_csv_batched() prints deprecation warning (#26530)
  • Implement PhysicalExpr for MinBy/MaxBy nodes (#26506)
  • Refactor row-encoding logic in IR join lowering into separate function (#26512)
  • Correctly check for path extensions (#26513)
  • Change AsOf join to be based on TotalOrd (#26497)
  • Correctly raise error on failing nested strict casts (#26499)
  • Prevent invalid type casts in replace_strict() (#26453)
  • Return null when dividing literals by 0 (#26343)
  • Fix type-hint for Series.quantile (#26422)
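
The "HalfAwayFromZero" rounding mode named above differs from Python's built-in `round`, which rounds ties to the nearest even value. A small standard-library sketch of the difference follows. It uses `decimal.ROUND_HALF_UP`, which rounds ties away from zero, and does not attempt to reproduce the f32-specific issue that the fix addresses. The helper name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away_from_zero(value, digits=0):
    """Round a decimal string so that ties move away from zero."""
    quantum = Decimal(1).scaleb(-digits)  # e.g. digits=2 gives 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# Built-in round() sends ties to the nearest even integer.
bankers = [round(2.5), round(-2.5), round(3.5)]

# Half-away-from-zero sends ties outward on both sides of zero.
away = [round_half_away_from_zero(v) for v in ("2.5", "-2.5", "3.5")]
```

Passing the value as a string keeps the example independent of binary floating-point representation, which is exactly where tie detection becomes delicate.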

📖 Documentation

  • Mention ComputeContexts create ephemeral environments by default and hint at re-use (#26692)
  • Remove confusing join validation note (#26795)
  • Fix formatting in categorical documentation (#26746)
  • Fix broken AI policy link (#26728)
  • Create Polars Cloud Glossary (#26690)
  • Additional SQL documentation (#26662)
  • Include invalidate_caches in bisect instructions (#26641)
  • Add git bisect guide to contributing docs (#26634)
  • Fix Polars Cloud examples (formatting & type hints) (#26625)
  • Updated Airflow orchestration documentation (#26585)
  • Improve SQL docs for...

Rust Polars 0.53.0

09 Feb 09:16
16c0d99

🏆 Highlights

  • Add Extension types (#25322)

🚀 Performance improvements

  • Don't always rechunk on gather of nested types (#26478)
  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Resolve file schemas and metadata concurrently (#26325)
  • Run elementwise CSEE for the streaming engine (#26278)
  • Disable morsel splitting for fast-count on streaming engine (#26245)
  • Implement streaming decompression for scan_ndjson and scan_lines (#26200)
  • Improve string slicing performance (#26206)
  • Refactor scan_delta to use python dataset interface (#26190)
  • Add dedicated kernel for group-by arg_max/arg_min (#26093)
  • Add streaming merge-join (#25964)
  • Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
  • Reduce fs stat calls in path expansion (#26173)
  • Lower streaming group_by n_unique to unique().len() (#26109)
  • Speed up SQL interface "UNION" clauses (#26039)
  • Speed up SQL interface "ORDER BY" clauses (#26037)
  • Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
  • Optimize ArrayFromIter implementations for ObjectArray (#25712)
  • New streaming NDJSON sink pipeline (#25948)
  • New streaming CSV sink pipeline (#25900)
  • Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
  • Replace ryu with faster zmij (#25885)
  • Reduce memory usage for .item() count in grouped first/last (#25787)
  • Skip schema inference if schema provided for scan_csv/ndjson (#25757)
  • Add width-aware chunking to prevent degradation with wide data (#25764)
  • Use new sink pipeline for write/sink_ipc (#25746)
  • Reduce memory usage when scanning multiple parquet files in streaming (#25747)
  • Don't call cluster_with_columns optimization if not needed (#25724)
  • Tune partitioned sink_parquet cloud performance (#25687)
  • New single file IO sink pipeline enabled for sink_parquet (#25670)
  • New partitioned IO sink pipeline enabled for sink_parquet (#25629)
  • Correct overly eager local predicate insertion for unpivot (#25644)
  • Reduce HuggingFace API calls (#25521)
  • Use strong hash instead of traversal for CSPE equality (#25537)
  • Fix panic in is_between support in streaming Parquet predicate push down (#25476)
  • Faster kernels for rle_lengths (#25448)
  • Allow detecting plan sortedness in more cases (#25408)
  • Enable predicate expressions on unsigned integers (#25416)
  • Mark output of more non-order-maintaining ops as unordered (#25419)
  • Fast find start window in group_by_dynamic with large offset (#25376)
  • Add streaming native LazyFrame.group_by_dynamic (#25342)
  • Add streaming sorted Group-By (#25013)
  • Add parquet prefiltering for string regexes (#25381)
  • Use fast path for agg_min/agg_max when nulls present (#25374)
  • Fuse positive slice into streaming LazyFrame.rolling (#25338)
  • Mark Expr.reshape((-1,)) as row separable (#25326)
  • Use bitmap instead of Vec<bool> in first/last w. skip_nulls (#25318)
  • Return references from aexpr_to_leaf_names_iter (#25319)
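
One entry above concerns kernels for rle_lengths. Assuming the name refers to the lengths of consecutive runs of equal values (run-length encoding), the computation can be sketched in a few lines of standard-library Python. This is a reference illustration only. The Polars kernel and its null handling are not described in these notes.

```python
from itertools import groupby


def rle_lengths(values):
    """Length of each run of consecutive equal values, in order."""
    return [sum(1 for _ in run) for _, run in groupby(values)]


lengths = rle_lengths([1, 1, 2, 2, 2, 3, 1])
```

The lengths always sum to the input length, which makes a convenient sanity check for any faster implementation.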

✨ Enhancements

  • Add primitive filter -> agg lowering in streaming GroupBy (#26459)
  • Support for the SQL FETCH clause (#26449)
  • Add get() to retrieve a byte from binary data (#26454)
  • Remove with_context in SQL lowering (#26416)
  • Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
  • Add JoinBuildSide (#26403)
  • Support anonymous agg in-mem (#26376)
  • Add unstable arrow_schema parameter to sink_parquet (#26323)
  • Improve error message formatting for structs (#26349)
  • Remove parquet field overwrites (#26236)
  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
  • Expose upload_concurrency through env var (#26263)
  • Allow quantile to compute multiple quantiles at once (#25516)
  • Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
  • Use delta file statistics for batch predicate pushdown (#26242)
  • Add streaming UnorderedUnion (#26240)
  • Implement compression support for sink_ndjson (#26212)
  • Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
  • Cloud retry/backoff configuration via storage_options (#26204)
  • Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
  • Expose physical plan NodeStyle (#26184)
  • Add streaming merge-join (#25964)
  • Serialize optimization flags for cloud plan (#26168)
  • Add compression support to write_csv and sink_csv (#26111)
  • Add scan_lines (#26112)
  • Support regex in str.split (#26060)
  • Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
  • Add nulls support for all rolling_by operations (#26081)
  • ArrowStreamExportable and sink_delta (#25994)
  • Release musl builds (#25894)
  • Implement streaming decompression for CSV COUNT(*) fast path (#25988)
  • Add nulls support for rolling_mean_by (#25917)
  • Add lazy collect_all (#25991)
  • Add streaming decompression for NDJSON schema inference (#25992)
  • Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
  • Expose record batch size in {sink,write}_ipc (#25958)
  • Add null_on_oob parameter to expr.get (#25957)
  • Suggest correct timezone if timezone validation fails (#25937)
  • Support streaming IPC scan from S3 object store (#25868)
  • Implement streaming CSV schema inference (#25911)
  • Support hashing of meta expressions (#25916)
  • Improve SQLContext recognition of possible table objects in the Python globals (#25749)
  • Add pl.Expr.(min|max)_by (#25905)
  • Improve MemSlice Debug impl (#25913)
  • Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
  • Expand scatter to more dtypes (#25874)
  • Implement streaming CSV decompression (#25842)
  • Add Series sql method for API consistency (#25792)
  • Mark Polars as safe for free-threading (#25677)
  • Support Binary and Decimal in arg_(min|max) (#25839)
  • Allow Decimal parsing in str.json_decode (#25797)
  • Add shift support for Object data type (#25769)
  • Add node status to NodeMetrics (#25760)
  • Allow scientific notation when parsing Decimals (#25711)
  • Allow creation of Object literal (#25690)
  • Don't collect schema in SQL union processing (#25675)
  • Add bin.slice(), bin.head(), and bin.tail() methods (#25647)
  • Add SQL support for the QUALIFY clause (#25652)
  • New partitioned IO sink pipeline enabled for sink_parquet (#25629)
  • Add SQL syntax support for CROSS JOIN UNNEST(col) (#25623)
  • Add separate env var to log tracked metrics (#25586)
  • Expose fields for generating physical plan visualization data (#25562)
  • Allow pl.Object in pivot value (#25533)
  • Extend SQL UNNEST support to handle multiple array expressions (#25418)
  • Minor improvement for as_struct repr (#25529)
  • Temporal quantile in rolling context (#25479)
  • Add support for Float16 dtype (#25185)
  • Add strict parameter to pl.concat(how='horizontal') (#25452)
  • Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
  • Add quantile for missing temporals (#25464)
  • Expose and document pl.Categories (#25443)
  • Support decimals in search_sorted (#25450)
  • Use reference to Graph pipes when flushing metrics (#25442)
  • Add SQL support for named WINDOW references (#25400)
  • Add Extension types (#25322)
  • Add having to group_by context (#23550)
  • Allow elementwise Expr.over in aggregation context (#25402)
  • Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
  • Automatically Parquet dictionary encode floats (#25387)
  • Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
  • Allow hash for all List dtypes (#25372)
  • Support unique_counts for all datatypes (#25379)
  • Add maintain_order to Expr.mode (#25377)
  • Display function of streaming physical plan map node (#25368)
  • Allow slice on scalar in aggregation context (#25358)
  • Allow implode and aggregation in aggregation context (#25357)
  • Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
  • Add ignore_nulls to first / last (#25105)
  • Move GraphMetrics into StreamingQuery (#25310)
  • Allow Expr.unique on List/Array with non-numeric types (#25285)
  • Allow Expr.rolling in aggregation contexts (#25258)
  • Support additional forms of SQL CREATE TABLE statements (#25191)
  • Add LazyFrame.pivot (#25016)
  • Support column-positional SQL UNION operations (#25183)
  • Allow arbitrary expressions as the Expr.rolling index_column (#25117)
  • Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
  • Support arbitrary expressions in SQL JOIN constraints (#25132)
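
The three SQL ranking functions added above differ only in how they treat ties. The plain-Python sketch below spells out the standard SQL semantics for an ascending ORDER BY. It does not use the Polars SQL interface, and `sql_ranks` is a hypothetical helper.

```python
def sql_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples, ascending."""
    rows = []
    rank = dense_rank = 0
    previous = object()  # sentinel that compares unequal to any value
    for row_number, value in enumerate(sorted(values), start=1):
        if value != previous:
            rank = row_number  # RANK jumps to the current position after ties
            dense_rank += 1    # DENSE_RANK advances by one, leaving no gaps
            previous = value
        rows.append((value, row_number, rank, dense_rank))
    return rows


ranked = sql_ranks([20, 10, 30, 20])
```

ROW_NUMBER is always unique, RANK repeats for ties and then skips, and DENSE_RANK repeats for ties without skipping.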

🐞 Bug fixes

  • Do not overwrite used names in cluster_with_columns pushdown (#26467)
  • Do not mark output of concat_str on multiple inputs as sorted (#26468)
  • Fix CSV schema inference content line duplication bug (#26452)
  • Fix InvalidOperationError using scan_delta with filter (#26448)
  • Alias giving missing column after streaming GroupBy CSE (#26447)
  • Ensure by_name selector selects only names (#26437)
  • Restore compatibility of strings written to parquet with pyarrow filter (#26436)
  • Update schema in cluster_with_columns optimization (#26430)
  • Fix negative slice in groups slicing (#26442)
  • Don't run CPU check on aarch64 musl (#26439)
  • Remove the POLARS_IDEAL_MORSEL_SIZE monkeypatching in the parametric merge-join test (#26418)
  • Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
  • Support very large integers in env var limits (#26399)
  • Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
  • Fix Float dtype for spearman correlation (#26392)
  • Fix optimizer panic in right joins with type coercion (#26365)
  • Don't serialize retry config from ...

Python Polars 1.38.1

06 Feb 18:13
50a3bfb

✨ Enhancements

  • Add get() to retrieve a byte from binary data (#26454)
  • Remove with_context in SQL lowering (#26416)

🐞 Bug fixes

  • Do not overwrite used names in cluster_with_columns pushdown (#26467)
  • Do not mark output of concat_str on multiple inputs as sorted (#26468)
  • Fix CSV schema inference content line duplication bug (#26452)
  • Fix InvalidOperationError using scan_delta with filter (#26448)
  • Alias giving missing column after streaming GroupBy CSE (#26447)
  • Ensure by_name selector selects only names (#26437)
  • Restore compatibility of strings written to parquet with pyarrow filter (#26436)
  • Update schema in cluster_with_columns optimization (#26430)
  • Fix negative slice in groups slicing (#26442)
  • Don't run CPU check on aarch64 musl (#26439)
  • Fixed annotations shadowed by class methods (#26356)
  • Remove the POLARS_IDEAL_MORSEL_SIZE monkeypatching in the parametric merge-join test (#26418)
  • Fix selector match patterns for multiline column names (#26320)

📖 Documentation

  • Add sink_delta to API reference (#26446)

🛠️ Other improvements

  • Cleanup unused attributes in optimizer (#26464)
  • Use Expr::Display as catch all for IR - DSL asymmetry (#26471)
  • Ignore pytz in mypy (#26441)
  • Remove the POLARS_IDEAL_MORSEL_SIZE monkeypatching in the parametric merge-join test (#26418)
  • Cleanup the parametric merge-join test (#26413)

Thank you to all our contributors for making this release possible!
@Voultapher, @alexander-beedie, @azimafroozeh, @cmdlineluser, @dependabot[bot], @dsprenkels, @hamdanal, @kdn36, @nameexhaustion, @orlp and @ritchie46

Python Polars 1.38.0

04 Feb 12:01
e1612c2

⚠️ Deprecations

  • Deprecate retries=n in favor of storage_options={"max_retries": n} (#26155)
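
For callers that build scan or sink keyword arguments programmatically, the deprecation above amounts to moving one key. The following is a minimal sketch only: `migrate_retry_kwargs` is a hypothetical helper, not part of Polars, and the `aws_region` entry is just an illustrative pre-existing option. It assumes the legacy value should not override a `max_retries` that is already set explicitly.

```python
from typing import Any


def migrate_retry_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of kwargs with a legacy `retries=n` entry moved to
    storage_options={"max_retries": n}. Hypothetical helper, not a Polars API."""
    updated = dict(kwargs)
    if "retries" not in updated:
        return updated

    retries = updated.pop("retries")
    # Copy so the caller's storage_options dict is left untouched.
    storage_options = dict(updated.get("storage_options") or {})
    # Assumption: an explicit max_retries takes precedence over the legacy keyword.
    storage_options.setdefault("max_retries", retries)
    updated["storage_options"] = storage_options
    return updated


legacy = {"retries": 3, "storage_options": {"aws_region": "eu-west-1"}}
migrated = migrate_retry_kwargs(legacy)
```

The migrated dictionary can then be unpacked into whichever call previously received `retries`; the original dictionary is not modified.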

🚀 Performance improvements

  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Resolve file schemas and metadata concurrently (#26325)
  • Run elementwise CSEE for the streaming engine (#26278)
  • Disable morsel splitting for fast-count on streaming engine (#26245)
  • Implement streaming decompression for scan_ndjson and scan_lines (#26200)
  • Improve string slicing performance (#26206)
  • Refactor scan_delta to use python dataset interface (#26190)
  • Add dedicated kernel for group-by arg_max/arg_min (#26093)
  • Add streaming merge-join (#25964)
  • Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
  • Reduce fs stat calls in path expansion (#26173)
  • Lower streaming group_by n_unique to unique().len() (#26109)

✨ Enhancements

  • Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
  • Support anonymous agg in-mem (#26376)
  • Add unstable arrow_schema parameter to sink_parquet (#26323)
  • Improve error message formatting for structs (#26349)
  • Remove parquet field overwrites (#26236)
  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
  • Expose upload_concurrency through env var (#26263)
  • Allow quantile to compute multiple quantiles at once (#25516)
  • Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
  • Use delta file statistics for batch predicate pushdown (#26242)
  • Add streaming UnorderedUnion (#26240)
  • Implement compression support for sink_ndjson (#26212)
  • Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
  • Support CSE for python UDFs on the same address (#26253)
  • Cloud retry/backoff configuration via storage_options (#26204)
  • Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
  • Add streaming merge-join (#25964)
  • Serialize optimization flags for cloud plan (#26168)
  • Add compression support to write_csv and sink_csv (#26111)
  • Add scan_lines (#26112)
  • Support regex in str.split (#26060)
  • Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
  • Add unstable height parameter to DataFrame/LazyFrame (#26014)
  • Remove old partition sink API (#26100)
  • Expose ArrowStreamExportable on python collect batches iterator (#26074)
  • Add nulls support for all rolling_by operations (#26081)

🐞 Bug fixes

  • Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
  • Support very large integers in env var limits (#26399)
  • Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
  • Fix Float dtype for spearman correlation (#26392)
  • Fix optimizer panic in right joins with type coercion (#26365)
  • Don't serialize retry config from local environment vars (#26289)
  • Fix PartitionBy with scalar key expressions and diff() (#26370)
  • Add {Float16, Float32} -> Float32 lossless upcast (#26373)
  • Fix panic using with_columns and collect_all (#26366)
  • Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
  • Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
  • Pin xlsx2csv version temporarily (#26352)
  • Bugs in ViewArray total_bytes_len (#26328)
  • Overflow in i128::abs in Decimal fits check (#26341)
  • Make Expr.hash on Categorical mapping-independent (#26340)
  • Clone shared GroupBy node before mutation in physical plan creation (#26327)
  • Fixed "sheet_name" typing for read_ods and read_excel (#26317)
  • Improve Polars dtype inference from Python Union typing (#26303)
  • Consider the "current location" of an item when computing rolling_rank_by (#26287)
  • Reset is_count_star flag between queries in collect_all (#26256)
  • Fix incorrect is_between filter on scan_parquet (#26284)
  • Make polars compatible with ty (#26270)
  • Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
  • Avoid overflow in pl.duration scalar arguments case (#26213)
  • Broadcast arr.get on single array with multiple indices (#26219)
  • Fix panic on CSPE with sorts (#26231)
  • Eager DataFrame.slice with negative offset and length=None (#26215)
  • Use correct schema side for streaming merge join lowering (#26218)
  • Overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
  • Respect allow_object flag after cache (#26196)
  • Raise error on non-elementwise PartitionBy keys (#26194)
  • Allow ordered categorical dictionary in scan_parquet (#26180)
  • Allow excess bytes on IPC bitmap compressed length (#26176)
  • Address a macOS-specific compile issue (#26172)
  • Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
  • Fix NameError filtering pyarrow dataset (#26166)
  • Fix concat_arr panic when using categoricals/enums (#26146)
  • Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
  • Incorrect group_by min/max fast path (#26139)
  • Remove a source of non-determinism from lowering (#26137)
  • Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
  • Panics on shift with head (#26099)

📖 Documentation

  • Fix Expr.get referencing incorrect dtype for index parameter (#26364)
  • Fix Expr.quantile formatting (#26351)
  • Drop sphinx-llms-txt extension (#26285)
  • Remove deprecated cublet_id (#26260)
  • Update for new release (#26255)
  • Update MCP server section with new URL (#26241)
  • Fix unmatched paren and punctuation in pandas migration guide (#26251)
  • Add observatory database_path to docs (#26201)
  • Note plugins in Python user-defined functions (#26138)

📦 Build system

  • Address remaining Python 3.14 issues with make requirements-all (#26195)
  • Address a macOS-specific compile issue (#26172)

🛠️ Other improvements

  • Ensure local doctests skip from_torch if module not installed (#26405)
  • Change linked timezones in test suite to canonical timezones (#26310)
  • Implement various deprecations (#26314)
  • Rename Operator::Divide to RustDivide (#26339)
  • Properly disable the Pyodide tests (#26382)
  • Remove unused field (#26367)
  • Fix runtime nesting (#26359)
  • Remove xlsx2csv dependency pin (#26355)
  • Use outer runtime if exists in to_alp (#26353)
  • Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
  • Clarify IPC buffer read limit/length parameter (#26334)
  • Add dtype test coverage for delta predicate filter (#26291)
  • Add AI policy (#26286)
  • Unpin "pandas<3" in dev dependencies (#26249)
  • Remove all non CSV fast-count paths (#26233)
  • Pin pandas to 2.x for now (#26221)
  • Remove unnecessary xfail (#26199)
  • Ensure optimization flag modification happens local (#26185)
  • Simplify IcebergDataset (#26165)
  • Reorganize unit tests into logical subdirectories (#26149)
  • Lint leftover fixme (#26122)
  • Improve backtrace for POLARS_PANIC_ON_ERR (#26125)
  • Fix Python docs build (#26117)
  • Disable unused-ignore mypy lint (#26110)
  • Ignore mypy warning (#26105)
  • Raise error on file://hostname/path (#26061)
  • Disable debug info for docs workflow (#26086)
  • Update docs for next polars cloud release (#26091)
  • Support Python 3.14 in dev environment (#26073)

Thank you to all our contributors for making this release possible!
@Atarust, @EndPositive, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @azimafroozeh, @bayoumi17m, @c-peters, @carnarez, @dependabot[bot], @dsprenkels, @hallmason17, @hamdanal, @ion-elgreco, @kdn36, @lun3x, @mcrumiller, @nameexhaustion, @orlp, @qxzcode, @r-brink, @ritchie46 and @sweb

Python Polars 1.37.1

12 Jan 23:27
bb79993

🚀 Performance improvements

  • Speed up SQL interface "UNION" clauses (#26039)

🐞 Bug fixes

  • Optimize slicing support on compressed IPC (#26071)
  • CPU check for musl builds (#26076)
  • Propagate C Stream import errors instead of panicking (#26036)
  • Fix slicing on compressed IPC (#26066)

📖 Documentation

  • Clarify min_by/max_by behavior on ties (#26077)

🛠️ Other improvements

  • Mark top slow normal tests as slow (#26080)
  • Update breaking deps (#26055)
  • Fix for upstream url bug and update deps (#26052)
  • Properly pin chrono (#26051)
  • Don't run rust doctests (#26046)
  • Update deps (#26042)
  • Ignore very slow test (#26041)

Thank you to all our contributors for making this release possible!
@Voultapher, @alexander-beedie, @kdn36, @nameexhaustion, @orlp, @ritchie46 and @wtn