fix: replace with empty search string should be a no-op#22497
fix: replace with empty search string should be a no-op#22497Amogh-2404 wants to merge 2 commits into
Conversation
PostgreSQL returns the input unchanged when `replace` is called with an empty `from`. DataFusion was instead inserting `to` before every character and at both ends - so `replace('abc', '', 'x')` returned `xaxbxcx`. Fix the empty-`from` branch in `apply_replace` to write the input verbatim. Update the regression tests in `string_query.slt.part` that were asserting the buggy output.
Closes apache#22253
Signed-off-by: Amogh Ramesh <ramogh2404@gmail.com>
|
LGTM, thanks! |
|
We have an existing PR for this issue: |
Thanks @Jefffrey — I missed #22357 when I picked this up, apologies for the duplicate. Happy to close this in favour of it. |
|
I suppose if #22357 doesn't become active again we can proceed with this PR instead |
Jefffrey
left a comment
There was a problem hiding this comment.
one suggestion, otherwise should be good to go
|
We need to update or remove the empty-replace benchmark as well. |
use builder.append_value(string) for the no-op instead of the append_with closure, per the suggestion from apache#22357. also drop the replace_string_empty_from benches since they were timing the old char-interleaving path this PR removes.
|
done. switched to append_value per #22357, and removed the *_empty_from benches since they only timed the path this PR drops. |
Which issue does this PR close?
replacewith an empty search string should be a no-op #22253Rationale for this change
PostgreSQL returns the input unchanged when
replaceis called with an emptyfrom. DataFusion was instead insertingtobefore every character and at both ends, soreplace('abc', '', 'x')returnedxaxbxcx. This PR brings the behaviour in line with PostgreSQL. Part of the PG-compatibility cleanup tracked in #22247.What changes are included in this PR?
datafusion/functions/src/string/replace.rs: the empty-frombranch inapply_replacenow writes the input verbatim instead of insertingto. Added aLargeUtf8unit test for the new behaviour.datafusion/sqllogictest/test_files/string/string_literal.slt: four new SLT asserts covering theUtf8,Dictionary,Utf8View, andLargeUtf8paths.datafusion/sqllogictest/test_files/string/string_query.slt.part: updated four expected rows that were asserting the old buggy output.Are these changes tested?
Yes. The unit test in
replace.rscovers theLargeUtf8path, and the four new SLT asserts instring_literal.sltcover the remaining Arrow string encodings end-to-end. The full SLT suite passes locally.Are there any user-facing changes?
Yes.
replace(str, '', x)now returnsstrunchanged instead of insertingxbetween every character. This matches PostgreSQL.