Skip to content

fix: replace with empty search string should be a no-op#22497

Open
Amogh-2404 wants to merge 2 commits into
apache:mainfrom
Amogh-2404:fix/issue-22253-replace-empty-search
Open

fix: replace with empty search string should be a no-op#22497
Amogh-2404 wants to merge 2 commits into
apache:mainfrom
Amogh-2404:fix/issue-22253-replace-empty-search

Conversation

@Amogh-2404
Copy link
Copy Markdown

@Amogh-2404 Amogh-2404 commented May 24, 2026

Which issue does this PR close?

Rationale for this change

PostgreSQL returns the input unchanged when replace is called with an empty from. DataFusion was instead inserting to before every character and at both ends, so replace('abc', '', 'x') returned xaxbxcx. This PR brings the behaviour in line with PostgreSQL. Part of the PG-compatibility cleanup tracked in #22247.

What changes are included in this PR?

  • datafusion/functions/src/string/replace.rs: the empty-from branch in apply_replace now writes the input verbatim instead of inserting to. Added a LargeUtf8 unit test for the new behaviour.
  • datafusion/sqllogictest/test_files/string/string_literal.slt: four new SLT asserts covering the Utf8, Dictionary, Utf8View, and LargeUtf8 paths.
  • datafusion/sqllogictest/test_files/string/string_query.slt.part: updated four expected rows that were asserting the old buggy output.

Are these changes tested?

Yes. The unit test in replace.rs covers the LargeUtf8 path, and the four new SLT asserts in string_literal.slt cover the remaining Arrow string encodings end-to-end. The full SLT suite passes locally.

Are there any user-facing changes?

Yes. replace(str, '', x) now returns str unchanged instead of inserting x between every character. This matches PostgreSQL.

PostgreSQL returns the input unchanged when `replace` is called with an empty `from`. DataFusion was instead inserting `to` before every character and at both ends - so `replace('abc', '', 'x')` returned `xaxbxcx`. Fix the empty-`from` branch in `apply_replace` to write the input verbatim. Update the regression tests in `string_query.slt.part` that were asserting the buggy output.

Closes apache#22253

Signed-off-by: Amogh Ramesh <ramogh2404@gmail.com>
@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 24, 2026
@Omega359
Copy link
Copy Markdown
Contributor

LGTM, thanks!

@Jefffrey
Copy link
Copy Markdown
Contributor

We have an existing PR for this issue:

@Amogh-2404
Copy link
Copy Markdown
Author

We have an existing PR for this issue:

Thanks @Jefffrey — I missed #22357 when I picked this up, apologies for the duplicate. Happy to close this in favour of it.
In case it helps unblock #22357's CI: the cargo test failure looks like the empty-from regression rows in string_query.slt.part still asserting the old output (this PR updates those four rows), and the benchmark failure is likely the benches/replace.rs comment @neilconway mentioned. Glad to help either way.

@Jefffrey
Copy link
Copy Markdown
Contributor

I suppose if #22357 doesn't become active again we can proceed with this PR instead

Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one suggestion, otherwise should be good to go

Comment thread datafusion/functions/src/string/replace.rs Outdated
@neilconway
Copy link
Copy Markdown
Contributor

We need to update or remove the empty-replace benchmark as well.

use builder.append_value(string) for the no-op instead of the append_with
closure, per the suggestion from apache#22357. also drop the replace_string_empty_from
benches since they were timing the old char-interleaving path this PR removes.
@Amogh-2404
Copy link
Copy Markdown
Author

done. switched to append_value per #22357, and removed the *_empty_from benches since they only timed the path this PR drops.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PostgreSQL compatibility: replace with an empty search string should be a no-op

4 participants