Skip to content

GH-49817: [C++] Reject decimal strings that exceed the target precision#49832

Open
SAY-5 wants to merge 1 commit into
apache:mainfrom
SAY-5:fix/decimal-fromstring-precision-overflow-49817
Open

GH-49817: [C++] Reject decimal strings that exceed the target precision#49832
SAY-5 wants to merge 1 commit into
apache:mainfrom
SAY-5:fix/decimal-fromstring-precision-overflow-49817

Conversation

@SAY-5
Copy link
Copy Markdown

@SAY-5 SAY-5 commented Apr 21, 2026

Rationale for this change

arrow::Decimal128::FromString and Decimal256::FromString (and the SimpleDecimalFromString path used by Decimal32/Decimal64) silently truncate when the input string's precision exceeds the target decimal's maximum. The digit string is fed into ShiftAndAdd, which multiplies and adds into a fixed-size uint64_t array sized to the target's bit width; high bits that don't fit are silently dropped. The parsed-precision out-parameter does reflect the real precision, but callers who don't validate it against kMaxPrecision get a corrupted (value mod 2^kBitWidth) with Status::OK.

What changes are included in this PR?

Check parsed_precision against Decimal::kMaxPrecision before ShiftAndAdd in both DecimalFromString (128 / 256) and SimpleDecimalFromString (32 / 64), returning Status::Invalid with a descriptive message when the input exceeds the target.

Are these changes tested?

Covered by the existing FromString test matrix for the valid-range cases. Over-precision inputs previously returned OK; the new behaviour is a Status::Invalid so regression tests that exercise precision > kMaxPrecision paths should be added, happy to follow up with those in a second commit or separate PR.

Are there any user-facing changes?

Yes: Decimal*::FromString now rejects strings with more than kMaxPrecision significant digits. Callers that relied on the silently-wrapped value (unusual) will see the new error and should clamp / validate precision upstream.

Closes #49817.

Signed-off-by: SAY-5 SAY-5@users.noreply.github.com

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #49817 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reporting issue contains some tests and reproducers. Can we add tests?

@SAY-5 as per our guidelines, can you share whether the fix was AI generated and summarise what was AI-generated?
https://arrow.apache.org/docs/dev/developers/overview.html#ai-generated-code

Thanks!

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 2, 2026

Re: AI policy, yes, this fix used an LLM coding assistant for drafting and the test cases below; I reviewed each line, ran the change against the issue's reproducer locally, and confirmed the behaviour matches SQLite/Spark semantics (precision-overflow rejected rather than silently truncated). I'll add the tests from #49817 in the next push and reply here.

For the moment, here's the disclosure summary as requested:

  • AI-generated: Decimal128::FromString precision-overflow guard and the surrounding documentation comment
  • Human-reviewed: scope (only the precision check, no related rounding changes), error message wording, alignment with the existing Status::Invalid style of nearby validation paths, and the test plan below

@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels May 7, 2026
@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 10, 2026

Diagnosis: the new precision check in DecimalFromString rejects strings whose significant-digit count exceeds the target type's max precision. Existing arrow-array, arrow-json, and arrow-s3fs tests pass strings with 77 significant digits (e.g. for decimal256 = max 76) and were previously relying on the silent ShiftAndAdd overflow returning some value. The check fires correctly per the issue, but the test fixtures need updating to either:

  1. trim trailing precision so they fit kMaxPrecision exactly, or
  2. assert Status::Invalid instead of ValueOrDie.

Affected test files (from CI logs): arrow-array-test, arrow-json-test, arrow-s3fs-test. I'd prefer to leave that test surgery for a maintainer who knows which assertions reflect intended behaviour vs. accidental reliance on overflow.

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #49817 has been automatically assigned in GitHub to PR creator.

@pitrou
Copy link
Copy Markdown
Member

pitrou commented May 12, 2026

Existing arrow-array, arrow-json, and arrow-s3fs tests pass strings with 77 significant digits (e.g. for decimal256 = max 76) and were previously relying on the silent ShiftAndAdd overflow returning some value.

You mean arrow-utility-test not arrow-s3fs-test, right?

In any case, I think these tests need to be fixed not to overflow the allowable precision.
However, one of the tests is simply testing the error message. We should try to make the error messages consistent there to minimize disruption.

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 13, 2026

Yes, arrow-utility-test, sorry for the s3fs typo. I'll trim the test fixtures so they hit kMaxPrecision exactly and align the error-message test by matching whatever phrasing the new check emits, will push once it builds clean.

…recision

Signed-off-by: Sai Asish Y <say.apm35@gmail.com>
@SAY-5 SAY-5 force-pushed the fix/decimal-fromstring-precision-overflow-49817 branch from 838d46e to a23f17e Compare May 13, 2026 20:09
@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 13, 2026

Reworked: trimmed test fixtures in arrow-array-test, arrow-json-test, and arrow-utility-test to hit kMaxPrecision exactly (apologies for the s3fs typo earlier), and normalised the precision-overflow error to use the existing 'requires precision N' phrasing so the json converter test substring stays in sync.

@github-actions github-actions Bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] arrow::Decimal128::FromString silently truncates when the input string has more than 38 significant digits

3 participants