Skip to content

Use CU2BYTES for byte sizing in two allocation sites#909

Open
iliaal wants to merge 1 commit into
PCRE2Project:mainfrom
iliaal:fix/cu2bytes-allocation-sizing
Open

Use CU2BYTES for byte sizing in two allocation sites#909
iliaal wants to merge 1 commit into
PCRE2Project:mainfrom
iliaal:fix/cu2bytes-allocation-sizing

Conversation

@iliaal
Copy link
Copy Markdown

@iliaal iliaal commented May 12, 2026

Two allocation sites multiply by PCRE2_CODE_UNIT_WIDTH (the bit width: 8, 16, or 32) where the byte-count helper CU2BYTES(x) = (x) * (PCRE2_CODE_UNIT_WIDTH/8) is intended. The result over-allocates by the code-unit byte width: 8x in 8-bit mode, 16x in 16-bit, 32x in 32-bit.

Subsequent memcpy calls already use CU2BYTES correctly, so no out-of-bounds write occurs. The over-allocation leaks the difference until the buffer is freed, on every call.

Sites:

  • src/pcre2_substring.c:214 (pcre2_substring_get_bynumber)
  • src/pcre2_convert.c:1219 (pcre2_pattern_convert)

Repo-wide grep for * PCRE2_CODE_UNIT_WIDTH outside preprocessor conditionals turned up only these two sites. Found via internal audit of PCRE2 10.47.

Verified clean against the existing test suite via cmake build + RunTest (29 tests, all pass). Tests 24 and 25 exercise pcre2_pattern_convert; substring tests in test 1 exercise pcre2_substring_get_bynumber.

@NWilson
Copy link
Copy Markdown
Member

NWilson commented May 14, 2026

Thank you! This looks legitimate.

If you used AI, would you mind sharing what tool (and prompt) you used?

I will confirm it manually myself.

I'm trying to get a bit better on hygiene around integer overflows - so ideally we would improve the code further:

  if (*bufflenptr > ((PCRE2_SIZE_MAX - sizeof(pcre2_memctl)) /
        CU2BYTES(1)) - 1 ||
      (allocated = PRIV(memctl_malloc)(sizeof(pcre2_memctl) +
        CU2BYTES(*bufflenptr + 1),
        (pcre2_memctl *)ccontext)) == NULL)
    {
    *bufflenptr = 0;  /* Error offset */
    return PCRE2_ERROR_NOMEMORY;
    }

(In fact, I clearly need a helper for allocate-with-multiply-check. But that's for another PR.)

Would you be able to add the check to both sites you've improved?

@iliaal
Copy link
Copy Markdown
Author

iliaal commented May 14, 2026

Yeah, I'll take a look.

AI was used to draft PR & validate finding when looking at something else. For reference: https://github.com/iliaal/whetstone/tree/main/plugins/whetstone/skills/ia-code-review
https://github.com/iliaal/whetstone/tree/main/plugins/whetstone/skills/ia-meta-prompting (Adversarial)

Were used across multiple LLM (Claude, Codex, Deepseek v4 Pro) cross checks

Two allocation sites multiplied by PCRE2_CODE_UNIT_WIDTH (the bit width:
8, 16, or 32) where the CU2BYTES(x) byte-count helper is intended. The
result over-allocates by the code-unit byte width: 8x in 8-bit mode, 16x
in 16-bit, 32x in 32-bit. Subsequent memcpy calls already use CU2BYTES
correctly, so no out-of-bounds write occurs; the over-allocation is
leaked until the buffer is freed.

Also guard each site against integer overflow in
sizeof(pcre2_memctl) + CU2BYTES(N + 1) by rejecting N greater than
(PCRE2_SIZE_MAX - sizeof(pcre2_memctl)) / CU2BYTES(1) - 1.
@iliaal iliaal force-pushed the fix/cu2bytes-allocation-sizing branch from 4d9acdd to 9763f95 Compare May 14, 2026 13:34
@iliaal
Copy link
Copy Markdown
Author

iliaal commented May 14, 2026

Added the guard at both sites.

@NWilson
Copy link
Copy Markdown
Member

NWilson commented May 14, 2026

Those are some very interesting skills! I have access to Claude & GPT. I would be interested in doing a more thorough audit of all of PCRE2... at some point. I'll bear in mind that your tool might be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants