Adding positive lookahead prevents some optimizations #851

@Fefer-Ivan

Consider the regular expression .*abcd and a large input string that does not match it.
Even though there is no match, pcre2grep returns almost instantly:

> wc -c large_string.txt
   98305 large_string.txt
> time pcre2grep -i '.*abcd' large_string.txt
pcre2grep -i '.*abcd' large_string.txt  0.00s user 0.01s system 45% cpu 0.023 total

Now wrap the regex in a positive lookahead: (?=.*abcd)
With the same file, pcre2grep takes more than 3 seconds to determine that there is no match:

> time pcre2grep -i '(?=.*abcd)' large_string.txt
pcre2grep -i '(?=.*abcd)' large_string.txt  3.06s user 0.01s system 99% cpu 3.084 total

The running time also grows non-linearly with input size.
Doubling the input size roughly quadruples the running time, which indicates quadratic matching complexity:

> wc -c large_string.txt
  196609 large_string.txt
> time pcre2grep -i '(?=.*abcd)' large_string.txt
pcre2grep -i '(?=.*abcd)' large_string.txt  12.86s user 0.03s system 99% cpu 12.906 total
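
The quadratic estimate can be sanity-checked from the two runs above. This is a small sketch that assumes the running time follows a power law, time ~ size^k, and uses only the byte counts and user times reported by wc and time; the variable names are mine.

```python
import math

# Sizes (bytes) and user times (seconds) copied from the two pcre2grep runs above.
size_small, time_small = 98305, 3.06
size_large, time_large = 196609, 12.86

# Assuming time ~ size**k, two measurements give k = log(time ratio) / log(size ratio).
exponent = math.log(time_large / time_small) / math.log(size_large / size_small)

print(f"size ratio: {size_large / size_small:.2f}")
print(f"time ratio: {time_large / time_small:.2f}")
print(f"estimated exponent k: {exponent:.2f}")
```

The estimated exponent comes out slightly above 2, which is consistent with quadratic growth. Two data points cannot rule out other growth curves.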

It looks like wrapping the regex in a positive lookahead prevents the optimization of first searching for the fixed substring abcd before attempting a match.
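
To illustrate why such a pre-check matters, here is a toy cost model in Python. It is not PCRE2's actual matching code: lookahead_scan and prechecked_scan are hypothetical helpers that count how many positions a naive backtracking matcher would visit with and without a fixed-substring check, and the model ignores the -i flag.

```python
def lookahead_scan(line, needle="abcd"):
    """Try '(?=.*needle)' at every start offset, the way a naive backtracker would.

    Returns (matched, steps), where steps counts attempts to match the needle.
    """
    n, steps = len(line), 0
    for start in range(n + 1):
        # '.*' first consumes to the end of the line, then gives back one
        # character at a time while the literal needle is retried.
        for pos in range(n, start - 1, -1):
            steps += 1
            if line.startswith(needle, pos):
                return True, steps
    return False, steps


def prechecked_scan(line, needle="abcd"):
    """Same search, but give up early when the required literal is absent."""
    if needle not in line:
        # Modelled as a single linear pass over the line.
        return False, len(line)
    return lookahead_scan(line, needle)


if __name__ == "__main__":
    for n in (1000, 2000):
        line = "x" * n  # a single line that cannot match
        _, slow = lookahead_scan(line)
        _, fast = prechecked_scan(line)
        print(f"n={n}: without pre-check {slow} steps, with pre-check {fast} steps")
```

In this model, doubling the length of a non-matching line roughly quadruples the step count without the pre-check and only doubles it with the pre-check, which has the same shape as the timings reported above.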

pcre2grep version:

> pcre2grep --version
pcre2grep version 10.47 2025-10-21

Labels: enhancement