Slow regex performance

The regex implementation in Swift is quite slow and this very much affects applications that apply a lot of regex in a very negative way.

There are some other issues open that mention slow performance in some cases, but it might be a bigger problem than those few special cases. So I am adding this issue as a reminder.

I started a [Swift forums topic](https://forums.swift.org/t/slow-regex-performance/75768) on this matter in November 2024 but not much seems to have changed since than. In this topic, I have links to a benchmark implemented in Java and in Swift (the Swift one is now part of the benchmarks of this package), the factor is more than 10x between them. I think a factor of 2 to 3 could be OK given that the Swift version is more capable and more correct, but with the current slowness it was considered for some of my projects to leave the Swift platform completely just for this matter.

But there seem to be ways for improvements as noted by [Michael_Ilseman](https://forums.swift.org/u/michael_ilseman) in the mentioned forums topic:

>...the slow perf comes down to some areas we're currently looking at improving:
>
>- More time is spent on retain/release than actually matching content
>    - Upcoming adoption of ~Copyable and ~Escapable
>- More time is spent in copying our COW arrays than in actual matching
>    - Upcoming rework using custom data structures
>    - Situational: dropping captures that are never read from
>- Lots of time drilling through String's higher-level representation
>    - Upcoming adoption of UTF8Span for direct UTF-8 processing
>
>Also, we'd like to add more optimizations, such as more quickly finding runs of literal content. Unlike the improvements listed above, these might only benefit specific regexes (or the way they're written) over specific kinds of input. Thus, there is a risk that a benchmark might see dramatic improvements that don't translate to the workload the benchmark was derived from.
>
>For example, most of your input data are lines that are successfully matched in their entirety. Other input data could include long lines with content that starts off looking like a URL, but ultimately isn't matched. If you're able to share more regexes or more kinds of input, we can make sure that optimizations are applicable to real workloads.
>
>Finally, we've made some significant performance improvements that are (unfortunately) not yet in a toolchain but will be soon. On your benchmark, I'm seeing a 1.56x improvement from those changes for grapheme-cluster semantics and a 1.53x improvement for scalar semantics.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow regex performance #853

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Slow regex performance #853

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions