
gh-150184: difflib C accelerator #150188

Closed
blhsing wants to merge 3 commits into python:main from blhsing:difflib-c-accelerator

@blhsing (Contributor) commented May 21, 2026

Move the pure-Python implementation of difflib to Lib/_pydifflib.py and
turn Lib/difflib.py into a thin shim that re-exports its public API.
This mirrors the layout used by decimal/_pydecimal, datetime/_pydatetime,
and pickle/_pickle, where the public module dispatches to a faster C
implementation when available and the pure-Python module is preserved
as a self-contained reference for alternative Python implementations.
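
For readers unfamiliar with the accelerator-shim layout, here is a minimal sketch of the dispatch idea. It assumes the C module is importable as `_difflib` (the name used later in this PR) and otherwise falls back to a pure-Python module. The `load_implementation` helper is hypothetical: real shims such as `decimal` do this inline with a module-level `try`/`except ImportError` rather than through a function.

```python
import importlib
from types import ModuleType


def load_implementation(accelerated: str, fallback: str) -> ModuleType:
    """Import the accelerated module if available, else the fallback.

    Hypothetical helper illustrating the shim pattern; a real shim would
    write `from _accel import *` inside try/except ImportError instead.
    """
    try:
        return importlib.import_module(accelerated)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return importlib.import_module(fallback)


# "_difflib" is the C module this PR proposes. On interpreters without it,
# the stdlib "difflib" stands in here for the pure-Python fallback.
impl = load_implementation("_difflib", "difflib")
print(impl.__name__)
```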

No public behaviour change.  ``Match`` is constructed with
``module='difflib'`` so its qualified name matches the public module.
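
`module=` is a real keyword argument of `collections.namedtuple`. It sets the generated class's `__module__`, which is what `repr()` of the class and pickling use to name it. Without it, `namedtuple` infers the module from the calling frame, so a `Match` created inside `_pydifflib` would report `_pydifflib`. A small illustration:

```python
from collections import namedtuple

# Pin __module__ explicitly instead of inheriting it from the defining file.
Match = namedtuple("Match", "a b size", module="difflib")

print(Match.__module__)         # difflib
print(repr(Match))              # <class 'difflib.Match'>
print(Match(a=0, b=4, size=5))  # Match(a=0, b=4, size=5)
```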

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@blhsing changed the title from "difflib C accelerator" to "gh-150184: difflib C accelerator" on May 21, 2026
blhsing and others added 2 commits May 21, 2026 18:14
Introduce Modules/_difflibmodule.c, a heap-type C extension that
implements __init__, set_seqs/set_seq1/set_seq2, find_longest_match,
get_matching_blocks, get_opcodes, and ratio for SequenceMatcher.  The
inner DP loop and the full Ratcliff-Obershelp recursion run on int32
label arrays with zero Python C-API calls in the hot path; codepoint-
keyed lookup tables short-circuit per-element dict probes for str and
bytes inputs.  Output is bit-identical to the pure-Python implementation
including tie-breaks.
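
The PR does not show the C data structures, so this is only a pure-Python sketch of the general idea. It assumes no `isjunk` and no autojunk heuristic (the equivalent of `autojunk=False`), and it uses plain lists in place of int32 arrays. Elements are first mapped to dense integer labels. For `bytes`, each element already is an int in 0..255, so a fixed 256-slot table replaces the dict, loosely analogous to the codepoint-keyed tables described above. The inner loop is the same dynamic program `difflib.SequenceMatcher.find_longest_match` runs, including its strict `>` tie-break.

```python
from difflib import SequenceMatcher


def relabel(a, b):
    """Map elements of a and b to small int labels; return (la, lb, n_labels)."""
    if isinstance(a, bytes) and isinstance(b, bytes):
        # Iterating bytes already yields ints in 0..255, so no dict is needed.
        return list(a), list(b), 256
    labels = {}
    la = [labels.setdefault(x, len(labels)) for x in a]
    lb = [labels.setdefault(x, len(labels)) for x in b]
    return la, lb, len(labels)


def longest_match(a, b):
    """Return (i, j, size) of the longest common block, with no junk handling."""
    la, lb, n_labels = relabel(a, b)
    # b2j[label] lists, in increasing order, every index in b holding that label.
    b2j = [[] for _ in range(n_labels)]
    for j, label in enumerate(lb):
        b2j[label].append(j)

    besti = bestj = bestsize = 0
    j2len = {}  # j -> length of the match ending at (i - 1, j)
    for i, label in enumerate(la):
        newj2len = {}
        for j in b2j[label]:
            k = newj2len[j] = j2len.get(j - 1, 0) + 1
            if k > bestsize:  # strict '>' keeps the earliest block on ties
                besti, bestj, bestsize = i - k + 1, j - k + 1, k
        j2len = newj2len
    return besti, bestj, bestsize


print(longest_match(" abcd", "abcd abcd"))  # (0, 4, 5)
reference = SequenceMatcher(None, " abcd", "abcd abcd", autojunk=False)
print(reference.find_longest_match())       # Match(a=0, b=4, size=5)
```

Because only integer labels reach the inner loop, the same loop serves `str`, `bytes` and arbitrary hashable sequences. That is the property that lets a C version avoid Python object comparisons in the hot path.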

Lib/difflib.py grows a small subclass that inherits the slow-path
methods (quick_ratio, real_quick_ratio, get_grouped_opcodes) from the
pure-Python class; this is a no-op when the accelerator is not built.

Build wiring: configure.ac registers the module via PY_STDLIB_MOD_SIMPLE
and Modules/Setup.stdlib.in references _difflibmodule.c.  configure must
be regenerated with autoreconf before this lands.
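
After regenerating `configure` and rebuilding, one way to confirm the extension was actually compiled is to ask the import system for it. This assumes the module name `_difflib` used elsewhere in this PR and does not depend on how `configure.ac` wires it. `importlib.util.find_spec` returns `None` for a top-level module it cannot locate, including a built-in one that was not compiled in.

```python
import importlib.util


def accelerator_available(name="_difflib"):
    """Return True if the import system can find the named module."""
    return importlib.util.find_spec(name) is not None


# True only if this interpreter build produced the proposed _difflib module.
print(accelerator_available())
```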

Typical workloads run 5-25x faster than pure Python; the bytes path up
to ~70x.  See Lib/test/test_difflib.py for cross-implementation tests.
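
Speedup figures like these depend heavily on the workload, so a small harness is useful for checking them. The sketch below times `ratio()` for whatever matcher class it is given. `time_ratio` and the sample input are hypothetical; the speedups quoted above are the author's measurements and are not reproduced here.

```python
import difflib
import timeit


def time_ratio(matcher_cls, a, b, number=20, repeat=5):
    """Return the best observed seconds per matcher_cls(None, a, b).ratio() call."""
    timer = timeit.Timer(lambda: matcher_cls(None, a, b).ratio())
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Line-oriented input, similar to a unified_diff workload: 500 distinct
# lines with one insertion and one deletion.
old = [f"line {i}\n" for i in range(500)]
new = old[:100] + ["inserted\n"] + old[100:400] + old[401:]
print(f"{time_ratio(difflib.SequenceMatcher, old, new):.6f} s per ratio() call")
```

With the accelerator built, calling `time_ratio` once with the accelerated `difflib.SequenceMatcher` and once with `_pydifflib.SequenceMatcher` on the same inputs gives a per-workload ratio. Taking the minimum over repeats reduces noise from other processes.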

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the _difflib C accelerator is built, programmatically generate a
parallel ``*_PurePython`` TestCase for each existing test class so the
same suite covers both implementations.  Pure-Python coverage is
obtained by patching ``difflib.SequenceMatcher`` to
``_pydifflib.SequenceMatcher`` in setUp / restoring it in tearDown;
functions such as ``unified_diff`` and ``ndiff`` resolve
``SequenceMatcher`` on ``difflib`` at call time, so patching the module
attribute covers the whole pipeline.
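
A minimal sketch of that generation step, assuming a generic factory rather than the PR's actual code; both helper names are hypothetical. It restores the attribute with `addCleanup` instead of `tearDown`, a slight variation chosen because unittest still runs cleanups when `setUp` fails partway.

```python
import unittest


def make_pure_python_variant(cls, module, attr, replacement):
    """Subclass a TestCase so every test runs with module.<attr> patched."""

    class Variant(cls):
        def setUp(self):
            original = getattr(module, attr)
            setattr(module, attr, replacement)
            # Cleanups run even if the remaining setUp code raises.
            self.addCleanup(setattr, module, attr, original)
            super().setUp()

    Variant.__name__ = Variant.__qualname__ = f"{cls.__name__}_PurePython"
    Variant.__module__ = cls.__module__
    return Variant


def add_pure_python_variants(namespace, module, attr, replacement):
    """Add a *_PurePython twin for every TestCase subclass in namespace."""
    for name, obj in list(namespace.items()):
        if (isinstance(obj, type) and issubclass(obj, unittest.TestCase)
                and obj is not unittest.TestCase
                and not name.endswith("_PurePython")):
            variant = make_pure_python_variant(obj, module, attr, replacement)
            namespace[variant.__name__] = variant
```

In a test module this would be called once at import time, for example `add_pure_python_variants(globals(), difflib, "SequenceMatcher", _pydifflib.SequenceMatcher)`, where `_pydifflib` is the module this PR introduces.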

This mirrors the dual-implementation test pattern used by test_decimal
(C* / Py* class pairs) without requiring every existing test method to
be parameterised.

``test_html_diff`` also gets a single-line fix: it depended on
``HtmlDiff._default_prefix`` starting at 0, which only held because it
ran first.  Resetting the counter at the top of the test makes it
order-independent.
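
`HtmlDiff._default_prefix` is a private class-level counter, not public API. Each `make_table()` call (and therefore each `make_file()` call) increments it to build unique anchor ids such as `from0_1`, so rendered output depends on how many tables were produced earlier in the process. The fix described above, reduced to a hypothetical helper:

```python
import difflib


def render_table(fromlines, tolines):
    """Render an HTML diff table whose anchor ids do not depend on history.

    Resets the private class counter HtmlDiff._default_prefix first, so
    anchors always start at from0_/to0_ regardless of earlier calls.
    """
    difflib.HtmlDiff._default_prefix = 0
    return difflib.HtmlDiff().make_table(fromlines, tolines)


old = ["one\n", "two\n"]
new = ["one\n", "three\n"]
# Same output on every call, however many tables were rendered before.
assert render_table(old, new) == render_table(old, new)
```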

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@blhsing force-pushed the difflib-c-accelerator branch from e2f103f to 3de5c11 on May 21, 2026 10:14
@read-the-docs-community (bot) commented May 21, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32792049 | 📁 Comparing 3de5c11 against main (c35b0f2)

2 files changed
± library/difflib.html
± whatsnew/changelog.html

@picnixz closed this on May 21, 2026
@picnixz (Member) commented May 21, 2026

Sorry, but this needs a discussion. Open a PR on your fork if you want to show a PoC.

@picnixz (Member) commented May 21, 2026

In addition, after a quick glance, there seem to be many paths that fail to check NULL after calling the C API.
