Remove fastokens entirely by hallerite · Pull Request #95 · PrimeIntellect-ai/renderers

hallerite · 2026-06-26T02:41:31Z

Why

fastokens is a process-global monkey-patch on transformers.AutoTokenizer that swaps .backend_tokenizer for a faster Rust BPE shim. The shim has no offset-mapping support, so we kept a parallel vanilla tokenizer per model purely for body/scaffold attribution in attribute_text_segments. After the v3 collapse-or-fallback refactor (PR #87), most assistant messages render mixed-label (scaffold+content+scaffold) and run through the offset path where fastokens contributes nothing; fastokens only saves time on homogeneous-label runs and direct _encode/_decode helpers.

The complexity tax it was charging:

FASTOKENS_INCOMPATIBLE frozenset (DeepSeek-V3 family workaround — only exists because fastokens lacks Metaspace pre-tokenizer support)
_FASTOKENS_PATCH_LOCK + _FASTOKENS_ANNOUNCED global state
_patched_load() with contextlib.redirect_stdout to swallow [fastokens] patch_transformers: ... prints under thread contention
The unpatch/restore race-safe path inside _get_offset_tokenizer (PR #86, "fix: _get_offset_tokenizer immune to global fastokens patch (concurrent-pool race)")
Two tokenizers per model resident in memory
Tests/CI gates around every one of the above

…wasn't earning its keep.
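For readers unfamiliar with why offsets matter here, the following is a minimal sketch of offset-based body/scaffold attribution. It is not the repository's attribute_text_segments: the regex tokenizer, the function names, and the rule of labelling a token by its start offset are all assumptions made for illustration. The idea is that the concatenated text is tokenized once, and each token is mapped back to the segment it came from through its character offsets, which is exactly what a tokenizer without offset mapping cannot provide.

```python
import re
from typing import List, Tuple

Segment = Tuple[str, str]  # (text, label), e.g. ("<start> ", "scaffold")


def toy_encode_with_offsets(text: str) -> List[Tuple[str, Tuple[int, int]]]:
    """Hypothetical stand-in for a fast tokenizer: split into runs of
    whitespace / non-whitespace and report (start, end) character offsets."""
    return [(m.group(), m.span()) for m in re.finditer(r"\s+|\S+", text)]


def attribute_by_offsets(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Tokenize the concatenated text once, then label each token with the
    segment that contains the token's start offset (an illustrative rule)."""
    text = "".join(seg_text for seg_text, _ in segments)

    # Character index at which each segment ends, in order.
    boundaries = []
    pos = 0
    for seg_text, label in segments:
        pos += len(seg_text)
        boundaries.append((pos, label))

    labelled = []
    idx = 0
    for token, (start, _end) in toy_encode_with_offsets(text):
        # Advance to the segment whose end lies beyond this token's start.
        while start >= boundaries[idx][0]:
            idx += 1
        labelled.append((token, boundaries[idx][1]))
    return labelled


segments = [
    ("<start> ", "scaffold"),
    ("hello world", "content"),
    (" <end>", "scaffold"),
]
result = attribute_by_offsets(segments)
```

Tokenizing each segment separately would avoid the need for offsets, but a real BPE tokenizer may merge differently across segment boundaries than it does on the full string, which is why the single-pass offset route is used for mixed-label text.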

What's removed

Surface	File
`fastokens>=0.2.0` dep + `[tool.uv.exclude-newer-package].fastokens` exemption	pyproject.toml
`FASTOKENS_INCOMPATIBLE`, `_FASTOKENS_PATCH_LOCK`, `_FASTOKENS_ANNOUNCED`, `_patched_load`	base.py
`use_fastokens` kwarg on `load_tokenizer` (signature → 1 positional arg)	base.py
Fastokens-failure-retry-vanilla branch in `load_tokenizer`	base.py
Orphan imports: top-level `contextlib`, `io`	base.py
`_offset_tokenizers` cache + lock + entire unpatch-and-reload block	base.py
`tests/test_load_tokenizer_fastokens.py` (211 lines, all fastokens-specific)	tests/
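For callers, the most visible removal is the use_fastokens kwarg. A small sketch of what a stale call site would hit, using a hypothetical stand-in for load_tokenizer with the one-positional-argument shape described above (the stub body and the model name are placeholders, not the real loader):

```python
def load_tokenizer(model_name):
    """Hypothetical stand-in with the post-PR shape: one positional argument.
    The real function loads a tokenizer; this stub just echoes the name."""
    return f"tokenizer for {model_name}"


def call_with_removed_kwarg():
    """Call the loader the old way and capture the resulting error text."""
    try:
        load_tokenizer("some-org/some-model", use_fastokens=False)
    except TypeError as exc:
        return str(exc)
    return None


error_message = call_with_removed_kwarg()
```

Python rejects the unknown keyword with a TypeError, so stale call sites fail at the call rather than silently ignoring the flag. Dropping the kwarg at the call site is the whole migration.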

What replaces it

_get_offset_tokenizer collapses to its essential contract — probe and return, or raise:

def _get_offset_tokenizer(tokenizer):
    try:
        tokenizer("a", add_special_tokens=False, return_offsets_mapping=True)
    except (NotImplementedError, ValueError, TypeError) as exc:
        raise RuntimeError(
            "Hand-coded renderers require a fast tokenizer with "
            "``return_offsets_mapping=True`` support …"
        ) from exc
    return tokenizer

Tokenizers from load_tokenizer are PreTrainedTokenizerFast and satisfy this trivially. BYO tokenizers without return_offsets_mapping support fail loudly at construction time instead of silently triggering a reload from name_or_path that only existed to paper over the fastokens shim's missing offsets.

The single test that exercised the BYO-no-offsets reload path is replaced with one asserting the new error contract (test_get_offset_tokenizer_rejects_offsetless_byo).
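To make the new contract concrete, here is a self-contained sketch that reuses the function body shown above and exercises it against two hypothetical stub tokenizers (OffsetCapableStub and OffsetlessStub are invented for this example; they are not classes from the repo or from transformers). It only illustrates the probe-and-raise behaviour and is not a copy of the actual test.

```python
def _get_offset_tokenizer(tokenizer):
    # Same probe-and-return shape as in the PR description; the error
    # message is elided there and is kept elided here.
    try:
        tokenizer("a", add_special_tokens=False, return_offsets_mapping=True)
    except (NotImplementedError, ValueError, TypeError) as exc:
        raise RuntimeError(
            "Hand-coded renderers require a fast tokenizer with "
            "``return_offsets_mapping=True`` support …"
        ) from exc
    return tokenizer


class OffsetCapableStub:
    """Hypothetical tokenizer that can report character offsets."""

    def __call__(self, text, add_special_tokens=True, return_offsets_mapping=False):
        out = {"input_ids": [0]}
        if return_offsets_mapping:
            out["offset_mapping"] = [(0, len(text))]
        return out


class OffsetlessStub:
    """Hypothetical tokenizer that cannot report offsets."""

    def __call__(self, text, add_special_tokens=True, return_offsets_mapping=False):
        if return_offsets_mapping:
            raise NotImplementedError("offsets are not available")
        return {"input_ids": [0]}


capable = OffsetCapableStub()
returned = _get_offset_tokenizer(capable)  # same object comes back

try:
    _get_offset_tokenizer(OffsetlessStub())
    rejected = None
except RuntimeError as exc:
    rejected = exc  # loud failure; original error is chained as __cause__
```

Because the original exception is chained with `from exc`, the traceback still shows why the probe failed.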

Perf impact (honest)

Encode/decode now go through HuggingFace's stock tokenizers Rust BPE instead of fastokens (claimed ~10x faster on synthetic benchmarks). The mixed-label path — which is most assistant turns post-PR-#87 — was already off fastokens before this change, so the realistic-workload regression is bounded to homogeneous-label runs and direct encode/decode helper calls.

Not measured. If it matters in real training pools, revisit by adding fastokens back as the offset tokenizer once it supports offset_mapping upstream (filed: TODO link if a fastokens issue exists). Until then, the simpler tree wins.
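Since the regression is unmeasured, a rough way to put a number on it is sketched below. This is an assumption about how one might measure, not something the PR ran: time_encode and placeholder_encode are hypothetical names, and the placeholder encoder exists only so the sketch runs without downloading a model. Swap in the encode call of the tokenizer under test and a representative sample of homogeneous-label texts.

```python
import time
from typing import Callable, Sequence


def time_encode(
    encode: Callable[[str], Sequence[int]],
    texts: Sequence[str],
    repeats: int = 5,
) -> float:
    """Return the best wall-clock time (seconds) over `repeats` passes,
    where each pass encodes every text once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            encode(text)
        best = min(best, time.perf_counter() - start)
    return best


def placeholder_encode(text: str) -> list:
    """Placeholder encoder (code points), standing in for a real tokenizer."""
    return [ord(ch) for ch in text]


sample_texts = ["hello world"] * 1000
elapsed = time_encode(placeholder_encode, sample_texts)
```

Taking the minimum over several passes reduces noise from warm-up and scheduling. Comparing the figure for two encoders on the same texts gives the relative slowdown.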

Tests

Before (origin/main): 2259 passed, 88 skipped, 1 xfailed
After (this PR): 2248 passed, 88 skipped, 1 xfailed
Delta: 11 deleted fastokens-specific tests, no regressions

Stats

 pyproject.toml                         |  12 --
 renderers/base.py                      | 243 ++++-----------------------------
 tests/test_load_tokenizer.py           |  46 ++-----
 tests/test_load_tokenizer_fastokens.py | 213 -----------------------------
 uv.lock                                |  36 +----
 5 files changed, 44 insertions(+), 506 deletions(-)

Net −462 lines.

Followups

PR #87 ("v3 collapse closures + tokenizers.Encoding offset API", which drops transformers from the offset path) needs a rebase onto this; most of its _get_offset_tokenizer rewrite becomes simpler or obsolete, since the path-4 AutoTokenizer fallback existed for fastokens-induced reasons.
"Make transformers an optional dependency" (issue #31, "Is transformers necessary or tokenizers is enough?"; plan at .claude/plans/enchanted-sniffing-scott.md) becomes simpler: load_tokenizer is now the only transformers user on the construction path.

🤖 Generated with Claude Code

Note

Medium Risk
Behavior change for callers passing use_fastokens or relying on silent offset-capable tokenizer reload; encode performance may regress on homogeneous-label paths. Core rendering security policy is unchanged.

Overview
Removes the fastokens optional dependency and all tokenizer monkey-patching from renderers.base, shrinking the load path to vanilla AutoTokenizer / PreTrainedTokenizerFast only.

load_tokenizer no longer accepts use_fastokens; it always goes through _load_tokenizer_via_auto with the existing security policy (pinned Kimi revisions, Llama unsloth mirrors). Deleted surfaces include FASTOKENS_INCOMPATIBLE, _patched_load, patch locks, stdout suppression, and fastokens failure fallback.

_get_offset_tokenizer is simplified to a probe-and-return contract: the supplied tokenizer must support return_offsets_mapping=True, or a RuntimeError is raised. The per-model vanilla offset cache and fastokens race-safe reload path are gone—callers with BYO tokenizers must pass a fast tokenizer explicitly.

Tests and lockfile drop tests/test_load_tokenizer_fastokens.py, update test_load_tokenizer.py (including test_get_offset_tokenizer_rejects_offsetless_byo), and remove fastokens from pyproject.toml and uv.lock.

Trade-off: encode/decode use stock HuggingFace tokenizers instead of the fastokens shim (~10x encode claim); mixed-label / offset attribution paths were already on vanilla HF for offsets.

Reviewed by Cursor Bugbot for commit 33c3f39.

Note

Remove `fastokens` dependency and all related tokenizer patching logic

Removes the fastokens package from dependencies and deletes all integration code, including _patched_load, the use_fastokens parameter on load_tokenizer, and the _offset_tokenizers cache.
load_tokenizer now always loads a vanilla fast tokenizer via AutoTokenizer with no retry or fallback logic.
_get_offset_tokenizer now raises RuntimeError instead of transparently loading a separate offset-capable tokenizer when the supplied tokenizer lacks offset mapping support.
Deletes tests/test_load_tokenizer_fastokens.py and updates tests/test_load_tokenizer.py to match the new behavior.
Risk: callers relying on the automatic offset-tokenizer fallback or the use_fastokens parameter will get a TypeError or RuntimeError at runtime.

Macroscope summarized 33c3f39.

macroscopeapp · 2026-06-26T02:42:40Z

Approvability

Verdict: Needs human review

This PR removes the fastokens dependency which provided ~10x tokenization speedup. While the code simplification is clean, removing a significant performance optimization is an architectural decision that warrants human review to confirm the tradeoff is intentional.


fastokens is a process-global monkey-patch on transformers.AutoTokenizer that swaps the .backend_tokenizer for a faster Rust BPE shim. The shim has no offset_mapping support, so we kept a parallel vanilla tokenizer per model purely for body/scaffold attribution in attribute_text_segments. Most assistant messages render mixed-label (scaffold+content+scaffold), which means the offset path runs and fastokens doesn't help; fastokens only saves time on homogeneous-label runs and direct _encode/_decode calls.

The complexity tax (the FASTOKENS_INCOMPATIBLE denylist for DeepSeek-V3, _FASTOKENS_PATCH_LOCK for pool slot races, contextlib.redirect_stdout to swallow [fastokens] prints, the unpatch/restore dance in _get_offset_tokenizer, a twin tokenizer in memory) was not earning its keep.

What's removed:

* fastokens dependency in pyproject.toml + its uv.exclude-newer-package exemption
* FASTOKENS_INCOMPATIBLE frozenset (DeepSeek-V3 family workaround, which only existed because fastokens lacks Metaspace pre-tokenizer support)
* _FASTOKENS_PATCH_LOCK, _FASTOKENS_ANNOUNCED, _patched_load
* use_fastokens kwarg on load_tokenizer (signature simplifies to one positional arg)
* The fastokens-failure retry-vanilla branch in load_tokenizer
* The whole _offset_tokenizers cache, its lock, and the unpatch-and-reload race-safe path in _get_offset_tokenizer
* tests/test_load_tokenizer_fastokens.py (211 lines, all asserted fastokens-specific behaviour)

What replaces it:

_get_offset_tokenizer is now 3 lines of real logic: probe → return, or raise a clear error. The contract is "pass a fast tokenizer or get a loud error." Tokenizers from load_tokenizer are PreTrainedTokenizerFast and satisfy this trivially. BYO tokenizers without return_offsets_mapping support fail at construction time instead of silently triggering a reload-from-name_or_path that only existed to paper over the fastokens shim's missing offsets.

The single test that exercised the BYO-no-offsets reload path is replaced with one asserting the new error contract.

Perf impact: encode/decode now goes through HuggingFace's stock tokenizers Rust BPE instead of fastokens' (claimed ~10x faster on synthetic benchmarks). The mixed-label path (most assistant turns) was already off fastokens before this change, so the realistic-workload regression is bounded to homogeneous-label runs and direct encode/decode helper calls. Measure if it matters; revisit by adding fastokens back as the offset tokenizer once it supports offset_mapping upstream.

Net: -462 lines.

Test suite: 2248 passed, 88 skipped, 1 xfailed (2259 → 2248 = 11 fastokens-specific tests deleted; no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hallerite · 2026-06-26T03:25:36Z

this is mostly because we currently need offset renderers everywhere. we will switch back to fastokens once they support it.

Two related refactors of the emit_text_segments / attribute_text_segments pipeline:

1. ``emit_text_segments`` closures across 8 hand-coded renderers (qwen3, qwen35, glm45, glm5, deepseek_v3, nemotron3, laguna_xs2, minimax_m2) get a "collapse-or-fallback" pattern: adjacent same-label segments are folded into one ``emit_text`` call (preserves internal BPE merges, skips the offset path); only genuinely mixed-label runs go through ``attribute_text_segments``. Most rendering paths end up homogeneous after collapse, so the offset machinery only runs when it actually has to.

2. ``attribute_text_segments`` is rewritten to use the Rust ``tokenizers.Encoding`` API directly (``.encode().ids`` / ``.encode().offsets``) instead of going through ``transformers``'s ``return_offsets_mapping=True`` dict API. This unblocks the future ``transformers``-optional path (issue #31): a BYO ``tokenizers.Tokenizer`` works without any ``transformers`` wrapper. ``_get_offset_tokenizer`` becomes a 2-path resolver (direct Rust tokenizer, or extract ``.backend_tokenizer`` from a ``PreTrainedTokenizerFast``); no second tokenizer load, no probe-verify, no AutoTokenizer fallback. All of those existed in the previous version of this PR to coordinate with the fastokens shim, which is gone after #95.

``minimax_m2.emit_token_overlap_body`` and ``qwen3_vl._Emitter._flush`` are updated to call the new ``Encoding``-based offset API directly.

``tokenizers>=0.20`` becomes an explicit core dependency: it was already a transitive of ``transformers``, but the new ``attribute_text_segments`` imports from ``tokenizers`` at the module level so we declare it.

Tests: 2248 passed, 88 skipped, 1 xfailed (baseline parity with #95).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
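The "collapse-or-fallback" pattern from the commit message above can be sketched roughly as follows. This is an illustration under assumptions, not the renderers' actual closures: collapse_adjacent and route are hypothetical helpers, segments are modelled as (text, label) tuples, and the strings "emit_text" and "attribute_text_segments" merely name which path the real code would take.

```python
from itertools import groupby
from typing import List, Tuple

Segment = Tuple[str, str]  # (text, label)


def collapse_adjacent(segments: List[Segment]) -> List[Segment]:
    """Fold runs of adjacent same-label segments into one segment each."""
    return [
        ("".join(text for text, _ in group), label)
        for label, group in groupby(segments, key=lambda seg: seg[1])
    ]


def route(segments: List[Segment]) -> Tuple[str, List[Segment]]:
    """Pick a path: a single label after collapsing can be emitted in one
    call; anything still mixed falls back to offset-based attribution."""
    collapsed = collapse_adjacent(segments)
    if len(collapsed) <= 1:
        return "emit_text", collapsed
    return "attribute_text_segments", collapsed


homogeneous = route([("hello ", "content"), ("world", "content")])
mixed = route([("<s>", "scaffold"), ("hi ", "content"), ("there", "content"), ("</s>", "scaffold")])
```

Joining same-label text before tokenizing is what preserves BPE merges inside a run, since the tokenizer sees the run as one string.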

…ens removal PrimeIntellect-ai#95, dyn-versioning PrimeIntellect-ai#96) + migrate gemma4 (#10) * adopt verifiers-style dynamic versioning (PrimeIntellect-ai#96) * remove fastokens entirely for now (PrimeIntellect-ai#95) * feat(thinking): replace preserve_* bools with thinking_retention, respected by the bridge (PrimeIntellect-ai#88) * migrate gemma4 to thinking_retention (render+bridge, implied=all; debate hot path 1:1, full-render keeps thinking) + fastokens test fix --------- Co-authored-by: hallerite <git@hallerite.com>

hallerite force-pushed the rip-fastokens branch from 3944a5a to 60ed74c Compare June 26, 2026 03:06

hallerite force-pushed the rip-fastokens branch from 60ed74c to 33c3f39 Compare June 26, 2026 03:22

samsja approved these changes Jun 26, 2026

View reviewed changes

hallerite merged commit 082836b into main Jun 26, 2026
10 of 11 checks passed

hallerite deleted the rip-fastokens branch June 26, 2026 03:25

hallerite mentioned this pull request Jun 26, 2026

v3 collapse closures + tokenizers.Encoding offset API #87

Open

joanvelja mentioned this pull request Jun 27, 2026

chore: sync upstream (thinking_retention #88, fastokens removal #95, dyn-versioning #96) + migrate gemma4 joanvelja/renderers#10

Merged

hallerite merged 1 commit into main from rip-fastokens
