Remove fastokens entirely #95

Merged: hallerite merged 1 commit into main from rip-fastokens on Jun 26, 2026

Conversation

@hallerite (Member) commented Jun 26, 2026

Why

fastokens is a process-global monkey-patch on transformers.AutoTokenizer that swaps .backend_tokenizer for a faster Rust BPE shim. The shim has no offset-mapping support, so we kept a parallel vanilla tokenizer per model purely for body/scaffold attribution in attribute_text_segments. After the v3 collapse-or-fallback refactor (PR #87), most assistant messages render mixed-label (scaffold+content+scaffold) and run through the offset path where fastokens contributes nothing; fastokens only saves time on homogeneous-label runs and direct _encode/_decode helpers.

The complexity tax it was charging:

  • FASTOKENS_INCOMPATIBLE frozenset (DeepSeek-V3 family workaround — only exists because fastokens lacks Metaspace pre-tokenizer support)
  • _FASTOKENS_PATCH_LOCK + _FASTOKENS_ANNOUNCED global state
  • _patched_load() with contextlib.redirect_stdout to swallow [fastokens] patch_transformers: ... prints under thread contention
  • The unpatch/restore race-safe path inside _get_offset_tokenizer (PR #86, "fix: _get_offset_tokenizer immune to global fastokens patch (concurrent-pool race)")
  • Two tokenizers per model resident in memory
  • Tests/CI gates around every one of the above

…wasn't earning its keep.

What's removed

| Surface | File |
| --- | --- |
| fastokens>=0.2.0 dep + [tool.uv.exclude-newer-package].fastokens exemption | pyproject.toml |
| FASTOKENS_INCOMPATIBLE, _FASTOKENS_PATCH_LOCK, _FASTOKENS_ANNOUNCED, _patched_load | base.py |
| use_fastokens kwarg on load_tokenizer (signature → 1 positional arg) | base.py |
| Fastokens-failure-retry-vanilla branch in load_tokenizer | base.py |
| Orphan imports: top-level contextlib, io | base.py |
| _offset_tokenizers cache + lock + entire unpatch-and-reload block | base.py |
| tests/test_load_tokenizer_fastokens.py (211 lines, all fastokens-specific) | tests/ |
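
For callers, the most visible item in the table is the removed use_fastokens kwarg. The sketch below illustrates what a stale call site would see. The load_tokenizer here is a hypothetical stand-in that only mirrors the one-positional-argument shape described above, not the real function in base.py, and the model name is a placeholder.

```python
def load_tokenizer(model_name):
    """Hypothetical stand-in with the post-PR shape: one positional argument."""
    return f"<tokenizer for {model_name}>"


def call_with_removed_kwarg():
    """Call the way a pre-PR caller might have and report the outcome."""
    try:
        # use_fastokens is no longer part of the signature, so Python itself
        # rejects the call before any loading logic runs.
        load_tokenizer("some-org/some-model", use_fastokens=True)
    except TypeError as exc:
        return type(exc).__name__
    return "accepted"


print(call_with_removed_kwarg())
```

Dropping the kwarg at the call site is the only change such a caller should need.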

What replaces it

_get_offset_tokenizer collapses to its essential contract — probe and return, or raise:

def _get_offset_tokenizer(tokenizer):
    try:
        tokenizer("a", add_special_tokens=False, return_offsets_mapping=True)
    except (NotImplementedError, ValueError, TypeError) as exc:
        raise RuntimeError(
            "Hand-coded renderers require a fast tokenizer with "
            "``return_offsets_mapping=True`` support …"
        ) from exc
    return tokenizer

Tokenizers from load_tokenizer are PreTrainedTokenizerFast and satisfy this trivially. BYO tokenizers without return_offsets_mapping support fail loudly at construction time instead of silently triggering a reload from name_or_path that only existed to paper over the fastokens shim's missing offsets.

The single test that exercised the BYO-no-offsets reload path is replaced with one asserting the new error contract (test_get_offset_tokenizer_rejects_offsetless_byo).

Perf impact (honest)

Encode/decode now go through HuggingFace's stock tokenizers Rust BPE instead of fastokens (claimed ~10x faster on synthetic benchmarks). The mixed-label path — which is most assistant turns post-PR-#87 — was already off fastokens before this change, so the realistic-workload regression is bounded to homogeneous-label runs and direct encode/decode helper calls.

Not measured. If it matters in real training pools, revisit by adding fastokens back as the offset tokenizer once it supports offset_mapping upstream (filed: TODO link if a fastokens issue exists). Until then, the simpler tree wins.
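
Since the regression is unmeasured, one way to put a number on it would be a small timing harness like the sketch below. Everything here is an assumption for illustration: time_encode and placeholder_encode are hypothetical names, and the placeholder encoder would need to be swapped for the real tokenizer's encode call on a representative corpus.

```python
import timeit


def time_encode(encode, texts, repeat=3, number=5):
    """Return the best observed wall-clock seconds for one pass over texts."""

    def run():
        for text in texts:
            encode(text)

    # timeit.repeat returns total seconds per batch of `number` runs;
    # take the fastest batch and divide to get seconds per single pass.
    return min(timeit.repeat(run, repeat=repeat, number=number)) / number


def placeholder_encode(text):
    # Stand-in for a real tokenizer call; replace before drawing conclusions.
    return [ord(c) for c in text]


texts = ["hello world"] * 100
seconds = time_encode(placeholder_encode, texts)
print(f"{seconds:.6f} s per pass over {len(texts)} texts")
```

Running the same harness against homogeneous-label inputs before and after this PR would bound the regression on the path where fastokens actually helped.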

Tests

  • Before (origin/main): 2259 passed, 88 skipped, 1 xfailed
  • After (this PR): 2248 passed, 88 skipped, 1 xfailed
  • Delta: 11 deleted fastokens-specific tests, no regressions

Stats

 pyproject.toml                         |  12 --
 renderers/base.py                      | 243 ++++-----------------------------
 tests/test_load_tokenizer.py           |  46 ++-----
 tests/test_load_tokenizer_fastokens.py | 213 -----------------------------
 uv.lock                                |  36 +----
 5 files changed, 44 insertions(+), 506 deletions(-)

Net −462 lines.

Followups

🤖 Generated with Claude Code


Note

Medium Risk
Behavior change for callers passing use_fastokens or relying on silent offset-capable tokenizer reload; encode performance may regress on homogeneous-label paths. Core rendering security policy is unchanged.

Overview
Removes the fastokens optional dependency and all tokenizer monkey-patching from renderers.base, shrinking the load path to vanilla AutoTokenizer / PreTrainedTokenizerFast only.

load_tokenizer no longer accepts use_fastokens; it always goes through _load_tokenizer_via_auto with the existing security policy (pinned Kimi revisions, Llama unsloth mirrors). Deleted surfaces include FASTOKENS_INCOMPATIBLE, _patched_load, patch locks, stdout suppression, and fastokens failure fallback.

_get_offset_tokenizer is simplified to a probe-and-return contract: the supplied tokenizer must support return_offsets_mapping=True, or a RuntimeError is raised. The per-model vanilla offset cache and fastokens race-safe reload path are gone—callers with BYO tokenizers must pass a fast tokenizer explicitly.

Tests and lockfile drop tests/test_load_tokenizer_fastokens.py, update test_load_tokenizer.py (including test_get_offset_tokenizer_rejects_offsetless_byo), and remove fastokens from pyproject.toml and uv.lock.

Trade-off: encode/decode use stock HuggingFace tokenizers instead of the fastokens shim (~10x encode claim); mixed-label / offset attribution paths were already on vanilla HF for offsets.

Reviewed by Cursor Bugbot for commit 33c3f39.

Note

Remove fastokens dependency and all related tokenizer patching logic

  • Removes the fastokens package from dependencies and deletes all integration code, including _patched_load, the use_fastokens parameter on load_tokenizer, and the _offset_tokenizers cache.
  • load_tokenizer now always loads a vanilla fast tokenizer via AutoTokenizer with no retry or fallback logic.
  • _get_offset_tokenizer now raises RuntimeError instead of transparently loading a separate offset-capable tokenizer when the supplied tokenizer lacks offset mapping support.
  • Deletes tests/test_load_tokenizer_fastokens.py and updates tests/test_load_tokenizer.py to match the new behavior.
  • Risk: callers relying on the automatic offset-tokenizer fallback or the use_fastokens parameter will get a TypeError or RuntimeError at runtime.

Macroscope summarized 33c3f39.

macroscopeapp (Bot) commented Jun 26, 2026

Approvability

Verdict: Needs human review

This PR removes the fastokens dependency, which provided a claimed ~10x tokenization speedup. While the code simplification is clean, removing a significant performance optimization is an architectural decision that warrants human review to confirm the tradeoff is intentional.


fastokens is a process-global monkey-patch on transformers.AutoTokenizer
that swaps the .backend_tokenizer for a faster Rust BPE shim. The shim
has no offset_mapping support, so we kept a parallel vanilla tokenizer
per model purely for body/scaffold attribution in
attribute_text_segments. Most assistant messages render
mixed-label (scaffold+content+scaffold), which means the offset path
runs and fastokens doesn't help; fastokens only saves time on
homogeneous-label runs and direct _encode/_decode calls. The
complexity tax — FASTOKENS_INCOMPATIBLE denylist (DeepSeek-V3),
_FASTOKENS_PATCH_LOCK for pool slot races, contextlib.redirect_stdout
to swallow [fastokens] prints, unpatch/restore dance in
_get_offset_tokenizer, twin tokenizer in memory — was not earning its
keep.

What's removed:
* fastokens dependency in pyproject.toml + its uv.exclude-newer-package
  exemption
* FASTOKENS_INCOMPATIBLE frozenset (DeepSeek-V3 family workaround,
  which only existed because fastokens lacks Metaspace pre-tokenizer
  support)
* _FASTOKENS_PATCH_LOCK, _FASTOKENS_ANNOUNCED, _patched_load
* use_fastokens kwarg on load_tokenizer (signature simplifies to one
  positional arg)
* The fastokens-failure retry-vanilla branch in load_tokenizer
* The whole _offset_tokenizers cache, its lock, and the
  unpatch-and-reload race-safe path in _get_offset_tokenizer
* tests/test_load_tokenizer_fastokens.py (211 lines, all asserted
  fastokens-specific behaviour)

What replaces it:

_get_offset_tokenizer is now 3 lines of real logic: probe → return, or
raise a clear error. The contract is "pass a fast tokenizer or get a
loud error." Tokenizers from load_tokenizer are
PreTrainedTokenizerFast and satisfy this trivially. BYO tokenizers
without return_offsets_mapping support fail at construction time
instead of silently triggering a reload-from-name_or_path that only
existed to paper over the fastokens shim's missing offsets.

The single test that exercised the BYO-no-offsets reload path is
replaced with one asserting the new error contract.

Perf impact:
encode/decode now goes through HuggingFace's stock tokenizers Rust BPE
instead of fastokens' (claimed ~10x faster on synthetic benchmarks).
The mixed-label path (most assistant turns) was already off fastokens
before this change, so the realistic-workload regression is bounded to
homogeneous-label runs and direct encode/decode helper calls.
Measure if it matters; revisit by adding fastokens back as the
offset tokenizer once it supports offset_mapping upstream.

Net: -462 lines. Test suite: 2248 passed, 88 skipped, 1 xfailed
(2259 → 2248 = 11 fastokens-specific tests deleted; no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hallerite merged commit 082836b into main on Jun 26, 2026
10 of 11 checks passed
@hallerite (Member, Author) commented:

This is mostly because we currently need offset renderers everywhere. We will switch back to fastokens once it supports offset mapping.

@hallerite hallerite deleted the rip-fastokens branch June 26, 2026 03:25
hallerite added a commit that referenced this pull request Jun 26, 2026
Two related refactors of the emit_text_segments / attribute_text_segments
pipeline:

1. ``emit_text_segments`` closures across 8 hand-coded renderers
   (qwen3, qwen35, glm45, glm5, deepseek_v3, nemotron3, laguna_xs2,
   minimax_m2) get a "collapse-or-fallback" pattern: adjacent
   same-label segments are folded into one ``emit_text`` call
   (preserves internal BPE merges, skips the offset path); only
   genuinely mixed-label runs go through ``attribute_text_segments``.
   Most rendering paths end up homogeneous after collapse, so the
   offset machinery only runs when it actually has to.

2. ``attribute_text_segments`` is rewritten to use the Rust
   ``tokenizers.Encoding`` API directly — ``.encode().ids`` /
   ``.encode().offsets`` — instead of going through
   ``transformers``'s ``return_offsets_mapping=True`` dict API. This
   unblocks the future ``transformers``-optional path (issue #31): a
   BYO ``tokenizers.Tokenizer`` works without any ``transformers``
   wrapper. ``_get_offset_tokenizer`` becomes a 2-path resolver
   (direct Rust tokenizer, or extract ``.backend_tokenizer`` from a
   ``PreTrainedTokenizerFast``); no second tokenizer load, no
   probe-verify, no AutoTokenizer fallback — all of those existed in
   the previous version of this PR to coordinate with the
   fastokens shim, which is gone after #95.
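
The two steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the renderers' code: collapse_segments and attribute_by_offsets are hypothetical helpers, the offsets are hand-written rather than produced by a real tokenizer, and labelling a token by the segment that contains its first character is an assumption about how boundary tokens are handled.

```python
from itertools import groupby


def collapse_segments(segments):
    """Fold adjacent (text, label) pairs that share a label into one pair."""
    return [
        ("".join(text for text, _ in group), label)
        for label, group in groupby(segments, key=lambda seg: seg[1])
    ]


def attribute_by_offsets(offsets, segments):
    """Label each token by the segment containing its first character.

    offsets holds one (start, end) character span per token, measured over
    the concatenation of all segment texts.
    """
    bounds, pos = [], 0
    for text, label in segments:
        bounds.append((pos, pos + len(text), label))
        pos += len(text)

    labels = []
    for start, _end in offsets:
        match = next((lab for lo, hi, lab in bounds if lo <= start < hi), None)
        labels.append(match)
    return labels


segments = [
    ("<think>", "scaffold"),
    ("\n", "scaffold"),
    ("hi there", "content"),
    ("</think>", "scaffold"),
]
collapsed = collapse_segments(segments)

if len(collapsed) == 1:
    # Homogeneous after collapse: a single emit would suffice, no offsets.
    labels = [collapsed[0][1]]
else:
    # Mixed-label run: fall back to offset-based attribution.
    # Hand-written offsets standing in for Encoding.offsets.
    offsets = [(0, 7), (7, 8), (8, 10), (10, 16), (16, 24)]
    labels = attribute_by_offsets(offsets, collapsed)

print(collapsed)
print(labels)
```

In this example the first two scaffold pieces merge into one segment, the run stays mixed-label, and the five tokens are labelled scaffold, scaffold, content, content, scaffold.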

``minimax_m2.emit_token_overlap_body`` and ``qwen3_vl._Emitter._flush``
are updated to call the new ``Encoding``-based offset API directly.

``tokenizers>=0.20`` becomes an explicit core dependency — it was
already a transitive of ``transformers``, but the new ``attribute_text_segments``
imports from ``tokenizers`` at the module level so we declare it.

Tests: 2248 passed, 88 skipped, 1 xfailed (baseline parity with #95).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
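
As a rough illustration of the "2-path resolver" the commit message describes, the sketch below uses duck typing on stub classes. FakeRustTokenizer, FakeFastWrapper, and resolve_offset_tokenizer are hypothetical names; the real resolver may check types differently, and raising TypeError for unsupported objects is an assumption.

```python
class FakeRustTokenizer:
    """Hypothetical stand-in for a tokenizers.Tokenizer-like object."""

    def encode(self, text):
        return [ord(c) for c in text]


class FakeFastWrapper:
    """Hypothetical stand-in for a PreTrainedTokenizerFast-like wrapper."""

    def __init__(self, backend):
        self.backend_tokenizer = backend


def resolve_offset_tokenizer(tokenizer):
    """Return the underlying tokenizer without loading a second one."""
    # Path 1: a wrapper that exposes its Rust backend.
    backend = getattr(tokenizer, "backend_tokenizer", None)
    if backend is not None:
        return backend
    # Path 2: the object already looks like a direct Rust tokenizer.
    if callable(getattr(tokenizer, "encode", None)):
        return tokenizer
    raise TypeError(f"unsupported tokenizer type: {type(tokenizer).__name__}")


rust = FakeRustTokenizer()
print(resolve_offset_tokenizer(rust) is rust)
print(resolve_offset_tokenizer(FakeFastWrapper(rust)) is rust)
```

Both paths end at the same object, which is what lets the offset code work with or without a transformers wrapper.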
joanvelja added a commit to joanvelja/renderers that referenced this pull request Jun 27, 2026
…ens removal PrimeIntellect-ai#95, dyn-versioning PrimeIntellect-ai#96) + migrate gemma4 (#10)

* adopt verifiers-style dynamic versioning (PrimeIntellect-ai#96)

* remove fastokens entirely for now (PrimeIntellect-ai#95)

* feat(thinking): replace preserve_* bools with thinking_retention, respected by the bridge (PrimeIntellect-ai#88)

* migrate gemma4 to thinking_retention (render+bridge, implied=all; debate hot path 1:1, full-render keeps thinking) + fastokens test fix

---------

Co-authored-by: hallerite <git@hallerite.com>