Rust port: tree_plus v2 (workspace, golden parity, 55-85x faster)#28

Merged: bionicles merged 8 commits into main from rust-port on Jun 9, 2026

@bionicles

Owner

What

Rust rewrite of tree_plus for the declared version-1 scope (local filesystem mode), preserving the Python implementation's observable behavior via golden parity tests.

  • crates/tree_plus_core: model, deterministic rich-compatible renderer, natural sort (natsort os_sorted parity), fnmatch ignores + amortized globs, wc-parity counting, component extraction (extract_components, the parse_file rename)
  • Tree-sitter extractors: Rust, Python, JavaScript/TypeScript, C, C++, Go with formatters reproducing legacy output (including catastrophic.c, 191/191 components)
  • Regex (linear-time engine only) for markers, Markdown/RST/txt, .env, requirements, Makefile/Justfile; native parsers for JSON/JSONL/YAML/TOML/CSV/SQLite
  • crates/tree_plus_cli: tree_plus + tprs alias (avoids PATH collision with the Python entry point)
  • CI: Python workflows path-ignore Rust files; new rust.yml (fmt, clippy -D warnings, tests on Linux/macOS/Windows)
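
As a rough illustration of the natural-sort item above, here is a minimal std-only sketch. The helper names (`chunks`, `cmp_digits`, `natural_cmp`) are hypothetical, and the tie-breaking rules (digit runs before text runs, case-insensitive text comparison) are assumptions; it is not the crate's implementation and does not claim natsort `os_sorted` parity.

```rust
use std::cmp::Ordering;

/// Split a name into alternating runs: (is_digit_run, text of the run).
fn chunks(name: &str) -> Vec<(bool, String)> {
    let mut out: Vec<(bool, String)> = Vec::new();
    for ch in name.chars() {
        let is_digit = ch.is_ascii_digit();
        if let Some((kind, run)) = out.last_mut() {
            if *kind == is_digit {
                run.push(ch);
                continue;
            }
        }
        out.push((is_digit, ch.to_string()));
    }
    out
}

/// Compare two digit runs numerically without parsing, so long runs cannot overflow.
fn cmp_digits(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.trim_start_matches('0'), b.trim_start_matches('0'));
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Natural comparison: digit runs compare as numbers, other runs case-insensitively.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (ca, cb) = (chunks(a), chunks(b));
    for ((a_digit, a_run), (b_digit, b_run)) in ca.iter().zip(cb.iter()) {
        let ord = match (*a_digit, *b_digit) {
            (true, true) => cmp_digits(a_run, b_run),
            (false, false) => a_run.to_lowercase().cmp(&b_run.to_lowercase()),
            // Assumption: a digit run sorts before a text run.
            (true, false) => Ordering::Less,
            (false, true) => Ordering::Greater,
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    // Fewer runs first, then plain comparison so the order is total and deterministic.
    ca.len().cmp(&cb.len()).then_with(|| a.cmp(b))
}

fn main() {
    let mut names = vec!["file10.rs", "file2.rs", "File1.rs", "file1.rs"];
    names.sort_by(|a, b| natural_cmp(a, b));
    println!("{names:?}");
}
```

The final fallback to a plain string comparison matters for a renderer that must be deterministic: two names that are equal under the natural rules still get a fixed relative order.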

Testing

  • 80 Rust tests: unit + golden parity (109 fixture component sets, 13 byte-identical tree renders) + CLI integration + robustness (arbitrary bytes never panic)
  • Python suite unchanged: 148 passed
  • Two intentional output differences, pinned in tests and documented in docs/rust-port-differences.md (tensorflow flags special case dropped; one string-literal regex artifact)
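
To make "byte-identical" concrete, a golden check can report where two renders first diverge. The sketch below is an assumption about how such a helper might look, not code from the parity suite; `first_difference` is a hypothetical name.

```rust
/// Return the offset of the first byte where `actual` and `golden` differ,
/// or `None` when the two buffers are byte-identical.
fn first_difference(actual: &[u8], golden: &[u8]) -> Option<usize> {
    actual
        .iter()
        .zip(golden)
        .position(|(a, g)| a != g)
        // Same common prefix but different lengths: they differ where the shorter one ends.
        .or_else(|| (actual.len() != golden.len()).then(|| actual.len().min(golden.len())))
}

fn main() {
    let golden = b"src\n  main.rs\n";
    assert_eq!(first_difference(b"src\n  main.rs\n", golden), None);
    // A CRLF copy of the same render stops matching at the first `\r`.
    assert_eq!(first_difference(b"src\r\n  main.rs\r\n", golden), Some(3));
    // A truncated render differs where it ends.
    assert_eq!(first_difference(b"src\n", golden), Some(4));
}
```

Reporting an offset rather than a bare pass/fail tends to make line-ending and encoding drift easier to spot in CI logs.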

Performance (docs/performance.md)

| Workload | Rust | Python | Speedup |
|---|---|---|---|
| this repo, full | 28 ms | 2.39 s | 85x |
| Gymnasium / diamond-types / ramda | 20-40 ms | 1.4-2.9 s | 55-84x |

Notes for review

  • Pushing this branch does NOT trigger the PyPI deploy (gated to push on main); merging will skip Python CD when only Rust paths changed
  • Deferred languages tracked in docs/language-roadmap.md with legacy goldens checked in as future contract

🤖 Generated with Claude Code

bionicles and others added 6 commits June 9, 2026 15:47
Python CI/CD (unix, microsoft) now skips runs when only Rust-port files
change (crates/, Cargo manifests, docs/, Justfile). New rust.yml runs
fmt + clippy -D warnings + the full test suite (including golden parity)
on Linux/macOS/Windows with autocrlf disabled so byte-level goldens hold.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
generate_legacy_goldens.py captures the Python implementation's outputs:
per-fixture components, token/line counts, full tree renders, and
v1-scope tree renders (deferred-language extractors stubbed). These are
the behavioral contract for the Rust port's parity suite.
diff_components.py is the dev-loop diff tool.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
tree_plus_core: model, deterministic rich-compatible renderer, natural
sort (natsort os_sorted parity), fnmatch-compatible ignores, amortized
globs, wc-parity counting, and component extraction:
- tree-sitter extractors for Rust, Python, JS/TS, C/C++ with formatters
  that reproduce the legacy output (golden-parity tested, including
  catastrophic.c) and salvage ERROR regions on invalid syntax
- regex extractors (LazyLock, linear-time engine only) for markers,
  Markdown/RST/txt, .env, requirements, Makefile/Justfile, Angular
- native parsers for JSON/JSONL/YAML/TOML/CSV and SQLite (feature)
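
For readers unfamiliar with fnmatch-style ignores, the core wildcard semantics can be sketched as follows. This is a simplified stand-in under stated assumptions: it handles only `*` and `?`, omits `[seq]` character classes and case normalization, and `wildcard_match` is a hypothetical name rather than the crate's API.

```rust
/// Match `name` against `pattern`, where `*` matches any run of characters
/// (including none, and including `/`) and `?` matches exactly one character.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0, 0);
    // Index of the last `*` seen and the name index it has absorbed up to.
    let mut backtrack: Option<(usize, usize)> = None;

    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            // Tentatively let `*` match nothing; remember where to retry from.
            backtrack = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((star, tried)) = backtrack {
            // Mismatch: let the last `*` absorb one more character and retry.
            pi = star + 1;
            ni = tried + 1;
            backtrack = Some((star, tried + 1));
        } else {
            return false;
        }
    }
    // The name is consumed; only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

fn main() {
    assert!(wildcard_match("*.pyc", "module.pyc"));
    assert!(wildcard_match("test_?.rs", "test_1.rs"));
    assert!(!wildcard_match("test_?.rs", "test_10.rs"));
}
```

The single-backtrack-point loop avoids recursion and regex compilation, which is one plausible way to keep per-path ignore checks cheap.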

tree_plus_cli: clap binary as tree_plus + tprs alias (avoids a PATH
collision with the Python entry point), legacy flags, footer parity,
real terminal-width detection.

Parallel traversal with rayon behind a deterministic sort; arbitrary
bytes never panic (robustness suite); criterion benches included.
55-85x faster than the Python implementation end to end.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Plus a Justfile with install (tprs), test, lint, goldens, and bench
recipes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
type struct/interface headers and column-0 func signatures, matching
legacy parse_go exactly (golden-parity tested, including the multiline
method signature fixture). Legacy quirks preserved: `func f() {}` and
generic type headers are skipped, raw tabs kept in signatures.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ritus

CI runs stable clippy, which flags collapsible_match patterns the local
nightly toolchain accepted.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@bionicles

Owner Author

CI status:

  • Rust (new workflow): green on ubuntu, macos, windows — fmt, clippy -D warnings (stable), 80 tests including golden parity
  • Linux & MacOS (Python): green, all 12 matrix jobs (148 tests)
  • Microsoft (Python): red — pre-existing; this workflow has failed its last 6 runs on main (CRLF/wc/encoding-shaped failures on the Windows runners, e.g. test_units_token_counting_wc expects LF byte counts). Unrelated to this branch.
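
The CRLF sensitivity mentioned above is easy to reproduce in isolation. The sketch below assumes `wc`-style counting, where lines are the number of LF bytes and bytes are the raw length; `wc_lines` and `wc_bytes` are hypothetical helper names, not functions from either implementation.

```rust
/// Count lines the way `wc -l` does: the number of newline (LF) bytes.
fn wc_lines(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}

/// Count bytes the way `wc -c` does: the raw length.
fn wc_bytes(bytes: &[u8]) -> usize {
    bytes.len()
}

fn main() {
    let lf = b"fn main() {}\nmod a;\n";
    let crlf = b"fn main() {}\r\nmod a;\r\n";

    // The line count is unaffected by the line-ending style...
    assert_eq!(wc_lines(lf), wc_lines(crlf));
    // ...but every line gains one `\r` byte under CRLF.
    assert_eq!(wc_bytes(crlf), wc_bytes(lf) + wc_lines(lf));
}
```

A test that pins LF byte counts would therefore be expected to fail on a checkout where line endings were converted to CRLF.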

🤖 Generated with Claude Code

bionicles and others added 2 commits June 9, 2026 18:15
Two distinct overflows surfaced by running tprs on the Linux kernel:

1. Tree-sitter extractor visitors (C/C++, Rust, Python, TypeScript, Go)
   recursed on AST depth. arch/x86/kernel/cpu/microcode/intel-ucode-defs.h
   is a headerless initializer-list fragment that parses as a deeply
   nested ERROR tree and blew rayon's 2 MiB worker stacks. All visitors
   now traverse with explicit heap stacks; emission order is unchanged
   (golden parity suite still passes byte-for-byte). The C/C++ extractor
   threads deferred work (record closers, template field suppression,
   ERROR salvage) through a Work queue to preserve legacy ordering.

2. Rayon work-stealing nests from_folder frames on a worker while it
   waits in join, leaving too little headroom for tree-sitter's C frames
   on big trees (drivers/ segfaulted in ts_subtree_retain). from_seeds
   now runs in a dedicated pool with 16 MiB worker stacks.
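
The first fix can be sketched without tree-sitter. The example below is an illustration under assumptions: `Node` is a hypothetical arena-backed tree (not tree-sitter's node type), and the `Work` variants are invented here to show how deferred work such as a closer can ride on the same explicit stack while keeping the order a recursive visitor would emit.

```rust
/// A node in a hypothetical arena-backed syntax tree.
struct Node {
    label: String,
    children: Vec<usize>, // indices into the arena
}

/// Units of pending work: visit a node, or emit its deferred closer.
enum Work {
    Visit(usize),
    Close(usize),
}

/// Emit open/close events in the same order as a recursive visitor,
/// using a heap-allocated stack instead of call-stack recursion.
fn render(arena: &[Node], root: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut stack = vec![Work::Visit(root)];
    while let Some(work) = stack.pop() {
        match work {
            Work::Visit(id) => {
                let node = &arena[id];
                out.push(format!("open {}", node.label));
                // The closer is pushed first so it pops after every child.
                stack.push(Work::Close(id));
                // Children are pushed in reverse so the leftmost pops first.
                stack.extend(node.children.iter().rev().map(|&c| Work::Visit(c)));
            }
            Work::Close(id) => out.push(format!("close {}", arena[id].label)),
        }
    }
    out
}

fn main() {
    // root -> [a -> [c], b]
    let arena = vec![
        Node { label: "root".into(), children: vec![1, 2] },
        Node { label: "a".into(), children: vec![3] },
        Node { label: "b".into(), children: vec![] },
        Node { label: "c".into(), children: vec![] },
    ];
    for line in render(&arena, 0) {
        println!("{line}");
    }
}
```

With this shape, traversal depth is bounded by heap memory rather than by the worker thread's stack size, which is the property the commit relies on for deeply nested ERROR trees.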

Adds a regression test extracting deeply nested inputs for all six
tree-sitter languages on a 512 KiB thread, and records Linux-kernel
benchmarks in docs/performance.md (101k files: concise 0.50 s,
full 12.4 s; linux/kernel subdir ~56x faster than the Python CLI).
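
A regression test of the kind described above can be approximated with the standard library alone. This is a sketch under assumptions: `max_nesting` is a hypothetical stand-in for an extractor (it only tracks bracket depth with an explicit stack), and `on_small_stack` is a hypothetical helper built on `std::thread::Builder::stack_size`. How the dedicated rayon pool is configured is not shown in the source; rayon's `ThreadPoolBuilder` has a `stack_size` setting that would presumably serve that purpose.

```rust
use std::thread;

/// Return the deepest bracket nesting, or `None` if brackets are unbalanced.
/// Uses an explicit stack, so input depth does not consume call-stack frames.
fn max_nesting(input: &[u8]) -> Option<usize> {
    let mut open: Vec<u8> = Vec::new();
    let mut deepest = 0;
    for &byte in input {
        match byte {
            b'{' | b'(' | b'[' => {
                open.push(byte);
                deepest = deepest.max(open.len());
            }
            b'}' | b')' | b']' => {
                let expected = match byte {
                    b'}' => b'{',
                    b')' => b'(',
                    _ => b'[',
                };
                if open.pop() != Some(expected) {
                    return None;
                }
            }
            _ => {}
        }
    }
    open.is_empty().then_some(deepest)
}

/// Run `work` on a thread with the given stack size and return its result.
fn on_small_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker panicked")
}

fn main() {
    let depth = 1_000_000;
    let mut input = vec![b'{'; depth];
    input.extend(std::iter::repeat(b'}').take(depth));
    // 512 KiB mirrors the stack size named in the commit message.
    let result = on_small_stack(512 * 1024, move || max_nesting(&input));
    assert_eq!(result, Some(depth));
}
```

Running the check on a deliberately small stack makes any reintroduced recursion fail fast in CI instead of only on unusually deep real-world inputs.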

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…4s / 1.5GB)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@bionicles bionicles merged commit 54fe73b into main Jun 9, 2026
16 of 22 checks passed