feat: Java, Kotlin, and Swift component extraction #29

Open
bionicles wants to merge 1 commit into main from rust-java-kotlin-swift

@bionicles

Owner

Adds the "mobile gang" languages to the Rust port (item 1 of the post-port roadmap; closes part of #7's language list).

What

  • Java (treesitter/java.rs): tree-sitter formatter. Reproduces the legacy combined-regex semantics: annotations (@Name lines, legacy charset truncation), classes ((public )?(abstract )?class ... extends ... implements ...), interface NAME, methods/constructors gated on a literal ` {\n` after the signature, and the two-word bodyless-prototype rule (public String f(); never matched — preserved).
  • Swift (treesitter/swift.rs): tree-sitter formatter. Column-0 type headers (class/struct/protocol/enum keyword-first, so public class is skipped — legacy rule), modifier-free func/init signatures with optional -> T, raw multi-line parameter formatting preserved.
  • Kotlin (extract/kotlin.rs): procedural line scanner, NOT tree-sitter. tree-sitter-kotlin-ng cannot recover from the deliberately invalid constructs in the acceptance fixture (ages: Array<Int>(42),). One ERROR node swallows the rest of the file, while the legacy pattern is line-anchored and keeps going. The scanner reproduces type-header capture with brace/blank-line/column-0 stops, last-'fun '-first matching, the ' -> Word)' tail that keeps function-type parameters intact, enum-entry fusion (PLUS { + following fun), and the strip-comments-then-rstrip ordering.
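
To make the Java body gate from the first bullet concrete, here is a minimal sketch. It assumes the gate is checked at the byte offset where the signature ends. `has_body_gate` and its parameters are hypothetical names for illustration, not code from this PR.

```rust
/// Hypothetical helper (not from the PR): returns true only when the text
/// right after a signature is exactly a space, an opening brace, and a newline.
fn has_body_gate(src: &str, sig_end: usize) -> bool {
    // `get` returns None for out-of-range or non-boundary offsets,
    // so a bad offset yields `false` instead of a panic.
    src.get(sig_end..)
        .map_or(false, |rest| rest.starts_with(" {\n"))
}
```

Under this assumption, `void f() {` followed by a newline passes the gate, while `void f();` and a brace placed on the next line do not.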

Acceptance

  • JavaTest.java, KotlinTest.kt, swift_test.swift component sets match the legacy goldens byte-for-byte (now enforced by golden_parity.rs; .java/.kt/.swift added to v1 scope).
  • trees_v1 goldens regenerated with the three parsers unstubbed; all 13 tree-render parity tests pass.
  • Deep-nesting robustness test extended to all three (512 KiB thread, no recursion on AST depth).
  • 92 tests, cargo +stable clippy -D warnings clean, fmt clean.
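
As an illustration of the deep-nesting test style described above, the sketch below walks a very deep tree with an explicit heap stack on a thread limited to 512 KiB. The arena-backed `Tree`, `chain`, and `max_depth` are hypothetical stand-ins for the real visitors. Only `std::thread::Builder::stack_size` is an actual standard-library API.

```rust
use std::thread;

/// Hypothetical arena-backed tree: `nodes[i]` holds the child indices of
/// node `i`, so dropping the tree never recurses either.
struct Tree {
    nodes: Vec<Vec<usize>>,
}

/// Builds a single chain 0 -> 1 -> 2 -> ... with `depth` nodes (minimum 1).
fn chain(depth: usize) -> Tree {
    let mut nodes: Vec<Vec<usize>> = (1..depth).map(|i| vec![i]).collect();
    nodes.push(Vec::new()); // the leaf at the bottom of the chain
    Tree { nodes }
}

/// Depth-first walk driven by a heap-allocated stack instead of recursion.
fn max_depth(tree: &Tree) -> usize {
    let mut deepest = 0usize;
    let mut stack = vec![(0usize, 1usize)]; // (node index, depth of that node)
    while let Some((node, depth)) = stack.pop() {
        deepest = deepest.max(depth);
        for &child in &tree.nodes[node] {
            stack.push((child, depth + 1));
        }
    }
    deepest
}

/// Runs the walk on a worker thread whose call stack is capped at 512 KiB.
fn max_depth_on_small_stack(depth: usize) -> usize {
    thread::Builder::new()
        .stack_size(512 * 1024)
        .spawn(move || max_depth(&chain(depth)))
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}
```

Because the pending work lives in a `Vec`, the nesting depth is bounded by heap memory rather than by the thread's call stack.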

Notes

  • Workspace gains tree-sitter-java 0.23 and tree-sitter-swift 0.7; no Kotlin grammar dependency.
  • Python CI is path-skipped for this change set except the golden generator + goldens (tests/golden/**), which the Python suite does not import.

🤖 Generated with Claude Code

Java and Swift are tree-sitter formatters (treesitter/java.rs,
treesitter/swift.rs) emitting source-text slices validated by the legacy
mini-patterns: Java's annotation/class/method/two-word-prototype rules
with the ' {\n' body gate, Swift's column-0 type headers and
modifier-free func/init signatures.
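
A rough sketch of the Swift column-0 rule, assuming it amounts to a keyword-first prefix check on the raw line. `TYPE_KEYWORDS` and `is_column0_type_header` are illustrative names, not the PR's code.

```rust
/// Keywords that may open a type header. The trailing space keeps
/// identifiers such as `classy` from matching.
const TYPE_KEYWORDS: [&str; 4] = ["class ", "struct ", "protocol ", "enum "];

/// Hypothetical check: the line must start at column 0 with one of the
/// keywords, so a leading modifier or any indentation disqualifies it.
fn is_column0_type_header(line: &str) -> bool {
    TYPE_KEYWORDS.iter().any(|kw| line.starts_with(*kw))
}
```

With this check, `class Foo {` is accepted, while `public class Foo {` and an indented `struct Inner {` are skipped, which matches the legacy rule as described.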

Kotlin is a procedural line scanner (extract/kotlin.rs): the community
grammar (tree-sitter-kotlin-ng) cannot recover from the deliberately
invalid constructs in the acceptance fixture — one ERROR node swallows
the rest of the file — while the legacy pattern is line-anchored and
keeps going. The scanner reproduces the legacy semantics: type headers
captured to a brace/blank-line/column-0 stop, last-'fun '-first matching
with the ' -> Word)' tail that preserves function-type parameters, enum
entry lines fused onto the following fun, and the strip-comments-then-
rstrip ordering whose interplay decides whether a closing paren survives
a trailing line comment.
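
The ordering point can be shown with a small sketch. It assumes a naive `//` search that ignores string literals, which may differ from the real scanner. All function names here are hypothetical.

```rust
/// Naive line-comment stripper (assumption: `//` inside string literals
/// is not handled specially).
fn strip_line_comment(line: &str) -> &str {
    match line.find("//") {
        Some(idx) => &line[..idx],
        None => line,
    }
}

/// Legacy order as described: strip the comment first, then trim the right.
fn clean_legacy_order(line: &str) -> &str {
    strip_line_comment(line).trim_end()
}

/// Reversed order, shown only for contrast: the whitespace that sat before
/// the comment is left behind.
fn clean_reversed_order(line: &str) -> &str {
    strip_line_comment(line.trim_end())
}
```

For `fun f(a: Int) // note`, the legacy order ends exactly at the closing paren, while the reversed order leaves a trailing space after it, so any later pattern anchored on the line end would see different text.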

All three fixture component sets match the legacy goldens byte-for-byte;
trees_v1 goldens regenerated with the three parsers unstubbed. Deep-
nesting robustness tests extended to the new languages. All visitors use
explicit heap stacks (no AST-depth recursion).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>