Agents

  • Use jq to analyze the JSON AST output.
  • poolmanager.json: AST generated by solc (single-file build fixture for PoolManager).

Prerequisites: AST Identity Model

Before working on any feature that touches goto-definition, references, call hierarchy, implementation, rename, or the caching system, you must understand how file IDs and node IDs work. Getting this wrong causes cross-build collisions that are extremely hard to debug (a function in one build silently maps to a completely different function in another build).

The two ID types

| Type | Defined in | Inner | Assigned by | Stable across compilations? |
|---|---|---|---|---|
| FileId(u64) | types.rs:24 | unsigned 64-bit | PathInterner (canonical) | Yes: same path always gets same ID |
| NodeId(i64) | types.rs:4 | signed 64-bit | solc (per-compilation) | No: same function can get different IDs |

There is also SolcFileId(String) — a string wrapper used as HashMap keys matching solc's JSON output (e.g. "0", "34"). It is the stringified form of a file ID.

Node IDs are signed because solc uses negative IDs for built-in symbols (-1 for abi, -15 for msg, -18 for require, -28 for this).

The src string format

Every AST node has a src field in the format "offset:length:fileId":

  • offset — byte offset from the start of the source file
  • length — byte length of the source range
  • fileId — which source file this location belongs to

Parsed by SourceLoc::parse() in types.rs. After canonicalization, fileId is a canonical FileId from the PathInterner, not solc's original per-compilation ID.
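As an illustration of the format only, here is a minimal sketch of parsing a src string. SrcSpan and parse_src are hypothetical names for this example and are not the real SourceLoc::parse() from types.rs, whose signature and error handling may differ. The sketch assumes all three components are non-negative integers and rejects anything else.

```rust
/// Hypothetical stand-in for the real `SourceLoc` in types.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SrcSpan {
    pub offset: usize, // byte offset from the start of the file
    pub length: usize, // byte length of the range
    pub file_id: u64,  // file ID (canonical only after canonicalization)
}

/// Parse "offset:length:fileId". Returns None on malformed input,
/// including negative or non-numeric components (an assumption of this sketch).
pub fn parse_src(src: &str) -> Option<SrcSpan> {
    let mut parts = src.split(':');
    let offset = parts.next()?.parse().ok()?;
    let length = parts.next()?.parse().ok()?;
    let file_id = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(SrcSpan { offset, length, file_id })
}
```

For example, parse_src("120:45:3") yields offset 120, length 45, and file ID 3, while "120:45" and "1:2:3:4" are rejected.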

Why file IDs are unstable from solc

Solc assigns file IDs sequentially based on input order. If you compile Foo.sol first, it gets ID 0. If you compile Bar.sol first, Foo.sol gets a different ID. A single-file build of PoolManager.sol produces different file IDs than a full project build that includes all 160 source files.

How we solve file ID instability: PathInterner

PathInterner (types.rs:718) is a project-wide, append-only table that assigns canonical FileId values from file paths. It lives on ForgeLsp behind Arc<RwLock<PathInterner>>.

The invariant: Once a path is interned, it keeps the same ID for the lifetime of the session. Every CachedBuild::new() call (the only production constructor for fresh builds) does this:

  1. Calls interner.build_remap(&solc_id_to_path) — for each file in this compilation, interns its path and builds a translation table {solc_file_id → canonical_FileId}.
  2. Calls canonicalize_node_info() on every NodeInfo — rewrites the fileId component in src, name_location, name_locations, and member_location strings.
  3. Rewrites external_refs keys the same way.
  4. Sets id_to_path_map = interner.to_id_to_path_map() — the canonical map.

After this, all builds share the same file-ID space. You can safely resolve any src string from any build using any build's id_to_path_map. This is the foundation that makes merging builds and cross-build src lookups safe.

Key code path: goto.rs → CachedBuild::new() → build_remap() + canonicalize_node_info()
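The remap idea can be sketched in a few lines. This is a simplified illustration, not the real PathInterner: SketchInterner and canonicalize_src are hypothetical names, solc file IDs are modelled as u64 rather than the string keys solc emits, and locking (Arc<RwLock<...>>) is left out.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified interner: an append-only path -> canonical ID table.
#[derive(Default)]
pub struct SketchInterner {
    ids: HashMap<PathBuf, u64>,
    paths: Vec<PathBuf>, // index == canonical ID
}

impl SketchInterner {
    /// Return the existing ID for `path`, or append it and assign the next ID.
    pub fn intern(&mut self, path: &Path) -> u64 {
        if let Some(&id) = self.ids.get(path) {
            return id;
        }
        let id = self.paths.len() as u64;
        self.paths.push(path.to_path_buf());
        self.ids.insert(path.to_path_buf(), id);
        id
    }

    /// Build {solc file ID -> canonical ID} for one compilation.
    pub fn build_remap(&mut self, solc_id_to_path: &HashMap<u64, PathBuf>) -> HashMap<u64, u64> {
        solc_id_to_path
            .iter()
            .map(|(&solc_id, path)| (solc_id, self.intern(path)))
            .collect()
    }
}

/// Rewrite the fileId component of an "offset:length:fileId" string.
/// Returns None if the string is malformed or the file ID is not in `remap`.
pub fn canonicalize_src(src: &str, remap: &HashMap<u64, u64>) -> Option<String> {
    let (prefix, file_id) = src.rsplit_once(':')?;
    let canonical = remap.get(&file_id.parse::<u64>().ok()?)?;
    Some(format!("{prefix}:{canonical}"))
}
```

Because the table is append-only, a path interned by an earlier build keeps its ID when a later build assigns it a different solc file ID, so rewritten src strings from both builds agree.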

Why node IDs are unstable across compilations

Solc assigns node IDs as a monotonically increasing counter during AST construction. The counter's value depends on how many nodes have been processed before a given declaration. When the compilation closure changes (different files in scope), the same function gets a different numeric ID.

Concrete example from our debugging:

  • File build of PoolManager.sol: swap function = node ID 616
  • Sub-cache build of a library: node ID 616 = a completely different function
  • Searching all files for bare node ID 616 across builds would return the wrong function

This is explicitly documented in code:

  • references.rs:408: "Node IDs are not stable across builds, but byte offsets within a file are."
  • lsp.rs:103: "Each sub-cache has its own node ID space — matching across caches is done by absolute file path + byte offset, not by node ID."

The stable anchor: file path + byte offset

The server uses (absolute_file_path, byte_offset) as the cross-build-safe identifier for any source location. This pair is stable because:

  • File paths don't change between compilations
  • Byte offsets are properties of the source text, not the compilation

The pattern for cross-build lookups:

Step 1: In the originating build, resolve to (abs_path, byte_offset)
        → resolve_target_location() in references.rs:411

Step 2: In each target build, re-resolve to that build's node ID
        → byte_to_id(build.nodes, abs_path, byte_offset) in references.rs:131

byte_to_id() finds the innermost AST node at a byte position using span containment: for every node in the file, checks offset <= position < offset + length, then picks the narrowest (smallest length) match. This gives you the build-local NodeId for the same source location.
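The span-containment rule can be sketched as follows. Span and innermost_node are hypothetical names for this example; the real byte_to_id() takes build.nodes and an absolute path, and its tie-breaking between equal-length spans is not specified here.

```rust
/// Hypothetical flattened node record for one file.
#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub id: i64,       // build-local node ID
    pub offset: usize, // byte offset of the node
    pub length: usize, // byte length of the node
}

/// Return the ID of the narrowest span containing `position`, if any.
/// Containment is half-open: offset <= position < offset + length.
pub fn innermost_node(spans: &[Span], position: usize) -> Option<i64> {
    spans
        .iter()
        .filter(|s| s.offset <= position && position < s.offset + s.length)
        .min_by_key(|s| s.length)
        .map(|s| s.id)
}
```

With nested spans 0..100, 10..50, and 20..25, position 22 resolves to the innermost span, while position 25 falls just outside it and resolves to the 10..50 span.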

When bare node IDs ARE safe

Within a single build's data, node IDs are globally unique and safe to use freely. All of these are safe:

  • build.nodes[abs_path][node_id] — lookup within one build
  • build.decl_index[node_id] — typed declaration lookup within one build
  • node_info.referenced_declaration — following a reference within one build
  • build.base_function_implementation[node_id] — equivalence lookup within one build
  • find_node_info(&build.nodes, node_id) — search all files within one build

When bare node IDs are DANGEROUS

Any time you hold a NodeId from build A and look it up in build B:

  • builds.iter().find_map(|b| find_node_info(&b.nodes, node_id)) is WRONG: it leaks node IDs across builds. A sub-cache may have a completely different function at the same numeric ID.
  • other_build.decl_index.get(&node_id) is WRONG unless you know both builds compiled the same file and solc assigned the same IDs (which is true for file build vs project build of the same file, but NOT for sub-caches).

Node identity verification

A NodeId alone is ambiguous across builds, but a NodeId plus its NodeInfo carries enough metadata to prove identity. Every node has:

  • name_location: "offset:length:canonicalFileId", a globally unique position
  • The source text at that position: the node's name

Since canonical file IDs are stable (PathInterner) and byte offsets are properties of the source text, checking (file_path, name_offset, name_text) is an O(1) identity proof. This is implemented as verify_node_identity() in call_hierarchy.rs:

// O(1) identity check: does node_id in this build refer to the expected entity?
verify_node_identity(
    &build.nodes,
    node_id,
    expected_abs_path,     // which file
    expected_name_offset,  // byte offset of name_location
    expected_name,         // function/modifier/contract name
) -> bool

The check is: look up build.nodes[abs_path][node_id], parse its name_location offset, compare against the expected offset, then read the source bytes at that span to confirm the name matches. If all three match (file + offset + name), this is guaranteed to be the same source entity regardless of which compilation produced the build.
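A minimal sketch of that check, under stated assumptions: SketchNode and verify_identity_sketch are hypothetical, the map passed in is already the per-file slice build.nodes[abs_path] (which is how the file component is checked), and the source text is supplied by the caller. The real verify_node_identity() in call_hierarchy.rs may obtain these differently.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down node record; the real NodeInfo carries more fields.
pub struct SketchNode {
    pub name_offset: usize, // parsed from name_location
    pub name_length: usize,
}

/// True only if `node_id` exists in this file, its name starts at the
/// expected offset, and the source bytes at that span equal the expected name.
pub fn verify_identity_sketch(
    nodes_in_file: &HashMap<i64, SketchNode>,
    source: &str,
    node_id: i64,
    expected_name_offset: usize,
    expected_name: &str,
) -> bool {
    let Some(node) = nodes_in_file.get(&node_id) else {
        return false; // ID not present in this build's view of the file
    };
    if node.name_offset != expected_name_offset {
        return false; // same numeric ID, different entity
    }
    let end = node.name_offset + node.name_length;
    // `get` returns None for an out-of-bounds span instead of panicking.
    source.as_bytes().get(node.name_offset..end) == Some(expected_name.as_bytes())
}
```

Every step is a hash lookup or a fixed-size comparison, which is why the check stays constant-time regardless of how many nodes the build holds.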

The resolve pattern (call hierarchy)

When iterating multiple builds to find a target function, use resolve_target_in_build() (call_hierarchy.rs):

for build in &builds {
    let ids = resolve_target_in_build(
        build, node_id, target_abs, target_name, target_name_offset,
    );
    // ids is empty if this build doesn't contain the target,
    // or contains the verified node ID(s) for the target.
}

This uses a two-tier strategy:

  1. Fast path (O(1)): verify_node_identity() — if the numeric ID exists and passes identity verification, accept it immediately.
  2. Slow path (O(n)): byte_to_id() — if the ID doesn't exist or fails verification (e.g. sub-cache with a different function at the same numeric ID), re-resolve by byte offset using span containment.

This replaces the older pattern of contains_key + inline name/position scan. Used in both callHierarchy/incomingCalls and callHierarchy/outgoingCalls.
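To make the two tiers concrete, here is a hedged sketch for a single file. FileNode and resolve_in_file are hypothetical; the real resolve_target_in_build() takes a whole build and returns a list of IDs, whereas this sketch returns at most one ID and uses the name offset as the lookup position for the fallback.

```rust
use std::collections::HashMap;

/// Hypothetical per-file node record: the full node span plus its name span.
pub struct FileNode {
    pub offset: usize,
    pub length: usize,
    pub name_offset: usize,
    pub name_length: usize,
}

/// Resolve a candidate ID from another build to this build's ID for the same entity.
pub fn resolve_in_file(
    nodes: &HashMap<i64, FileNode>,
    source: &str,
    candidate: i64,
    name_offset: usize,
    name: &str,
) -> Option<i64> {
    // Fast path: the numeric ID exists here and passes the identity check.
    if let Some(n) = nodes.get(&candidate) {
        let end = n.name_offset + n.name_length;
        if n.name_offset == name_offset
            && source.as_bytes().get(n.name_offset..end) == Some(name.as_bytes())
        {
            return Some(candidate);
        }
    }
    // Slow path: scan for the narrowest node whose span contains the name offset.
    nodes
        .iter()
        .filter(|(_, n)| n.offset <= name_offset && name_offset < n.offset + n.length)
        .min_by_key(|(_, n)| n.length)
        .map(|(&id, _)| id)
}
```

When a sub-cache holds a different function at the candidate ID, the fast path rejects it and the scan returns the ID that this build assigned to the target.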

Deduplication across builds

When the same function appears in multiple builds (for example, a file build and a project build both contain PoolManager.swap), the results share a source position but are not guaranteed to share a NodeId. Always dedup by source position (e.g. selectionRange.start), never by node ID.
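A minimal sketch of position-based dedup, assuming a hypothetical Hit type that records the file path and the byte offset of the name. In the LSP handlers the key would be the result's path plus selectionRange.start; the byte offset stands in for that here.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical result item gathered from several builds.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub node_id: i64, // build-local, not comparable across builds
    pub path: PathBuf,
    pub name_offset: usize,
}

/// Keep the first hit per (path, name_offset); node IDs are ignored on purpose.
pub fn dedup_by_position(hits: Vec<Hit>) -> Vec<Hit> {
    let mut seen = HashSet::new();
    hits.into_iter()
        // `insert` returns false when the position was already seen.
        .filter(|h| seen.insert((h.path.clone(), h.name_offset)))
        .collect()
}
```

Two hits with different node IDs at the same position collapse into one, and two hits with the same node ID at different positions are both kept.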

TreeSitter as a complementary system

TreeSitter operates on the live buffer text (including unsaved edits) and is completely independent of solc's AST and its IDs. It is used for:

  • Dirty-file goto-definition fallback (when AST byte offsets are stale)
  • Document symbols, semantic tokens, folding ranges, selection ranges
  • Signature help (finding the enclosing call expression)
  • Code actions, highlight, rename (identifier collection)

TreeSitter nodes are identified by byte ranges in the current buffer, not by any persistent ID. They are always re-parsed from the current text.

The three build types

| Build type | Created by | Scope | Node ID space |
|---|---|---|---|
| File build | get_or_fetch_build() → CachedBuild::new() | Target file + its imports | Shared with project build for same file |
| Project build | ensure_project_cached_build() | All src + test + script files | Shared with file builds for overlapping files |
| Sub-cache | load_lib_cache() → from_reference_index() | Library sub-project files | Isolated: different IDs for same functions |

The builds vector in LSP handlers is typically:

let mut builds = vec![&file_build];
if let Some(ref pb) = project_build { builds.push(pb); }
for sc in sub_caches.iter() { builds.push(sc); }

Key rule: File build and project build share node IDs for the same file. Sub-caches do NOT — always use the scoped lookup pattern for sub-caches.

Summary of safe patterns

| Operation | Safe method | Unsafe method |
|---|---|---|
| Cross-build function lookup | resolve_target_in_build() (verify + fallback) | Bare NodeId across builds |
| Cross-build node identity | verify_node_identity(nodes, id, path, offset, name) | contains_key() without validation |
| Cross-build src resolution | Any build's id_to_path_map (canonical) | Raw solc file IDs |
| Dedup across builds | Source position (Range.start) | Node ID comparison |
| Sub-cache node lookup | verify_node_identity() → byte_to_id() fallback | find_node_info() across all files |
| Within single build | Free use of NodeId everywhere | N/A |

Fixture reference

  • poolmanager.json — single-file solc AST output for PoolManager.sol. Node IDs in this fixture are from a file-level build. In a full project build, the same functions will have the same IDs (same file), but a sub-cache build of a different library will have a completely different mapping of IDs to functions.

Use jq to explore the AST:

# Find all FunctionDefinition nodes and their IDs
jq '[.. | objects | select(.nodeType == "FunctionDefinition") | {id, name}]' poolmanager.json

# Find a specific node's referencedDeclaration targets
jq '.. | objects | select(.referencedDeclaration == 616)' poolmanager.json

# Show the source_id_to_path mapping (solc's per-compilation file IDs)
jq '.sources | to_entries | map({key: .value.ast.id, path: .key})' poolmanager.json

Testing and Debugging

Use lsp-bench as the first choice when debugging LSP methods and their output. The lsp-bench repo is https://github.com/mmsaki/lsp-bench (local clone path: ../lsp-bench).

There are many examples in ./benchmarks showing how to write a simple YAML config for your needs.

lsp-bench tips for cross-build features

For features that depend on cross-file data (references, call hierarchy, implementation), you need a full project index. Add this to your benchmark config:

initializeSettings:
  projectIndex:
    fullProjectScan: true

Then use waitForProgressToken to wait for the index to complete:

- method: callHierarchy/incomingCalls
  waitForProgressToken: "solidity/projectIndexFull"

Phase 1 (solidity/projectIndex) covers src-only files. Phase 2 (solidity/projectIndexFull) covers src + test + script. Cross-file incoming callers require phase 2.

Building

Always build with the --release flag.

Documentation Sync Rule

When adding or changing a struct field, LSP method, named data structure, or feature behavior in src/, update the corresponding reference page in docs/pages/reference/ in the same commit.

Also keep these files in sync with each other whenever LSP methods or features change:

  • FEATURES.md (root) and docs/pages/docs/features.md must always match
  • CHANGELOG.md (root) and docs/pages/changelog.md must always match

After any doc changes, run bun run docs:publish to deploy to Cloudflare Pages.