
feat(ir): startNode(r)/endNode(r) return the endpoint node value (#753) #892

Merged
DecisionNerd merged 4 commits into main from feature/753-start-end-node
Jun 23, 2026
Conversation

@DecisionNerd (Owner) commented Jun 23, 2026

Summary

startNode(r) / endNode(r) over a matched fixed-hop relationship now return the endpoint node value — identity, labels, and readable properties — reusing the #785 node-value materialization, instead of a bare UUID (or the prior UnknownFunction error). Closes the #743 deferral.

Approach

A binder-only rewrite (no new lowering or exec):

  • Expand build records each fixed hop's (src, dst) node vars in a new BinderState.edge_vars map (threaded through OPTIONAL-MATCH sub-states like node_vars/path_vars).
  • resolve_endpoint_node maps startNode(r)/endNode(r) over a recorded edge var to the src/dst node var.
  • General / property-access context (startNode(r).name) lowers to a node VarRef, so the access resolves against the endpoint node's columns.
  • Terminal RETURN (RETURN endNode(r)) upgrades to a whole node value via a shared node_struct_expr helper (the same _node_struct path as a bare RETURN n).

Deliberately scoped out: variable-length edge vars bind to a relationship list, not a single relationship, so they are not recorded — startNode/endNode over a * edge (and any other unrecognized argument) falls through to the usual UnknownFunction error rather than silently returning a UUID (which would fail TCK).
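
The recording and lookup described above can be sketched as follows. This is a minimal illustration, not the code in crates/gf-ir/src/binder.rs: `VarId` is aliased to `u32`, `BindError`, `record_hop`, and the `is_var_hop` parameter are stand-ins, and only the names `BinderState`, `edge_vars`, and `resolve_endpoint_node` come from the description. It reflects the approach as first described here; the review commit later in this thread adds direction handling.

```rust
use std::collections::HashMap;

// Stand-in for the binder's variable id (assumption: the real type is not shown in this PR).
type VarId = u32;

// Hypothetical error type carrying the unresolved function name.
#[derive(Debug, PartialEq)]
enum BindError {
    UnknownFunction(String),
}

// Trimmed-down state holding only the new map.
#[derive(Default)]
struct BinderState {
    // edge var -> (src node var, dst node var); fixed hops only.
    edge_vars: HashMap<VarId, (VarId, VarId)>,
}

impl BinderState {
    // Called while building an Expand. Variable-length hops bind to a
    // relationship list, so they are deliberately not recorded.
    fn record_hop(&mut self, edge: VarId, src: VarId, dst: VarId, is_var_hop: bool) {
        if !is_var_hop {
            self.edge_vars.insert(edge, (src, dst));
        }
    }

    // Map startNode(r) / endNode(r) onto the recorded endpoint node var.
    // An unrecorded edge var or any other function name falls through to
    // UnknownFunction instead of yielding a bare UUID.
    fn resolve_endpoint_node(&self, func: &str, edge: VarId) -> Result<VarId, BindError> {
        let unknown = || BindError::UnknownFunction(func.to_string());
        let &(src, dst) = self.edge_vars.get(&edge).ok_or_else(unknown)?;
        match func {
            "startNode" => Ok(src),
            "endNode" => Ok(dst),
            _ => Err(unknown()),
        }
    }
}
```

Because the lookup returns a node variable rather than a value, the rest of the pipeline treats `startNode(r)` like any other reference to that node, which is why no new lowering or execution code is needed.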

Validation

  • New e2e (crates/gf-api/tests/e2e_baseline.rs): start_end_node_return_whole_node (whole-node return, labels + props readable) and start_end_node_property_access (startNode(r).name/endNode(r).name). 73 pass.
  • Binder + lowerer goldens unchanged (gf-ir 92, gf-rel suites green); cargo fmt --all -- --check clean.
  • TCK startNode/endNode feature scenarios remain @skip-rust until the read-path tier (#600, "tck: OPTIONAL MATCH, variable-length paths, and UNWIND on Rust core") un-skips them; this PR is the engine capability they depend on.

Closes #753.

🤖 Generated with Claude Code

Note

Make startNode(r)/endNode(r) return full node values for fixed-hop directed relationships

  • startNode(r) and endNode(r) now return a whole node value (identity, labels, properties) instead of a UUID when the relationship is a fixed-hop directed edge in a non-optional MATCH.
  • The binder tracks endpoint node variables in a new edge_vars map on BinderState, populated during lower_match for Direction::Out and Direction::In edges.
  • Property access like startNode(r).name now binds correctly to the endpoint node variable, enabling column resolution at query time.
  • Behavioral Change: startNode/endNode remain unresolved (surfacing as UnknownFunction) for OPTIONAL MATCH edges, undirected relationships, and variable-length hops — these cases are explicitly deferred.

Macroscope summarized 48d1e95.

Summary by CodeRabbit

  • New Features

    • startNode(r) and endNode(r) now materialize as full node structures (labels + readable properties) so you can access fields like startNode(r).name / endNode(r).name.
  • Bug Fixes

    • Endpoint resolution now correctly follows traversal direction.
    • For OPTIONAL MATCH relationships, startNode(r)/endNode(r) are left unresolved and surface as UnknownFunction at bind time, so a null relationship can no longer yield a misleading endpoint value.
  • Tests

    • Expanded e2e coverage to verify RETURN startNode(r) materialization, endpoint property access, direction handling, and OPTIONAL MATCH null behavior.

`startNode(r)` / `endNode(r)` over a matched fixed-hop relationship now
return the start/end node with identity, labels, and readable properties
(reusing #785's node-value materialization), not a bare UUID.

The binder records each fixed hop's `(src, dst)` node vars at `Expand`
build (`edge_vars`) and rewrites the calls onto them:
- general/property-access context (`startNode(r).name`) -> a node VarRef,
  so the access resolves against the endpoint node's columns;
- terminal RETURN (`RETURN endNode(r)`) -> a whole node value via the
  shared `_node_struct` helper.

Variable-length edge vars bind to a relationship *list*, not one
relationship, so they are deliberately not recorded; any unrecognized
argument falls through to the usual `UnknownFunction` error rather than
silently returning a UUID (which would fail TCK). Closes the #743
deferral.

Validated end-to-end: `start_end_node_return_whole_node` and
`start_end_node_property_access` in gf-api e2e_baseline (73 pass); binder
+ lowerer goldens unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai (Bot, Contributor) commented Jun 23, 2026

Walkthrough

The binder gains an edge_vars: HashMap<VarId, (VarId, VarId)> field on BinderState that records the (src, dst) node variable IDs for fixed single-hop relationship variables. During expression lowering, startNode(r) and endNode(r) calls are resolved to IrExpr::VarRef of the appropriate endpoint node variable and materialized as full node-value structs. Optional-match semantics are preserved by not propagating endpoint bindings from optional sub-expressions to the parent. Four e2e tests validate whole-node return, property access, direction-aware selection, and optional-match error deferral.
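
The optional-match scoping can be pictured with a small sketch. Everything here is an assumption made for illustration: the `bind_optional` function, the name-keyed `node_vars` map, and the merge-back loop are hypothetical, and only the idea (clone `edge_vars` into the sub-binder, never copy them back) comes from the walkthrough.

```rust
use std::collections::HashMap;

// Stand-in for the binder's variable id.
type VarId = u32;

#[derive(Clone, Default)]
struct BinderState {
    // Hypothetical shape: variable name -> node var id.
    node_vars: HashMap<String, VarId>,
    // edge var -> (start node var, end node var).
    edge_vars: HashMap<VarId, (VarId, VarId)>,
}

// Bind the body of an OPTIONAL MATCH in a cloned sub-state, then merge
// back only what is safe for the parent to see.
fn bind_optional(parent: &mut BinderState, bind_body: impl FnOnce(&mut BinderState)) {
    // The sub-state starts with the parent's edge_vars, so endpoints still
    // resolve inside the optional's own WHERE.
    let mut sub = parent.clone();
    bind_body(&mut sub);

    // Node vars introduced by the optional pattern become visible outside
    // (assumed here to mirror how node_vars are threaded).
    for (name, id) in sub.node_vars {
        parent.node_vars.entry(name).or_insert(id);
    }
    // sub.edge_vars is intentionally dropped: after an unmatched optional,
    // `r` is null, and a static rewrite to an endpoint var could return a
    // non-null node where null is required.
}
```

With this shape, a `startNode(r)` that appears after the OPTIONAL MATCH finds no entry for `r` in the parent and falls through to the usual unresolved-function error.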

Changes

startNode/endNode Endpoint Materialization

| Layer | File(s) | Summary |
|---|---|---|
| BinderState edge_vars field and initialization | crates/gf-ir/src/binder.rs | Adds edge_vars: HashMap<VarId, (VarId, VarId)> to BinderState with documentation excluding variable-length edges, initializes it in Binder::bind, and updates the test fixture to include the new field. |
| Optional MATCH binding semantics for endpoints | crates/gf-ir/src/binder.rs | During optional MATCH lowering, clones edge_vars into the sub-binder but explicitly does not propagate endpoint bindings back to the parent, ensuring null-aware semantics for unresolved optional relationships. |
| Expression lowering: startNode/endNode rewrite to VarRef | crates/gf-ir/src/binder.rs | Adds resolve_endpoint_node helper to resolve startNode(r) / endNode(r) to recorded endpoint node vars; lower_expr intercepts these calls to emit VarRef; adds node_struct_expr to centralize node-value struct materialization with optional label support. |
| Return-item materialization for endpoint nodes | crates/gf-ir/src/binder.rs | lower_return_item_expr's materialize_nodes path treats startNode(r) / endNode(r) the same as bare node variables, resolving them to node vars and routing through node_struct_expr to produce full node-value struct expressions. |
| E2e tests: endpoint materialization, property access, direction, and optional-match | crates/gf-api/tests/e2e_baseline.rs | Adds four baseline tests: startNode(r) returns a node Struct with labels and properties; startNode(r).name / endNode(r).name resolve to correct values; endpoint selection respects relationship direction; startNode(r) over OPTIONAL MATCH is deferred and errors when null. |
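
The two lowering contexts summarized above differ only in what wraps the resolved endpoint variable. The sketch below is a hedged illustration: `IrExpr::VarRef` is named in the walkthrough, but the `Property` and `NodeStruct` variants, the `Context` enum, and the function signatures are hypothetical stand-ins for the real `lower_expr`, `lower_return_item_expr`, and `node_struct_expr`.

```rust
// Stand-in for the binder's variable id.
type VarId = u32;

#[derive(Debug, PartialEq)]
enum IrExpr {
    // Reference to a bound variable (named in the walkthrough).
    VarRef(VarId),
    // Hypothetical property-access node.
    Property { base: Box<IrExpr>, key: String },
    // Hypothetical whole-node value: identity, labels, readable properties.
    NodeStruct { node: VarId, with_labels: bool },
}

// Where the startNode(r)/endNode(r) call appears (hypothetical enum).
#[derive(Clone, Copy)]
enum Context {
    General,
    ReturnItem,
}

// Shared helper so a bare `RETURN n` and `RETURN endNode(r)` take one path.
fn node_struct_expr(node: VarId, with_labels: bool) -> IrExpr {
    IrExpr::NodeStruct { node, with_labels }
}

// Lower an already-resolved endpoint node var according to its context.
fn lower_endpoint(endpoint: VarId, ctx: Context) -> IrExpr {
    match ctx {
        // startNode(r).name: a plain VarRef, so the property access
        // resolves against the endpoint node's columns.
        Context::General => IrExpr::VarRef(endpoint),
        // RETURN endNode(r): upgrade to a whole node value.
        Context::ReturnItem => node_struct_expr(endpoint, true),
    }
}

fn lower_property(base: IrExpr, key: &str) -> IrExpr {
    IrExpr::Property { base: Box::new(base), key: key.to_string() }
}
```

Keeping the general case as a bare `VarRef` means property access pays nothing for struct materialization; only a terminal return builds the full node value.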

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • DecisionNerd/graphforge#722: Both PRs modify fixed single-hop relationship/path lowering in crates/gf-ir/src/binder.rs to emit Expand with traversal endpoint vars; this PR leverages those endpoint bindings to rewrite startNode(r)/endNode(r) calls.
  • DecisionNerd/graphforge#318: The main PR's binder rewrites startNode(r)/endNode(r) calls into resolved endpoint node expressions, which directly builds on the evaluator's STARTNODE/ENDNODE graph function implementations.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: implementing startNode(r)/endNode(r) to return full endpoint node values instead of UUIDs. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #753's objectives by implementing startNode(r)/endNode(r) to return complete node values with identity, labels, and properties via binder-level rewriting, with proper handling of fixed-hop edges and optional-match scoping. |
| Out of Scope Changes check | ✅ Passed | All changes are directly in scope: binder enhancements for edge_vars mapping, endpoint node resolution logic, and e2e tests validating the feature. Variable-length edges are explicitly deferred as documented. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | Pull request description is comprehensive and covers all template sections including summary, type of change, approach, validation, and related issues. Description is well-structured and contains necessary technical details. |


@coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
crates/gf-api/tests/e2e_baseline.rs (1)

146-154: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Align test scope with its name (start_end_node_return_whole_node).

This test currently validates only startNode(r) as a whole node. Either rename it to reflect start-only coverage or assert endNode(r) whole-node materialization in the same test to match the stated scope.

Suggested direction
- "MATCH (a:Person {name:'Alice'})-[r:KNOWS]->(b:Person {name:'Bob'}) RETURN startNode(r)",
+ "MATCH (a:Person {name:'Alice'})-[r:KNOWS]->(b:Person {name:'Bob'}) \
+  RETURN startNode(r) AS s, endNode(r) AS e",

Then downcast/assert both s and e as StructArray and verify readable name values (Alice / Bob).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-api/tests/e2e_baseline.rs` around lines 146 - 154, The test
function start_end_node_return_whole_node currently only validates startNode(r)
but the test name indicates it should validate both startNode and endNode.
Expand the test to include a query for endNode(r) in addition to the existing
startNode(r) query, then assert that both results are properly materialized as
whole nodes by downcasting them as StructArray and verifying the readable name
properties (Alice for startNode, Bob for endNode) to ensure the endpoint nodes
are fully materialized with their identity, labels, and properties.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gf-ir/src/binder.rs`:
- Around line 333-340: The edge_vars insertion doesn't account for relationship
direction, causing startNode(r) and endNode(r) to resolve to incorrect endpoints
for reversed or undirected edges. When recording the edge variable relationship
in the edge_vars map (in the !is_var_hop block around line 333-340, and also at
lines 1208-1218), check whether the edge traversal is reversed (incoming
direction) and swap the src_var and dst_var values accordingly before insertion.
Additionally, for undirected edges, either skip the insertion entirely or add
special handling to mark them as requiring runtime direction resolution, rather
than using a static left/right assignment.
- Around line 244-246: The propagation of edge_vars from OPTIONAL MATCH in the
loop iterating over sub_state.edge_vars does not preserve null-gating semantics
required by Cypher. When an optional match edge is unmatched (null), accessing
properties through endpoint resolution like startNode(r) should return null, but
currently allows reading from the endpoint node regardless. Either carry the
null-gate information through endpoint resolution and gate endpoint property
access and materialization on the edge variable being non-null, or conditionally
skip this edge_vars propagation when processing OPTIONAL MATCH edges until the
null-gate mechanism is in place. Apply the same fix to the other occurrences at
lines 857-872, 1082-1088, and 1216-1238.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d422a13-a12e-4acb-b361-961a15c9f8e8

📥 Commits

Reviewing files that changed from the base of the PR and between 9d615e7 and 7501a7f.

⛔ Files ignored due to path filters (1)
  • CHANGELOG.md is excluded by !**/*.md
📒 Files selected for processing (2)
  • crates/gf-api/tests/e2e_baseline.rs
  • crates/gf-ir/src/binder.rs

Comment thread crates/gf-ir/src/binder.rs Outdated
Comment thread crates/gf-ir/src/binder.rs
DecisionNerd and others added 2 commits June 23, 2026 10:46
…753)

The shared helper never reads `self` (it works through the
`BinderState` builder), so `cargo clippy --workspace -- -D warnings`
rejected the `&self` receiver. Make it an associated function and call
it as `Self::node_struct_expr`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
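
As a rough illustration of the change this commit describes, the sketch below turns a helper that never reads `self` into an associated function called as `Self::node_struct_expr`. The struct fields, the string-based return value, and `lower_return_item` are hypothetical; the commit message does not name the specific clippy lint, so none is named here.

```rust
// Hypothetical stand-ins; the real Binder and BinderState are much larger.
struct Binder {
    // Assumed flag, loosely modeled on the materialize_nodes path.
    materialize_nodes: bool,
}

#[derive(Default)]
struct BinderState {
    // Illustrative record of what the helper emitted.
    emitted: Vec<String>,
}

impl Binder {
    // Before (rejected under -D warnings): the helper took `&self` but only
    // worked through `state`. As an associated function it has no receiver.
    fn node_struct_expr(state: &mut BinderState, node: &str) -> String {
        let expr = format!("_node_struct({node})");
        state.emitted.push(expr.clone());
        expr
    }

    // A method that genuinely reads `self` keeps its receiver and calls the
    // helper through `Self::`.
    fn lower_return_item(&self, state: &mut BinderState, node: &str) -> String {
        if self.materialize_nodes {
            Self::node_struct_expr(state, node)
        } else {
            node.to_string()
        }
    }
}
```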
…eRabbit)

Two correctness fixes from review:

- Direction: `src_var`/`dst_var` are the pattern's left/right vars, but the
  relationship's true start/end follow its direction. Record `(src, dst)` for
  outgoing, swap to `(dst, src)` for incoming, and skip undirected edges (their
  orientation is per matched row, so a static rewrite could pick the wrong
  endpoint). Validated: `(Bob)<-[r:KNOWS]-(b)` -> startNode = Alice (source).

- OPTIONAL MATCH: do not propagate `edge_vars` out of an optional sub-state.
  After an unmatched optional, `r` is null and `startNode(r)` must be null, but
  the rewrite would resolve to the outer non-null endpoint var. Without an
  edge-uuid null gate (#889) that yields a wrong value, so optional edges stay
  unresolved (startNode/endNode -> UnknownFunction). Edges still resolve inside
  the optional's own WHERE, where matched rows have a non-null `r`.

New e2e: `start_end_node_respects_incoming_direction`,
`start_node_over_optional_edge_is_deferred`. clippy --workspace -D warnings
clean; gf-ir goldens (92) and e2e (75) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
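
The direction rule from the first fix can be written as a small pure function. This is a sketch under assumptions: `Direction::Out` and `Direction::In` are named elsewhere in this thread, but the `Undirected` variant name, the `endpoints_for` function, and the `u32` variable ids are hypothetical.

```rust
// Stand-in for the binder's variable id.
type VarId = u32;

#[derive(Clone, Copy)]
enum Direction {
    Out,
    In,
    // Hypothetical name for the undirected case.
    Undirected,
}

// Given the pattern's left/right node vars, return the relationship's true
// (start, end) node vars, or None when no static answer exists.
fn endpoints_for(dir: Direction, left: VarId, right: VarId) -> Option<(VarId, VarId)> {
    match dir {
        // (left)-[r]->(right): the pattern order is the relationship order.
        Direction::Out => Some((left, right)),
        // (left)<-[r]-(right): the relationship starts at the right var.
        Direction::In => Some((right, left)),
        // (left)-[r]-(right): orientation varies per matched row, so the
        // edge is skipped and stays unresolved.
        Direction::Undirected => None,
    }
}
```

For the validated case `(Bob)<-[r:KNOWS]-(b)`, with Bob as the left var and `b` (Alice) as the right var, the incoming arm yields `b` as the start node, matching the "startNode = Alice" result quoted in the commit message.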

@coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gf-api/tests/e2e_baseline.rs`:
- Around line 271-277: The test at the execute call in the e2e_baseline.rs test
is not actually exercising the null optional branch because Alice always has
KNOWS edges in the test fixture, so the optional match will always succeed with
r being non-null. Modify the MATCH query to use a person or relationship pattern
that actually yields a null result from the OPTIONAL MATCH clause. For example,
change the query to match a person who has no outgoing edges of the specified
type, or use a relationship type that doesn't exist in the fixture, so that the
optional match returns null and the startNode(r) call is forced to handle the
null case as intended by the test assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1bf6a397-5d71-4c53-b731-7925f421a39b

📥 Commits

Reviewing files that changed from the base of the PR and between 5e8a695 and 8c04847.

⛔ Files ignored due to path filters (1)
  • CHANGELOG.md is excluded by !**/*.md
📒 Files selected for processing (2)
  • crates/gf-api/tests/e2e_baseline.rs
  • crates/gf-ir/src/binder.rs

Comment thread crates/gf-api/tests/e2e_baseline.rs
…CodeRabbit)

Switch the OPTIONAL-MATCH deferral test from Alice (who has KNOWS edges)
to Eve (no outgoing edges), so the represented scenario is a genuinely
unmatched optional. The deferral itself surfaces as a bind-time
UnknownFunction (data-independent), but the data now matches the intent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@DecisionNerd DecisionNerd merged commit 9b91323 into main Jun 23, 2026
41 checks passed
@DecisionNerd DecisionNerd deleted the feature/753-start-end-node branch June 23, 2026 17:34

Development

Successfully merging this pull request may close these issues.

rust: startNode(r) / endNode(r) — return the node, not the UUID