Skip to content

feat(rel): materialize whole node values for RETURN n#890

Merged
DecisionNerd merged 1 commit into
mainfrom
feature/785-node-value-materialization
Jun 23, 2026
Merged

feat(rel): materialize whole node values for RETURN n#890
DecisionNerd merged 1 commit into
mainfrom
feature/785-node-value-materialization

Conversation

@DecisionNerd

@DecisionNerd DecisionNerd commented Jun 23, 2026

Copy link
Copy Markdown
Owner

Summary

A bare RETURN n now materializes a whole node valueStruct{node_uuid, labels: List<Utf8>, <typed properties>} — instead of failing on the bare var_N table qualifier. This satisfies #785's
acceptance ("RETURN n returns a node value whose properties are readable downstream") and unblocks
#753
(startNode/endNode returning a node) and the TCK whole-node scenarios.

Closes #785. Part of M17 (Phase-1 critical path).

How

  • gf-ir/binder — track node vars + their pattern label; rewrite a bare node var in a terminal
    RETURN
    to _node_struct(VarRef, "Label"). WITH / intermediate projections are left as
    references (materializing there would break downstream n.age property access). The label is
    threaded from the pattern because the lowerer's ontology map is empty in exploratory mode.
  • gf-rel — assemble the value at lowering time: node_value_struct composes
    named_struct{node_uuid, labels, <props>}, gated null_unless(node_uuid present) (so an unmatched
    OPTIONAL row yields a null node). Property columns come from a per-plan VarId→NodeShape map built
    from the NodeScans — the same columns join_node_properties materializes as var_N.<prop>.
  • No executor or Shaper changes (the struct passes through untouched, mirroring the var-length edge struct).

Validation

  • New e2e: MATCH (n:Person {name:'Alice'}) RETURN n → struct with node_uuid, labels ["Person"],
    readable name = "Alice".
  • gf-cypher / gf-ir / gf-rel suites + gf-api e2e_baseline (71) green; the with_pipeline IR
    golden is unchanged (WITH correctly not materialized); cargo fmt --all -- --check clean.

Scope / deferred (tracked in #889)

Labeled patterns + terminal RETURN. The unlabelled MATCH (n) RETURN n label (needs a runtime
type_id→name catalog reverse-map), relationship/path values, and the TCK (:Label {..}) render +
scenario un-skipping are #889. Until then, an unlabelled RETURN n yields a safe uuid-only struct.

🤖 Generated with Claude Code

Note

Materialize whole node values for bare RETURN n in graph queries

  • Adds a _node_struct IR function that assembles a {node_uuid, labels, <props…>} DataFusion struct for a node variable, gated by null_unless(node_uuid IS NOT NULL) for OPTIONAL MATCH rows.
  • The binder now tracks node-bound variables in node_vars and rewrites bare node variables in terminal RETURN to _node_struct(var[, label]) calls; WITH clauses are explicitly excluded from this rewrite.
  • The expression lowerer resolves _node_struct by building a named_struct with node_uuid, a labels List, and any persisted property columns looked up from a new node_shapes map.
  • The plan lowerer pre-scans NodeScan operators to build VarId → NodeShape (persisted property columns) before expression lowering.
  • Behavioral Change: RETURN n now returns a StructArray value instead of a raw node reference; WITH n behavior is unchanged.

Macroscope summarized 6cc5ccc.

Summary by CodeRabbit

  • New Features
    • MATCH queries now return complete node values with materialized node identifiers, labels, and all properties in a single structured result.

A bare `RETURN n` over a NodeScan-bound variable now yields a whole node value
— Struct{node_uuid, labels: List<Utf8>, <typed properties>} — instead of
failing on the bare `var_N` qualifier. Unblocks #753 (startNode/endNode
returning a node) and the TCK whole-node scenarios.

- gf-ir/binder: track node vars (+ their pattern label); rewrite a bare node var
  in a TERMINAL RETURN to `_node_struct(VarRef, "Label")`. WITH / intermediate
  projections are left as references (materializing there would break downstream
  property access). The label is threaded because the lowerer's ontology map is
  empty in exploratory mode.
- gf-rel: assemble the value at lowering time — `node_value_struct` composes
  named_struct{node_uuid, labels, <props>} gated null_unless(node_uuid present);
  the property columns come from a per-plan VarId->NodeShape map built from the
  NodeScans (the same columns join_node_properties materializes as var_N.<prop>).
- e2e: `MATCH (n:Person {name:'Alice'}) RETURN n` returns a struct with node_uuid,
  labels ["Person"], and a readable `name` = "Alice".

Validated: gf-cypher / gf-ir / gf-rel suites + gf-api e2e_baseline green; fmt clean.

Scope: labeled patterns + terminal RETURN. Deferred to follow-ups: the unlabelled
`MATCH (n) RETURN n` label (needs a runtime type_id->name catalog reverse-map),
relationship/path values, and the TCK `(:Label {..})` render + scenario un-skipping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Implements whole-node-value materialization for RETURN n. The IR binder gains a node_vars map to track NodeScan-bound variables and rewrites bare terminal-RETURN items into _node_struct function calls. The expression lowerer introduces NodeShape and resolves _node_struct into a DataFusion named_struct containing node_uuid, a labels list, and per-property fields. The plan lowerer seeds per-node property column shapes from on-disk table schemas.

Changes

Node-value materialization for RETURN n

Layer / File(s) Summary
Binder: node-var tracking and _node_struct rewrite
crates/gf-ir/src/binder.rs
Adds node_vars: HashMap<VarId, Option<String>> to BinderState, records label per NodeScan binding in lower_path_pattern, propagates through OPTIONAL MATCH. Refactors lower_return_items to accept materialize_nodes flag; lower_with passes false, lower_return passes true. Adds lower_return_item_expr to rewrite eligible bare node-var expressions into IrExpr::FunctionCall { name: "_node_struct", … }. Updates advisory-mode unit test fixture to initialize the new field.
Expression lowerer: NodeShape and _node_structnamed_struct
crates/gf-rel/src/expr.rs
Introduces pub struct NodeShape { pub prop_names: Vec<String> }. Extends ExprLowerer with node_shapes: HashMap<u32, NodeShape> and a new with_prop_names_and_nodes constructor. Adds a dispatch arm for _node_struct calls to lower_node_struct, which validates args and delegates to node_value_struct to build a null-gated DataFusion named_struct carrying node_uuid, a labels list, and per-property columns.
Plan lowerer: seeding NodeShape from NodeScan property tables
crates/gf-rel/src/lowerer.rs
Adds node_shapes: RefCell<HashMap<u32, NodeShape>> to GraphPlanLowerer. lower_plan seeds it via build_node_shapes, which scans plan ops for NodeScans and calls node_prop_cols to read the on-disk property table schema while excluding topology-column collisions. expr_lowerer construction switches to with_prop_names_and_nodes.
E2E test: RETURN n whole-node Struct
crates/gf-api/tests/e2e_baseline.rs
Adds Arrow imports for ListArray, StringArray, StructArray. Adds match_returns_whole_node_value test: runs MATCH (n:Person {name:'Alice'}) RETURN n, asserts one row, validates the n column is a StructArray with node_uuid, a labels list containing "Person", and name equal to "Alice".

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related issues

  • #785 — This PR directly implements the acceptance criteria: MATCH (n:Person {name:'Alice'}) RETURN n now returns a node value with readable properties, and the infrastructure (IR rewrite → expr lowering → plan seeding) unblocks startNode/endNode returning a node for #753.

Possibly related PRs

  • DecisionNerd/graphforge#694: Introduces GraphPlanLowerer and its lowering pipeline in crates/gf-rel/src/lowerer.rs; this PR extends the same struct with node_shapes and per-NodeScan property column wiring.
  • DecisionNerd/graphforge#695: Refactors lower_plan and NodeScan handling inside GraphPlanLowerer; this PR adds build_node_shapes to the same lower_plan path to derive per-NodeScan property columns.
  • DecisionNerd/graphforge#793: Fixes join_node_properties so property columns are present for already-bound NodeScan destinations — the same property columns this PR reads to populate NodeShape for bare-node materialization.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: materializing whole node values for RETURN n statements in the relational layer.
Linked Issues check ✅ Passed The PR fully implements the acceptance criteria from #785: materializing whole node values (identity+labels+properties) for RETURN n, enabling downstream property access and unblocking #753.
Out of Scope Changes check ✅ Passed All changes are directly related to node value materialization: e2e test addition, binder tracking/rewriting, expression lowering infrastructure, and plan-level node shape computation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering the summary, implementation approach, validation, and scope clearly.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/785-node-value-materialization

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gf-ir/src/binder.rs (1)

743-751: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Materialize node group keys in aggregate RETURNs too.

RETURN n, count(*) enters the aggregate branch and bypasses lower_return_items(..., true), so the non-aggregate n is still lowered as a bare VarRef instead of _node_struct. That preserves the original col("var_N") failure mode for terminal aggregate returns.

🐛 Proposed direction
-        if r.items.iter().any(|i| agg_func_of(&i.expr).is_some()) {
-            self.lower_return_aggregate(&r.items, s);
+        if r.items.iter().any(|i| agg_func_of(&i.expr).is_some()) {
+            self.lower_return_aggregate(&r.items, s, true);

Then use the same helper for non-aggregate group keys:

-    fn lower_return_aggregate(&self, items: &[ReturnItem], s: &mut BinderState) {
+    fn lower_return_aggregate(
+        &self,
+        items: &[ReturnItem],
+        s: &mut BinderState,
+        materialize_nodes: bool,
+    ) {
         ...
-                group_by.push(self.lower_expr(&item.expr, item.span, s));
+                group_by.push(
+                    self.lower_return_item_expr(&item.expr, item.span, s, materialize_nodes),
+                );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-ir/src/binder.rs` around lines 743 - 751, The
`lower_return_aggregate` method is not materializing non-aggregate items (group
keys) in the same way that `lower_return_items` does when called with the third
parameter as true, causing nodes in aggregate RETURN statements like `RETURN n,
count(*)` to remain as bare VarRef instead of being materialized as node
structs. Update `lower_return_aggregate` to apply the same materialization
helper or logic to the non-aggregate items before processing them, ensuring that
group keys are properly transformed into node structures just as they would be
in non-aggregate returns.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gf-rel/src/lowerer.rs`:
- Around line 226-238: The build_node_shapes function only iterates through
top-level ops but fails to collect node shapes from operations nested within
GraphOp::Optional children. This causes node variables introduced inside
OPTIONAL MATCH blocks to have missing shape information. Modify the
build_node_shapes function to also handle the GraphOp::Optional variant by
recursively processing its nested operations in addition to the top-level
GraphOp::NodeScan handling, ensuring that node shape information is collected
regardless of nesting depth within Optional blocks.
- Around line 387-389: The node_shapes cache is only being seeded in the
lower_plan method, but the mixed statement path uses lower_prefix followed by
lower_value_expr, which now depends on having initialized node_shapes. This can
result in empty or stale shapes for statements like MATCH ... SET/CREATE ...
RETURN n. Add the same node_shapes initialization logic (calling
build_node_shapes with the plan's ops) to the lower_prefix method or at the
entry point of the mixed statement path, before lower_value_expr is called, to
ensure node_shapes is properly seeded for that code path as well.

---

Outside diff comments:
In `@crates/gf-ir/src/binder.rs`:
- Around line 743-751: The `lower_return_aggregate` method is not materializing
non-aggregate items (group keys) in the same way that `lower_return_items` does
when called with the third parameter as true, causing nodes in aggregate RETURN
statements like `RETURN n, count(*)` to remain as bare VarRef instead of being
materialized as node structs. Update `lower_return_aggregate` to apply the same
materialization helper or logic to the non-aggregate items before processing
them, ensuring that group keys are properly transformed into node structures
just as they would be in non-aggregate returns.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e283836f-129e-4eea-8b5a-4acd1fcf2af1

📥 Commits

Reviewing files that changed from the base of the PR and between 6442387 and 6cc5ccc.

📒 Files selected for processing (4)
  • crates/gf-api/tests/e2e_baseline.rs
  • crates/gf-ir/src/binder.rs
  • crates/gf-rel/src/expr.rs
  • crates/gf-rel/src/lowerer.rs

Comment thread crates/gf-rel/src/lowerer.rs
Comment thread crates/gf-rel/src/lowerer.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rust: node-value materialization — return a whole node value (identity+labels+props) from an expression

1 participant