feat(cypher): name un-aliased RETURN columns by their source text (#598) — corpus 250→313 by DecisionNerd · Pull Request #906 · DecisionNerd/graphforge

DecisionNerd · 2026-06-24T03:42:19Z

What

openCypher names an un-aliased projection column by the expression as written — n.prop, count(*), a.x IS NULL. GraphForge left those names to DataFusion, which derives them from the lowered expression (internal var_N qualifiers, substituted literals). So a large class of TCK scenarios computed the right values under the wrong header and failed the header-exact the result should be comparison (column_by_name(<header>)).

Parser: captures each un-aliased item's verbatim expression text into a new ReturnItem::display (sliced from the source via the expr's span). RETURN * expands to one display per in-scope variable.
Binder: lower_return_items uses display as the default column name when there is no explicit AS, after the existing node/path-variable naming.
WITH is untouched — it still requires an explicit alias for non-variable items.

Impact (advisory TCK gate)

Corpus passing 250 → 313 (+63, 0 regressions) — Boolean1–5, Comparison1–2, Match5, List1/6, Graph6, and more, which were value-correct but header-mismatched. Baseline re-blessed.

Test/golden updates

gf-ir logical-plan goldens re-blessed: un-aliased columns now carry their source-text alias (e.g. "alias": null → "n.name"). gf-rel goldens unaffected (their queries are all aliased).
e2e_baseline node_uuid tests now pin the column with an explicit AS node_uuid — they assert the uuid type/value, not column naming.

Verification

$ cargo test --workspace --no-fail-fast       # ALL GREEN
$ cargo test -p gf-api --test bdd             # 313 passing, 0 regressed
$ cargo clippy --workspace -- -D warnings     # clean

Refs #598 (does not close).

🤖 Generated with Claude Code

Note

Name un-aliased Cypher RETURN columns by their verbatim source expression text

Adds a display field to ReturnItem in ast.rs that stores the verbatim source text of un-aliased projection expressions, captured via a new TokenStream.text() helper in the parser.
Updates the binder's lower_return_items to use display as the default column name for un-aliased expressions, while bare path/node variables continue to use the variable name and explicit AS aliases take priority.
RETURN * expansion now names each synthesized column by its variable identifier.
Expands TCK scenario passing count from 250 to 313; existing e2e tests are updated to use explicit AS aliases to remain compatible with the new naming behavior.
Behavioral Change: un-aliased property access columns (e.g. RETURN n.node_uuid) are now named n.node_uuid instead of a derived name; callers relying on the old column names must add explicit aliases.

^{Macroscope summarized 292b913.}

Summary by CodeRabbit

New Features
- Improved deterministic column naming for unaliased RETURN expressions to better align with openCypher standards.
Tests
- Expanded test coverage significantly: 313 scenarios now passing (up from 250), including Boolean operations, comparisons, graph literals, list indexing, pattern matching, mathematical functions, and return projections.

openCypher names a projection column by the expression as written (`n.prop`, `count(*)`, `a.x IS NULL`). GraphForge left un-aliased items' names to DataFusion, which derives them from the LOWERED expression (internal `var_N` qualifiers, substituted literals) — so a large class of scenarios computed the right values under the wrong header and failed the TCK's header-exact result comparison. The parser now captures each un-aliased item's verbatim expression text into `ReturnItem::display` (sliced from the source by the expr's span; `RETURN *` expands to one display per variable). The binder uses it as the default `RETURN` column name when there is no explicit `AS`, after the existing node/path-var naming. `WITH` is unchanged (it still requires an alias for non-variable items). Corpus passing 250 → 313 (+63: Boolean1–5, Comparison1–2, Match5, List, Graph6, …); 0 regressions; baseline re-blessed. IR goldens re-blessed (un-aliased columns now carry their source-text alias). e2e node_uuid tests pin the column name with an explicit `AS node_uuid` (they assert the uuid type/value, not column naming). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-24T03:42:33Z

Walkthrough

Adds a display: Option<String> field to ReturnItem in the AST to store verbatim expression text for unaliased projections. The parser populates this field via a new TokenStream::text helper. The binder then uses display to assign deterministic column names for unaliased RETURN items. E2E tests add explicit AS node_uuid aliases, and the TCK passing baseline gains 63 new scenarios.

Changes

ReturnItem.display for deterministic projection column names

Layer / File(s)	Summary
`ReturnItem.display` AST contract `crates/gf-ast/src/ast.rs`, `crates/gf-ast/src/tests.rs`	`ReturnItem` gains `pub display: Option<String>` with `#[serde(default)]`. AST round-trip tests (`ast_return_clause_roundtrip`, `ast_call_clause_roundtrip`) updated to include `display: None` in struct initializers.
Parser captures verbatim expression text `crates/gf-cypher/src/parser/mod.rs`, `crates/gf-cypher/src/parser/clauses.rs`	`TokenStream::text(span)` slices the raw input by byte-offset span, returning `""` on out-of-bounds. `parse_one_return_item` uses it to capture the trimmed expression source as `display` before optional `AS` alias parsing; `RETURN *` sets `display: None`.
Binder uses `display` for unaliased projection aliasing `crates/gf-ir/src/binder.rs`	`lower_return_items` now defaults an unaliased item's projection column name to `item.display` instead of leaving it unspecified. `expand_return_wildcard` sets `display: Some(variable_name)` on each expanded `RETURN *` item.
E2E tests, TCK baseline, and coverage updates `crates/gf-api/tests/e2e_baseline.rs`, `tests/tck/passing_baseline.txt`, `tests/tck/coverage_matrix.json`	E2E tests add explicit `AS node_uuid` aliases so column-name lookups remain stable under the new projection naming. TCK passing baseline adds 63 scenarios across Boolean, Comparison, Graph, List, Match, Return, TypeConversion, and Unwind categories. Coverage matrix increments `scenarios_passing` from 250 to 313.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

DecisionNerd/graphforge#171: Changes how column names are derived for unaliased RETURN projections via expression-to-string conversion and enforces UNION branch column-name equality, directly overlapping with this PR's projection naming goal.
DecisionNerd/graphforge#367: Adds a source_text field on ReturnItem and uses it to drive projection column key naming in the executor — the same structural pattern as this PR's display field.
DecisionNerd/graphforge#897: Modifies RETURN * wildcard handling in binder.rs, directly adjacent to the expand_return_wildcard change in this PR that sets display on expanded items.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding source-text-based naming for un-aliased RETURN columns in Cypher, with TCK corpus metrics (+63 passing scenarios).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description comprehensively addresses what changed, why, the impact, and test updates with clear examples and verification results.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/598-return-column-names

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/gf-ir/src/binder.rs (1)

923-934: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

display naming is skipped for aggregate RETURN paths.

Line 930 routes any aggregate-containing RETURN to lower_return_aggregate, so the new fallback at Line 1001 (item.display) is never applied there. As a result, unaliased aggregate outputs still default to agg_{idx} (Line 1077), and aggregate RETURN headers remain inconsistent with the deterministic expression-text naming introduced in this PR.

Suggested starting fix

-                let alias = item.alias.clone().unwrap_or_else(|| format!("agg_{idx}"));
+                let alias = item
+                    .alias
+                    .clone()
+                    .or_else(|| item.display.clone())
+                    .unwrap_or_else(|| format!("agg_{idx}"));

Also applies to: 994-1012, 1053-1078

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-ir/src/binder.rs` around lines 923 - 934, The lower_return function
routes aggregate-containing RETURN clauses to lower_return_aggregate, which
bypasses the new display naming fallback introduced in this PR (used in the
non-aggregate path via lower_return_items). This causes unaliased aggregate
outputs to default to agg_{idx} naming instead of using the deterministic
expression-text naming. Apply the same item.display naming logic that is used in
the non-aggregate branch (around the lower_return_items call) to the
lower_return_aggregate function to ensure consistent and deterministic naming
for both aggregate and non-aggregate RETURN clauses.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/gf-ir/src/binder.rs`:
- Around line 923-934: The lower_return function routes aggregate-containing
RETURN clauses to lower_return_aggregate, which bypasses the new display naming
fallback introduced in this PR (used in the non-aggregate path via
lower_return_items). This causes unaliased aggregate outputs to default to
agg_{idx} naming instead of using the deterministic expression-text naming.
Apply the same item.display naming logic that is used in the non-aggregate
branch (around the lower_return_items call) to the lower_return_aggregate
function to ensure consistent and deterministic naming for both aggregate and
non-aggregate RETURN clauses.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 18f3564a-797f-4c7f-8ce2-c60d61ac0c00

📥 Commits

Reviewing files that changed from the base of the PR and between 8ffeea7 and 292b913.

⛔ Files ignored due to path filters (12)

CHANGELOG.md is excluded by !**/*.md
crates/gf-ir/tests/ir_goldens/golden__filtered_scan.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__one_hop_expand.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__optional_match.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__order_by_limit.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__parameter.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__simple_node_scan.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__two_hop_expand.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__unwind.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__variable_length_expand.snap is excluded by !**/*.snap
crates/gf-ir/tests/ir_goldens/golden__with_pipeline.snap is excluded by !**/*.snap
docs/reference/tck-compliance.md is excluded by !**/*.md, !**/docs/**

📒 Files selected for processing (8)

crates/gf-api/tests/e2e_baseline.rs
crates/gf-ast/src/ast.rs
crates/gf-ast/src/tests.rs
crates/gf-cypher/src/parser/clauses.rs
crates/gf-cypher/src/parser/mod.rs
crates/gf-ir/src/binder.rs
tests/tck/coverage_matrix.json
tests/tck/passing_baseline.txt

#909) An aggregate RETURN item with no `AS` (`RETURN count(*)`, `min(x)`, `sum(i)`) was named `agg_<idx>`, so it failed the TCK's header-exact result comparison. It now takes the verbatim expression text (`ReturnItem::display`, the same source the non-aggregate columns use since #906), falling back to `agg_<idx>` only when neither an explicit alias nor display text is present. Corpus passing 555 → 570 (+15, 0 regressions); baseline re-blessed. Naming the group-by keys of a *mixed* aggregate query (`RETURN n.name, sum(n.num)`) is a follow-up — it needs per-key output aliases on the `Aggregate` op. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

DecisionNerd merged commit 4bd3433 into main Jun 24, 2026
43 checks passed

DecisionNerd deleted the feature/598-return-column-names branch June 24, 2026 03:54

DecisionNerd mentioned this pull request Jun 24, 2026

feat(ir): name un-aliased aggregate columns by source text (#598/#599) — corpus 555→570 #909

Merged

coderabbitai Bot mentioned this pull request Jun 24, 2026

feat(ir): name mixed-aggregate group-by keys by source text (#599) — corpus 570→574 #910

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cypher): name un-aliased RETURN columns by their source text (#598) — corpus 250→313#906

feat(cypher): name un-aliased RETURN columns by their source text (#598) — corpus 250→313#906
DecisionNerd merged 1 commit into
mainfrom
feature/598-return-column-names

DecisionNerd commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DecisionNerd commented Jun 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Impact (advisory TCK gate)

Test/golden updates

Verification

Name un-aliased Cypher RETURN columns by their verbatim source expression text

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DecisionNerd commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading