feat(cypher): name un-aliased RETURN columns by their source text (#598) — corpus 250→313 #906
openCypher names a projection column by the expression as written (`n.prop`, `count(*)`, `a.x IS NULL`). GraphForge left un-aliased items' names to DataFusion, which derives them from the LOWERED expression (internal `var_N` qualifiers, substituted literals), so a large class of scenarios computed the right values under the wrong header and failed the TCK's header-exact result comparison.

The parser now captures each un-aliased item's verbatim expression text into `ReturnItem::display` (sliced from the source by the expr's span; `RETURN *` expands to one display per variable). The binder uses it as the default `RETURN` column name when there is no explicit `AS`, after the existing node/path-var naming. `WITH` is unchanged (it still requires an alias for non-variable items).

Corpus passing 250 → 313 (+63: Boolean1–5, Comparison1–2, Match5, List, Graph6, …); 0 regressions; baseline re-blessed. IR goldens re-blessed (un-aliased columns now carry their source-text alias). e2e node_uuid tests pin the column name with an explicit `AS node_uuid` (they assert the uuid type/value, not column naming).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
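The span-based capture described in the commit message can be pictured with a minimal sketch. The `ReturnItem` struct and `make_item` helper below are hypothetical, simplified stand-ins, not GraphForge's actual types; the sketch assumes byte-offset spans and that `display` is only populated when there is no alias.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for a parsed RETURN item.
#[derive(Debug, PartialEq)]
struct ReturnItem {
    /// Byte range of the expression within the query text (assumed).
    span: Range<usize>,
    /// Explicit `AS` alias, if the query supplied one.
    alias: Option<String>,
    /// Verbatim expression text, captured only for un-aliased items.
    display: Option<String>,
}

/// Build an item, slicing the display text out of the source by span.
fn make_item(source: &str, span: Range<usize>, alias: Option<&str>) -> ReturnItem {
    let display = match alias {
        Some(_) => None,
        // `str::get` yields None for an out-of-range or
        // non-char-boundary span instead of panicking.
        None => source.get(span.clone()).map(|s| s.trim().to_string()),
    };
    ReturnItem {
        span,
        alias: alias.map(str::to_string),
        display,
    }
}

fn main() {
    let query = "MATCH (a) RETURN a.x IS NULL, count(*) AS c";
    let start = query.find("a.x").unwrap();
    let end = start + "a.x IS NULL".len();

    let item = make_item(query, start..end, None);
    assert_eq!(item.display.as_deref(), Some("a.x IS NULL"));
    println!("{:?}", item.display);
}
```

Slicing from the original text, rather than pretty-printing the AST, keeps whatever spelling and spacing the query author used, which is what a header-exact comparison needs.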
Walkthrough: adds `ReturnItem.display` for deterministic projection column names.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/gf-ir/src/binder.rs (1)
923-934: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
`display` naming is skipped for aggregate RETURN paths.
Line 930 routes any aggregate-containing `RETURN` to `lower_return_aggregate`, so the new fallback at Line 1001 (`item.display`) is never applied there. As a result, unaliased aggregate outputs still default to `agg_{idx}` (Line 1077), and aggregate RETURN headers remain inconsistent with the deterministic expression-text naming introduced in this PR.
Suggested starting fix
    - let alias = item.alias.clone().unwrap_or_else(|| format!("agg_{idx}"));
    + let alias = item
    +     .alias
    +     .clone()
    +     .or_else(|| item.display.clone())
    +     .unwrap_or_else(|| format!("agg_{idx}"));
Also applies to: 994-1012, 1053-1078
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gf-ir/src/binder.rs` around lines 923 - 934, The lower_return function routes aggregate-containing RETURN clauses to lower_return_aggregate, which bypasses the new display naming fallback introduced in this PR (used in the non-aggregate path via lower_return_items). This causes unaliased aggregate outputs to default to agg_{idx} naming instead of using the deterministic expression-text naming. Apply the same item.display naming logic that is used in the non-aggregate branch (around the lower_return_items call) to the lower_return_aggregate function to ensure consistent and deterministic naming for both aggregate and non-aggregate RETURN clauses.
ℹ️ Review info
⛔ Files ignored due to path filters (12)
- `CHANGELOG.md` is excluded by `!**/*.md`
- `crates/gf-ir/tests/ir_goldens/golden__filtered_scan.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__one_hop_expand.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__optional_match.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__order_by_limit.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__parameter.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__simple_node_scan.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__two_hop_expand.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__unwind.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__variable_length_expand.snap` is excluded by `!**/*.snap`
- `crates/gf-ir/tests/ir_goldens/golden__with_pipeline.snap` is excluded by `!**/*.snap`
- `docs/reference/tck-compliance.md` is excluded by `!**/*.md`, `!**/docs/**`
📒 Files selected for processing (8)
- `crates/gf-api/tests/e2e_baseline.rs`
- `crates/gf-ast/src/ast.rs`
- `crates/gf-ast/src/tests.rs`
- `crates/gf-cypher/src/parser/clauses.rs`
- `crates/gf-cypher/src/parser/mod.rs`
- `crates/gf-ir/src/binder.rs`
- `tests/tck/coverage_matrix.json`
- `tests/tck/passing_baseline.txt`
#909)

An aggregate RETURN item with no `AS` (`RETURN count(*)`, `min(x)`, `sum(i)`) was named `agg_<idx>`, so it failed the TCK's header-exact result comparison. It now takes the verbatim expression text (`ReturnItem::display`, the same source the non-aggregate columns use since #906), falling back to `agg_<idx>` only when neither an explicit alias nor display text is present.

Corpus passing 555 → 570 (+15, 0 regressions); baseline re-blessed.

Naming the group-by keys of a *mixed* aggregate query (`RETURN n.name, sum(n.num)`) is a follow-up: it needs per-key output aliases on the `Aggregate` op.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
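The fallback order described in this commit (explicit alias, then display text, then `agg_<idx>`) can be sketched in isolation. `AggItem` and `aggregate_column_name` are hypothetical names used only for illustration; the real logic lives in the binder's aggregate lowering and may differ in detail.

```rust
/// Hypothetical, simplified aggregate RETURN item.
struct AggItem {
    /// Explicit `AS` alias, if any.
    alias: Option<String>,
    /// Verbatim source text of the un-aliased expression, if captured.
    display: Option<String>,
}

/// Resolve the output column name: explicit alias first, then the
/// verbatim source text, then the positional `agg_<idx>` placeholder.
fn aggregate_column_name(item: &AggItem, idx: usize) -> String {
    item.alias
        .clone()
        .or_else(|| item.display.clone())
        .unwrap_or_else(|| format!("agg_{idx}"))
}

fn main() {
    let unaliased = AggItem { alias: None, display: Some("count(*)".into()) };
    let aliased = AggItem { alias: Some("total".into()), display: None };
    let bare = AggItem { alias: None, display: None };

    assert_eq!(aggregate_column_name(&unaliased, 0), "count(*)");
    assert_eq!(aggregate_column_name(&aliased, 1), "total");
    assert_eq!(aggregate_column_name(&bare, 2), "agg_2");
}
```

Keeping `agg_<idx>` as the last resort means items synthesized without source text still get a unique, stable name.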
What
openCypher names an un-aliased projection column by the expression as written: `n.prop`, `count(*)`, `a.x IS NULL`. GraphForge left those names to DataFusion, which derives them from the lowered expression (internal `var_N` qualifiers, substituted literals). So a large class of TCK scenarios computed the right values under the wrong header and failed the header-exact `the result should be` comparison (`column_by_name(<header>)`).
- The parser captures each un-aliased item's verbatim expression text into `ReturnItem::display` (sliced from the source via the expr's `span`). `RETURN *` expands to one `display` per in-scope variable.
- `lower_return_items` uses `display` as the default column name when there is no explicit `AS`, after the existing node/path-variable naming.
- `WITH` is untouched: it still requires an explicit alias for non-variable items.
Impact (advisory TCK gate)
Corpus passing 250 → 313 (+63, 0 regressions) — Boolean1–5, Comparison1–2, Match5, List1/6, Graph6, and more, which were value-correct but header-mismatched. Baseline re-blessed.
Test/golden updates
- IR goldens re-blessed: un-aliased columns now carry their source-text alias (`"alias": null` → `"n.name"`). gf-rel goldens unaffected (their queries are all aliased).
- `e2e_baseline` node_uuid tests now pin the column with an explicit `AS node_uuid`; they assert the uuid type/value, not column naming.
Verification
Refs #598 (does not close).
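The naming order for non-aggregate `RETURN` items described in this PR (explicit `AS`, then the variable name for bare variables, then `display`) can be sketched as follows. The `Expr`, `ReturnItem`, `default_column_name`, and `expand_star` definitions are hypothetical simplifications: the PR applies the variable rule to node/path variables specifically, while this sketch treats every bare variable the same way.

```rust
/// Hypothetical, simplified expression type.
enum Expr {
    /// A bare variable reference such as `n`.
    Variable(String),
    /// Any other expression (property access, predicate, ...).
    Other,
}

/// Hypothetical, simplified RETURN item.
struct ReturnItem {
    expr: Expr,
    alias: Option<String>,
    display: Option<String>,
}

/// Pick the column name; None means no name was chosen here and the
/// engine's derived default would apply.
fn default_column_name(item: &ReturnItem) -> Option<String> {
    // 1. An explicit `AS` alias always wins.
    if let Some(alias) = &item.alias {
        return Some(alias.clone());
    }
    // 2. A bare variable keeps its own name.
    if let Expr::Variable(name) = &item.expr {
        return Some(name.clone());
    }
    // 3. Otherwise fall back to the verbatim source text.
    item.display.clone()
}

/// `RETURN *` expansion: one item per in-scope variable, each carrying
/// the variable identifier as its display text.
fn expand_star(in_scope: &[&str]) -> Vec<ReturnItem> {
    in_scope
        .iter()
        .map(|v| ReturnItem {
            expr: Expr::Variable(v.to_string()),
            alias: None,
            display: Some(v.to_string()),
        })
        .collect()
}

fn main() {
    let prop = ReturnItem {
        expr: Expr::Other,
        alias: None,
        display: Some("n.node_uuid".into()),
    };
    assert_eq!(default_column_name(&prop).as_deref(), Some("n.node_uuid"));

    let names: Vec<_> = expand_star(&["a", "b"])
        .iter()
        .filter_map(default_column_name)
        .collect();
    assert_eq!(names, vec!["a", "b"]);
}
```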
🤖 Generated with Claude Code
Note
Name un-aliased Cypher RETURN columns by their verbatim source expression text
- Adds a `display` field to `ReturnItem` in ast.rs that stores the verbatim source text of un-aliased projection expressions, captured via a new `TokenStream.text()` helper in the parser.
- Updates `lower_return_items` to use `display` as the default column name for un-aliased expressions, while bare path/node variables continue to use the variable name and explicit `AS` aliases take priority.
- `RETURN *` expansion now names each synthesized column by its variable identifier.
- Affected tests now use explicit `AS` aliases to remain compatible with the new naming behavior.
- Un-aliased projections (e.g. `RETURN n.node_uuid`) are now named `n.node_uuid` instead of a derived name; callers relying on the old column names must add explicit aliases.

Macroscope summarized 292b913.