feat(ir): name un-aliased aggregate columns by source text (#598/#599) — corpus 555→570 by DecisionNerd · Pull Request #909 · DecisionNerd/graphforge

DecisionNerd · 2026-06-24T05:56:44Z

What

An aggregate RETURN item with no AS — RETURN count(*), min(x), sum(i) — was named agg_<idx>, so it failed the TCK's header-exact result comparison (column_by_name("count(*)")). lower_return_aggregate now uses the item's verbatim expression text (ReturnItem::display, the same source #906 introduced for non-aggregate columns), falling back to agg_<idx> only when neither an explicit alias nor display text exists.

This is the dominant Aggregation-corpus pattern (single top-level aggregate, no AS), which the binder already lowered correctly — only the output column name was wrong.

Impact

Corpus passing 555 → 570 (+15, 0 regressions); baseline re-blessed. No golden churn (the IR aggregation golden uses an aliased aggregate).

Follow-up

Naming the group-by keys of a mixed aggregate query (RETURN n.name, sum(n.num) → the n.name column is still named by its lowered expr, not n.name) needs per-key output aliases on the Aggregate IR op — a separate PR. Nested aggregates (count(*) + 1) need the Aggregate→Project decomposition, also separate.

Verification

$ cargo test --workspace --no-fail-fast   # ALL GREEN
$ cargo test -p gf-api --test bdd         # 570 passing, 0 regressed
$ cargo clippy --workspace -- -D warnings # clean

Refs #598/#599 (does not close).

🤖 Generated with Claude Code

Note

Name un-aliased aggregate RETURN columns by their source expression text

In binder.rs, aggregate RETURN items without an explicit alias now use the item's verbatim display text (e.g., count(*), min(x)) as the column name, matching openCypher naming behavior, with fallback to agg_<n> only when no display text exists.
TCK passing scenarios increase from 555 to 570, with new passing entries across Aggregation, CountingSubgraphMatches, Match, Return, and Comparison feature files.

^{Macroscope summarized 2ed78ac.}

Summary by CodeRabbit

Bug Fixes
- Improved default column naming for aggregate results so returned values use clearer, more consistent labels when no explicit alias is provided.
- This brings aggregate result naming in line with other returned expressions, reducing unexpected generic column names.
Tests
- Expanded query compliance coverage with additional passing scenarios for aggregation, comparisons, graph match counting, deprecated pattern matching, and return-clause aliasing/interoperation.
- Increased total passing scenarios in the compatibility matrix.

An aggregate RETURN item with no `AS` (`RETURN count(*)`, `min(x)`, `sum(i)`) was named `agg_<idx>`, so it failed the TCK's header-exact result comparison. It now takes the verbatim expression text (`ReturnItem::display`, the same source the non-aggregate columns use since #906), falling back to `agg_<idx>` only when neither an explicit alias nor display text is present. Corpus passing 555 → 570 (+15, 0 regressions); baseline re-blessed. Naming the group-by keys of a *mixed* aggregate query (`RETURN n.name, sum(n.num)`) is a follow-up — it needs per-key output aliases on the `Aggregate` op. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-24T05:57:00Z

Walkthrough

In lower_return_aggregate, the alias resolution for un-aliased aggregate RETURN items is updated to fall back to item.display before defaulting to the synthetic agg_<idx> name. The TCK passing baseline gains 15 new scenario entries across several feature groups, and the coverage matrix scenarios_passing count is updated from 555 to 570.

Changes

Aggregate alias fallback and TCK coverage update

Layer / File(s)	Summary
Aggregate alias fallback logic `crates/gf-ir/src/binder.rs`	`lower_return_aggregate` now resolves un-aliased items as `item.alias` → `item.display` → `agg_<idx>` instead of the previous two-way `item.alias` → `agg_<idx>`.
TCK baseline and coverage matrix `tests/tck/passing_baseline.txt`, `tests/tck/coverage_matrix.json`	Adds 15 new passing scenario lines covering `Aggregation*`, `Comparison1`, `CountingSubgraphMatches1`, `Match9`, `Return4`, `Return5`, and `Return8`; bumps `scenarios_passing` from 555 to 570.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

DecisionNerd/graphforge#906: Updates lower_return_items in binder.rs to use ReturnItem.display as the fallback for un-aliased non-aggregate RETURN projections — the parallel change to the one in this PR for the aggregate path.
DecisionNerd/graphforge#891: Also modifies tests/tck/coverage_matrix.json's scenarios_passing metadata count in the same field updated here.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly states the IR change and the corpus impact, matching the main changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The description clearly explains the change, impact, verification, and follow-up work, so it is sufficiently complete despite not mirroring the full template.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/599-aggregation-completeness

_{Comment @coderabbitai help to get the list of available commands.}

Completes source-text naming for the aggregate path (un-aliased aggregates landed in #909) and fixes whole-node group keys. In a mixed aggregate query like `RETURN n.name, count(*)`, the `n.name` group-by column was named by its lowered expression (`var_0.name`), failing the TCK header-exact comparison. The `Aggregate` IR op now carries an optional per-key output alias — `group_aliases`, parallel to `group_by` and `#[serde(skip_serializing_if = "Vec::is_empty")]` so existing plans/goldens are unchanged — filled by the binder from each RETURN item's `AS` alias or verbatim source text and applied by `lower_aggregate` via `.alias()`. Group keys are also now materialized through the same path as terminal RETURN items: a whole-node group key (`RETURN n, count(*)`) rewrites to `_node_struct` rather than a bare `var_N` (which is not a real column and lowered against an unbound reference). (Thanks CodeRabbit.) Corpus passing 570 → 575 (+5, 0 regressions); baseline re-blessed. No golden churn (the IR aggregation golden has an empty group-by). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Completes source-text naming for the aggregate path (un-aliased aggregates landed in #909) and fixes whole-node group keys. In a mixed aggregate query like `RETURN n.name, count(*)`, the `n.name` group-by column was named by its lowered expression (`var_0.name`), failing the TCK header-exact comparison. The `Aggregate` IR op now carries an optional per-key output alias — `group_aliases`, parallel to `group_by` and `#[serde(skip_serializing_if = "Vec::is_empty")]` so existing plans/goldens are unchanged — filled by the binder from each RETURN item's `AS` alias or verbatim source text and applied by `lower_aggregate` via `.alias()`. Group keys are also now materialized through the same path as terminal RETURN items: a whole-node group key (`RETURN n, count(*)`) rewrites to `_node_struct` rather than a bare `var_N` (which is not a real column and lowered against an unbound reference). (Thanks CodeRabbit.) Corpus passing 570 → 575 (+5, 0 regressions); baseline re-blessed. No golden churn (the IR aggregation golden has an empty group-by). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DecisionNerd merged commit d4856dc into main Jun 24, 2026
43 checks passed

DecisionNerd deleted the feature/599-aggregation-completeness branch June 24, 2026 06:10

DecisionNerd mentioned this pull request Jun 24, 2026

feat(ir): name mixed-aggregate group-by keys by source text (#599) — corpus 570→574 #910

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ir): name un-aliased aggregate columns by source text (#598/#599) — corpus 555→570#909

feat(ir): name un-aliased aggregate columns by source text (#598/#599) — corpus 555→570#909
DecisionNerd merged 1 commit into
mainfrom
feature/599-aggregation-completeness

DecisionNerd commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DecisionNerd commented Jun 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Impact

Follow-up

Verification

Name un-aliased aggregate RETURN columns by their source expression text

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DecisionNerd commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading