Skip to content

feat(ir): name un-aliased aggregate columns by source text (#598/#599) — corpus 555→570#909

Merged
DecisionNerd merged 1 commit into
mainfrom
feature/599-aggregation-completeness
Jun 24, 2026
Merged

feat(ir): name un-aliased aggregate columns by source text (#598/#599) — corpus 555→570#909
DecisionNerd merged 1 commit into
mainfrom
feature/599-aggregation-completeness

Conversation

@DecisionNerd

@DecisionNerd DecisionNerd commented Jun 24, 2026

Copy link
Copy Markdown
Owner

What

An aggregate RETURN item with no ASRETURN count(*), min(x), sum(i) — was named agg_<idx>, so it failed the TCK's header-exact result comparison (column_by_name("count(*)")). lower_return_aggregate now uses the item's verbatim expression text (ReturnItem::display, the same source #906 introduced for non-aggregate columns), falling back to agg_<idx> only when neither an explicit alias nor display text exists.

This is the dominant Aggregation-corpus pattern (single top-level aggregate, no AS), which the binder already lowered correctly — only the output column name was wrong.

Impact

Corpus passing 555 → 570 (+15, 0 regressions); baseline re-blessed. No golden churn (the IR aggregation golden uses an aliased aggregate).

Follow-up

Naming the group-by keys of a mixed aggregate query (RETURN n.name, sum(n.num) → the n.name column is still named by its lowered expr, not n.name) needs per-key output aliases on the Aggregate IR op — a separate PR. Nested aggregates (count(*) + 1) need the Aggregate→Project decomposition, also separate.

Verification

$ cargo test --workspace --no-fail-fast   # ALL GREEN
$ cargo test -p gf-api --test bdd         # 570 passing, 0 regressed
$ cargo clippy --workspace -- -D warnings # clean

Refs #598/#599 (does not close).

🤖 Generated with Claude Code

Note

Name un-aliased aggregate RETURN columns by their source expression text

  • In binder.rs, aggregate RETURN items without an explicit alias now use the item's verbatim display text (e.g., count(*), min(x)) as the column name, matching openCypher naming behavior, with fallback to agg_<n> only when no display text exists.
  • TCK passing scenarios increase from 555 to 570, with new passing entries across Aggregation, CountingSubgraphMatches, Match, Return, and Comparison feature files.

Macroscope summarized 2ed78ac.

Summary by CodeRabbit

  • Bug Fixes

    • Improved default column naming for aggregate results so returned values use clearer, more consistent labels when no explicit alias is provided.
    • This brings aggregate result naming in line with other returned expressions, reducing unexpected generic column names.
  • Tests

    • Expanded query compliance coverage with additional passing scenarios for aggregation, comparisons, graph match counting, deprecated pattern matching, and return-clause aliasing/interoperation.
    • Increased total passing scenarios in the compatibility matrix.

An aggregate RETURN item with no `AS` (`RETURN count(*)`, `min(x)`, `sum(i)`) was
named `agg_<idx>`, so it failed the TCK's header-exact result comparison. It now
takes the verbatim expression text (`ReturnItem::display`, the same source the
non-aggregate columns use since #906), falling back to `agg_<idx>` only when
neither an explicit alias nor display text is present.

Corpus passing 555 → 570 (+15, 0 regressions); baseline re-blessed. Naming the
group-by keys of a *mixed* aggregate query (`RETURN n.name, sum(n.num)`) is a
follow-up — it needs per-key output aliases on the `Aggregate` op.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

In lower_return_aggregate, the alias resolution for un-aliased aggregate RETURN items is updated to fall back to item.display before defaulting to the synthetic agg_<idx> name. The TCK passing baseline gains 15 new scenario entries across several feature groups, and the coverage matrix scenarios_passing count is updated from 555 to 570.

Changes

Aggregate alias fallback and TCK coverage update

Layer / File(s) Summary
Aggregate alias fallback logic
crates/gf-ir/src/binder.rs
lower_return_aggregate now resolves un-aliased items as item.aliasitem.displayagg_<idx> instead of the previous two-way item.aliasagg_<idx>.
TCK baseline and coverage matrix
tests/tck/passing_baseline.txt, tests/tck/coverage_matrix.json
Adds 15 new passing scenario lines covering Aggregation*, Comparison1, CountingSubgraphMatches1, Match9, Return4, Return5, and Return8; bumps scenarios_passing from 555 to 570.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • DecisionNerd/graphforge#906: Updates lower_return_items in binder.rs to use ReturnItem.display as the fallback for un-aliased non-aggregate RETURN projections — the parallel change to the one in this PR for the aggregate path.
  • DecisionNerd/graphforge#891: Also modifies tests/tck/coverage_matrix.json's scenarios_passing metadata count in the same field updated here.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly states the IR change and the corpus impact, matching the main changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description clearly explains the change, impact, verification, and follow-up work, so it is sufficiently complete despite not mirroring the full template.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/599-aggregation-completeness

Comment @coderabbitai help to get the list of available commands.

@DecisionNerd DecisionNerd merged commit d4856dc into main Jun 24, 2026
43 checks passed
@DecisionNerd DecisionNerd deleted the feature/599-aggregation-completeness branch June 24, 2026 06:10
DecisionNerd added a commit that referenced this pull request Jun 24, 2026
Completes source-text naming for the aggregate path (un-aliased aggregates landed
in #909) and fixes whole-node group keys.

In a mixed aggregate query like `RETURN n.name, count(*)`, the `n.name` group-by
column was named by its lowered expression (`var_0.name`), failing the TCK
header-exact comparison. The `Aggregate` IR op now carries an optional per-key
output alias — `group_aliases`, parallel to `group_by` and
`#[serde(skip_serializing_if = "Vec::is_empty")]` so existing plans/goldens are
unchanged — filled by the binder from each RETURN item's `AS` alias or verbatim
source text and applied by `lower_aggregate` via `.alias()`.

Group keys are also now materialized through the same path as terminal RETURN
items: a whole-node group key (`RETURN n, count(*)`) rewrites to `_node_struct`
rather than a bare `var_N` (which is not a real column and lowered against an
unbound reference). (Thanks CodeRabbit.)

Corpus passing 570 → 575 (+5, 0 regressions); baseline re-blessed. No golden churn
(the IR aggregation golden has an empty group-by).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request Jun 24, 2026
Completes source-text naming for the aggregate path (un-aliased aggregates landed
in #909) and fixes whole-node group keys.

In a mixed aggregate query like `RETURN n.name, count(*)`, the `n.name` group-by
column was named by its lowered expression (`var_0.name`), failing the TCK
header-exact comparison. The `Aggregate` IR op now carries an optional per-key
output alias — `group_aliases`, parallel to `group_by` and
`#[serde(skip_serializing_if = "Vec::is_empty")]` so existing plans/goldens are
unchanged — filled by the binder from each RETURN item's `AS` alias or verbatim
source text and applied by `lower_aggregate` via `.alias()`.

Group keys are also now materialized through the same path as terminal RETURN
items: a whole-node group key (`RETURN n, count(*)`) rewrites to `_node_struct`
rather than a bare `var_N` (which is not a real column and lowered against an
unbound reference). (Thanks CodeRabbit.)

Corpus passing 570 → 575 (+5, 0 regressions); baseline re-blessed. No golden churn
(the IR aggregation golden has an empty group-by).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant