feat(ir): name mixed-aggregate group-by keys by source text (#599) — corpus 570→574 by DecisionNerd · Pull Request #910 · DecisionNerd/graphforge

DecisionNerd · 2026-06-24T06:56:18Z

What

Completes source-text naming for the aggregate path (un-aliased aggregates landed in #909). In a mixed aggregate query like RETURN n.name, count(*), the n.name group-by column was named by its lowered expression (var_0.name), failing the TCK header-exact comparison.

The Aggregate IR op now carries an optional per-key output alias — group_aliases, parallel to group_by and #[serde(skip_serializing_if = "Vec::is_empty")] so existing plans/goldens are unchanged. The binder fills it from each RETURN item's AS alias or verbatim source text; lower_aggregate applies it via .alias() on the group expression.

Impact

Corpus passing 570 → 574 (+4, 0 regressions); baseline re-blessed. Modest yield (mixed aggregate queries are uncommon) but it closes the aggregate-naming gap. No golden churn (the IR aggregation golden has an empty group-by).

Verification

$ cargo test --workspace --no-fail-fast   # ALL GREEN
$ cargo test -p gf-api --test bdd         # 574 passing, 0 regressed
$ cargo clippy --workspace -- -D warnings # clean

Refs #599 (does not close). Next aggregation slice: nested aggregates (count(*)+1) via Aggregate→Project decomposition.

🤖 Generated with Claude Code

Note

Name mixed-aggregate group-by keys by source text in the IR aggregate op

Adds group_aliases: Vec<Option<String>> to GraphOp::Aggregate in plan.rs, carrying per-key output aliases alongside group_by keys
The binder in binder.rs now captures aliases from AS clauses or display text for each group-by key, and materializes bare node variables to node structs using terminal RETURN item semantics
The relational lowerer in lowerer.rs applies these aliases to group-by expressions in the lowered plan, so output columns carry the intended names
Increases passing TCK scenarios from 570 to 575, covering count/sum non-null and implicit grouping cases
Behavioral Change: group-by keys in aggregate ops are now materialized differently — bare node variables yield node structs instead of node references

^{Macroscope summarized 29a85e4.}

Summary by CodeRabbit

New Features
- Added per-group output alias support for grouped aggregate results, improving readable column names.
- Enabled implicit grouping with aggregate queries in return clauses.
Bug Fixes
- Improved aggregate plan lowering and JSON round-tripping to correctly carry group key aliases.
- Ensured aggregate output naming stays consistent from planning through execution.
Tests
- Updated aggregate/return clause coverage and adjusted TCK passing baselines to include null-handling and new grouped aggregation scenarios.

coderabbitai · 2026-06-24T06:56:35Z

Walkthrough

GraphOp::Aggregate gains a group_aliases field. The binder now derives group-key names from RETURN item aliases or display text and stores them in that field. The relational lowerer applies those names to lowered group-key expressions. TCK passing count increases to 575.

Changes

Explicit group-key aliasing in aggregate plans

Layer / File(s)	Summary
GraphOp::Aggregate schema `crates/gf-ir/src/plan.rs`	Adds `group_aliases: Vec<Option<String>>` with serde defaults and empty-list serialization skipping, and updates the round-trip test to include the new field.
Binder populates group_aliases `crates/gf-ir/src/binder.rs`	`lower_return_aggregate` builds `group_aliases` from each non-aggregate RETURN item's `AS` alias or `display` text, emits it in `GraphOp::Aggregate`, and updates aggregate RETURN tests to match the expanded variant.
Relational lowering and TCK updates `crates/gf-rel/src/lowerer.rs`, `tests/tck/coverage_matrix.json`, `tests/tck/passing_baseline.txt`	`lower_relational_op` passes `group_aliases` into `lower_aggregate`, which applies aliases to group-key expressions; the lowerer test is updated, and the TCK baseline plus passing count are extended.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

DecisionNerd/graphforge#694: Both PRs modify crates/gf-rel/src/lowerer.rs’s GraphOp::Aggregate lowering logic; #694 maps Aggregate { group_by, aggs } to a DataFusion aggregate, while this PR extends that lowering to also consume and apply group_aliases.
DecisionNerd/graphforge#906: This PR’s binder now builds GraphOp::Aggregate.group_aliases from each RETURN item’s alias/display text, which aligns with the earlier return-item naming changes in the retrieved PR.
DecisionNerd/graphforge#909: Both PRs touch crates/gf-ir/src/binder.rs in lower_return_aggregate to change how RETURN item names are derived for aggregate-related output columns.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and accurately reflects the mixed-aggregate group-by naming change.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The description covers the change, impact, and verification clearly, and while it doesn't mirror every template section, it is mostly complete.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/599-aggregation-groupby-naming

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/gf-rel/src/lowerer.rs (1)
2371-2374: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a non-empty group_aliases lowering test.

This test only covers group_aliases: vec![], so the new aliasing branch in lower_aggregate is still unpinned at unit level. A small grouped-aggregate case that asserts the output header uses the supplied alias would make this contract much harder to regress.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-rel/src/lowerer.rs` around lines 2371 - 2374, Add a
grouped-aggregate lowering test that exercises a non-empty group_aliases path in
lower_aggregate, since the current coverage only uses vec![] and leaves the
aliasing branch unverified. Create a small aggregate case in the lowerer.rs test
area that passes a group alias through GraphOp::Aggregate and assert the lowered
output header/column name uses that alias, so the behavior is pinned at unit
level.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gf-ir/src/binder.rs`:
- Around line 1089-1095: The group-by lowering in binder.rs still uses
lower_expr for RETURN items, which leaves whole-node keys like n or startNode(r)
as bare VarRef values instead of the _node_struct form. Update the
Aggregate/group_by path in the relevant binder logic to reuse the same
whole-node materialization used by lower_return_item_expr, so mixed queries like
RETURN n, count(*) bind against the rewritten node structure rather than an
unbound column. Keep the existing alias handling via
group_aliases.push(item.alias.clone().or_else(|| item.display.clone()))
unchanged.

---

Nitpick comments:
In `@crates/gf-rel/src/lowerer.rs`:
- Around line 2371-2374: Add a grouped-aggregate lowering test that exercises a
non-empty group_aliases path in lower_aggregate, since the current coverage only
uses vec![] and leaves the aliasing branch unverified. Create a small aggregate
case in the lowerer.rs test area that passes a group alias through
GraphOp::Aggregate and assert the lowered output header/column name uses that
alias, so the behavior is pinned at unit level.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0aef24f7-f2e1-4758-a67e-be6c93af906d

📥 Commits

Reviewing files that changed from the base of the PR and between d4856dc and 033be1d.

⛔ Files ignored due to path filters (2)

CHANGELOG.md is excluded by !**/*.md
docs/reference/tck-compliance.md is excluded by !**/*.md, !**/docs/**

📒 Files selected for processing (5)

crates/gf-ir/src/binder.rs
crates/gf-ir/src/plan.rs
crates/gf-rel/src/lowerer.rs
tests/tck/coverage_matrix.json
tests/tck/passing_baseline.txt

Completes source-text naming for the aggregate path (un-aliased aggregates landed in #909) and fixes whole-node group keys. In a mixed aggregate query like `RETURN n.name, count(*)`, the `n.name` group-by column was named by its lowered expression (`var_0.name`), failing the TCK header-exact comparison. The `Aggregate` IR op now carries an optional per-key output alias — `group_aliases`, parallel to `group_by` and `#[serde(skip_serializing_if = "Vec::is_empty")]` so existing plans/goldens are unchanged — filled by the binder from each RETURN item's `AS` alias or verbatim source text and applied by `lower_aggregate` via `.alias()`. Group keys are also now materialized through the same path as terminal RETURN items: a whole-node group key (`RETURN n, count(*)`) rewrites to `_node_struct` rather than a bare `var_N` (which is not a real column and lowered against an unbound reference). (Thanks CodeRabbit.) Corpus passing 570 → 575 (+5, 0 regressions); baseline re-blessed. No golden churn (the IR aggregation golden has an empty group-by). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread crates/gf-ir/src/binder.rs Outdated

DecisionNerd force-pushed the feature/599-aggregation-groupby-naming branch from 033be1d to 29a85e4 Compare June 24, 2026 07:21

DecisionNerd merged commit 9ffab60 into main Jun 24, 2026
43 checks passed

DecisionNerd deleted the feature/599-aggregation-groupby-naming branch June 24, 2026 07:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ir): name mixed-aggregate group-by keys by source text (#599) — corpus 570→574#910

feat(ir): name mixed-aggregate group-by keys by source text (#599) — corpus 570→574#910
DecisionNerd merged 1 commit into
mainfrom
feature/599-aggregation-groupby-naming

DecisionNerd commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DecisionNerd commented Jun 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Impact

Verification

Name mixed-aggregate group-by keys by source text in the IR aggregate op

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DecisionNerd commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading