Skip to content

feat(ir): name mixed-aggregate group-by keys by source text (#599) — corpus 570→574#910

Merged
DecisionNerd merged 1 commit into
mainfrom
feature/599-aggregation-groupby-naming
Jun 24, 2026
Merged

feat(ir): name mixed-aggregate group-by keys by source text (#599) — corpus 570→574#910
DecisionNerd merged 1 commit into
mainfrom
feature/599-aggregation-groupby-naming

Conversation

@DecisionNerd

@DecisionNerd DecisionNerd commented Jun 24, 2026

Copy link
Copy Markdown
Owner

What

Completes source-text naming for the aggregate path (un-aliased aggregates landed in #909). In a mixed aggregate query like RETURN n.name, count(*), the n.name group-by column was named by its lowered expression (var_0.name), failing the TCK header-exact comparison.

The Aggregate IR op now carries an optional per-key output alias — group_aliases, parallel to group_by and #[serde(skip_serializing_if = "Vec::is_empty")] so existing plans/goldens are unchanged. The binder fills it from each RETURN item's AS alias or verbatim source text; lower_aggregate applies it via .alias() on the group expression.

Impact

Corpus passing 570 → 574 (+4, 0 regressions); baseline re-blessed. Modest yield (mixed aggregate queries are uncommon) but it closes the aggregate-naming gap. No golden churn (the IR aggregation golden has an empty group-by).

Verification

$ cargo test --workspace --no-fail-fast   # ALL GREEN
$ cargo test -p gf-api --test bdd         # 574 passing, 0 regressed
$ cargo clippy --workspace -- -D warnings # clean

Refs #599 (does not close). Next aggregation slice: nested aggregates (count(*)+1) via Aggregate→Project decomposition.

🤖 Generated with Claude Code

Note

Name mixed-aggregate group-by keys by source text in the IR aggregate op

  • Adds group_aliases: Vec<Option<String>> to GraphOp::Aggregate in plan.rs, carrying per-key output aliases alongside group_by keys
  • The binder in binder.rs now captures aliases from AS clauses or display text for each group-by key, and materializes bare node variables to node structs using terminal RETURN item semantics
  • The relational lowerer in lowerer.rs applies these aliases to group-by expressions in the lowered plan, so output columns carry the intended names
  • Increases passing TCK scenarios from 570 to 575, covering count/sum non-null and implicit grouping cases
  • Behavioral Change: group-by keys in aggregate ops are now materialized differently — bare node variables yield node structs instead of node references

Macroscope summarized 29a85e4.

Summary by CodeRabbit

  • New Features

    • Added per-group output alias support for grouped aggregate results, improving readable column names.
    • Enabled implicit grouping with aggregate queries in return clauses.
  • Bug Fixes

    • Improved aggregate plan lowering and JSON round-tripping to correctly carry group key aliases.
    • Ensured aggregate output naming stays consistent from planning through execution.
  • Tests

    • Updated aggregate/return clause coverage and adjusted TCK passing baselines to include null-handling and new grouped aggregation scenarios.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

GraphOp::Aggregate gains a group_aliases field. The binder now derives group-key names from RETURN item aliases or display text and stores them in that field. The relational lowerer applies those names to lowered group-key expressions. TCK passing count increases to 575.

Changes

Explicit group-key aliasing in aggregate plans

Layer / File(s) Summary
GraphOp::Aggregate schema
crates/gf-ir/src/plan.rs
Adds group_aliases: Vec<Option<String>> with serde defaults and empty-list serialization skipping, and updates the round-trip test to include the new field.
Binder populates group_aliases
crates/gf-ir/src/binder.rs
lower_return_aggregate builds group_aliases from each non-aggregate RETURN item's AS alias or display text, emits it in GraphOp::Aggregate, and updates aggregate RETURN tests to match the expanded variant.
Relational lowering and TCK updates
crates/gf-rel/src/lowerer.rs, tests/tck/coverage_matrix.json, tests/tck/passing_baseline.txt
lower_relational_op passes group_aliases into lower_aggregate, which applies aliases to group-key expressions; the lowerer test is updated, and the TCK baseline plus passing count are extended.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • DecisionNerd/graphforge#694: Both PRs modify crates/gf-rel/src/lowerer.rs’s GraphOp::Aggregate lowering logic; #694 maps Aggregate { group_by, aggs } to a DataFusion aggregate, while this PR extends that lowering to also consume and apply group_aliases.
  • DecisionNerd/graphforge#906: This PR’s binder now builds GraphOp::Aggregate.group_aliases from each RETURN item’s alias/display text, which aligns with the earlier return-item naming changes in the retrieved PR.
  • DecisionNerd/graphforge#909: Both PRs touch crates/gf-ir/src/binder.rs in lower_return_aggregate to change how RETURN item names are derived for aggregate-related output columns.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately reflects the mixed-aggregate group-by naming change.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description covers the change, impact, and verification clearly, and while it doesn't mirror every template section, it is mostly complete.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/599-aggregation-groupby-naming

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/gf-rel/src/lowerer.rs (1)

2371-2374: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a non-empty group_aliases lowering test.

This test only covers group_aliases: vec![], so the new aliasing branch in lower_aggregate is still unpinned at unit level. A small grouped-aggregate case that asserts the output header uses the supplied alias would make this contract much harder to regress.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-rel/src/lowerer.rs` around lines 2371 - 2374, Add a
grouped-aggregate lowering test that exercises a non-empty group_aliases path in
lower_aggregate, since the current coverage only uses vec![] and leaves the
aliasing branch unverified. Create a small aggregate case in the lowerer.rs test
area that passes a group alias through GraphOp::Aggregate and assert the lowered
output header/column name uses that alias, so the behavior is pinned at unit
level.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gf-ir/src/binder.rs`:
- Around line 1089-1095: The group-by lowering in binder.rs still uses
lower_expr for RETURN items, which leaves whole-node keys like n or startNode(r)
as bare VarRef values instead of the _node_struct form. Update the
Aggregate/group_by path in the relevant binder logic to reuse the same
whole-node materialization used by lower_return_item_expr, so mixed queries like
RETURN n, count(*) bind against the rewritten node structure rather than an
unbound column. Keep the existing alias handling via
group_aliases.push(item.alias.clone().or_else(|| item.display.clone()))
unchanged.

---

Nitpick comments:
In `@crates/gf-rel/src/lowerer.rs`:
- Around line 2371-2374: Add a grouped-aggregate lowering test that exercises a
non-empty group_aliases path in lower_aggregate, since the current coverage only
uses vec![] and leaves the aliasing branch unverified. Create a small aggregate
case in the lowerer.rs test area that passes a group alias through
GraphOp::Aggregate and assert the lowered output header/column name uses that
alias, so the behavior is pinned at unit level.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0aef24f7-f2e1-4758-a67e-be6c93af906d

📥 Commits

Reviewing files that changed from the base of the PR and between d4856dc and 033be1d.

⛔ Files ignored due to path filters (2)
  • CHANGELOG.md is excluded by !**/*.md
  • docs/reference/tck-compliance.md is excluded by !**/*.md, !**/docs/**
📒 Files selected for processing (5)
  • crates/gf-ir/src/binder.rs
  • crates/gf-ir/src/plan.rs
  • crates/gf-rel/src/lowerer.rs
  • tests/tck/coverage_matrix.json
  • tests/tck/passing_baseline.txt

Comment thread crates/gf-ir/src/binder.rs Outdated
Completes source-text naming for the aggregate path (un-aliased aggregates landed
in #909) and fixes whole-node group keys.

In a mixed aggregate query like `RETURN n.name, count(*)`, the `n.name` group-by
column was named by its lowered expression (`var_0.name`), failing the TCK
header-exact comparison. The `Aggregate` IR op now carries an optional per-key
output alias — `group_aliases`, parallel to `group_by` and
`#[serde(skip_serializing_if = "Vec::is_empty")]` so existing plans/goldens are
unchanged — filled by the binder from each RETURN item's `AS` alias or verbatim
source text and applied by `lower_aggregate` via `.alias()`.

Group keys are also now materialized through the same path as terminal RETURN
items: a whole-node group key (`RETURN n, count(*)`) rewrites to `_node_struct`
rather than a bare `var_N` (which is not a real column and lowered against an
unbound reference). (Thanks CodeRabbit.)

Corpus passing 570 → 575 (+5, 0 regressions); baseline re-blessed. No golden churn
(the IR aggregation golden has an empty group-by).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@DecisionNerd DecisionNerd force-pushed the feature/599-aggregation-groupby-naming branch from 033be1d to 29a85e4 Compare June 24, 2026 07:21
@DecisionNerd DecisionNerd merged commit 9ffab60 into main Jun 24, 2026
43 checks passed
@DecisionNerd DecisionNerd deleted the feature/599-aggregation-groupby-naming branch June 24, 2026 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant