AI merge up to PG14 #2686

Draft
dimoffon wants to merge 2379 commits into adb-8.x from claude-merge-2

dimoffon commented Jun 11, 2026

Merge PostgreSQL 14 into Greengage

Summary

Brings the upstream PostgreSQL 14 commit range d259afa736..e1c1c30f635 into the
Greengage MPP fork and carries it to a green regression matrix. The raw merge
(8597e91cda3) resolved ~700 conflict files; 356 follow-up commits then take the
tree through every bring-up phase — compile → mock unit tests → initdb/bootstrap →
the JIT × {ORCA, Postgres-planner} regression matrix — fixing each distinct class of
breakage, and add new regression coverage plus reusable tooling. Size of the changes
layered on the raw merge: 636 files, +58k/−39k.

1. Merge & conflict resolution

Resolved semantically — adopt the upstream API shape, then re-graft the GGDB/MPP logic:

  • PGXACT elimination → dense PROC_HDR arrays; GGDB distributed-snapshot and
    reader/writer XID sharing re-grafted onto ProcGlobal->xids[pgxactoff].
  • relkindobjtype enum in CTAS / RefreshMatView / IntoClause.
  • copy.c split (copyfrom/copyto) — kept GGDB's monolithic copy.c, re-grafted
    only the protocol change.
  • Grammar: PG14 bare_label_keyword/BareColLabel superseding GGDB's ColLabelNoAs.
  • Catalog/genbki tightening (oid_symbol rejection, per-catalog DECLAREs, OID-range
    collisions), errstart/errstart_cold, long-lived WaitEventSet, GlobalVis*
    horizon API, bulk pg_attribute insert + attcompression.
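
To illustrate the dense-array shape mentioned above, here is a minimal sketch, not the
actual GGDB code. It assumes simplified stand-ins (SketchProcHdr, SketchProc and the
sketch_* helpers are hypothetical names) for the real PROC_HDR/PGPROC structs; only the
xids[pgxactoff] indexing pattern is taken from the description.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* 32-bit XID, as in PostgreSQL */
#define SKETCH_MAX_PROCS 8          /* hypothetical capacity for the sketch */

/* Simplified stand-in for PROC_HDR: one dense array of XIDs. */
typedef struct SketchProcHdr
{
    TransactionId xids[SKETCH_MAX_PROCS];
} SketchProcHdr;

/* Simplified stand-in for PGPROC: remembers its slot in the dense array. */
typedef struct SketchProc
{
    int pgxactoff;
} SketchProc;

/* Publish a backend's XID by writing its slot in the dense array. */
void sketch_set_xid(SketchProcHdr *hdr, const SketchProc *proc, TransactionId xid)
{
    assert(proc->pgxactoff >= 0 && proc->pgxactoff < SKETCH_MAX_PROCS);
    hdr->xids[proc->pgxactoff] = xid;
}

/* Read the XID a backend currently advertises. */
TransactionId sketch_get_xid(const SketchProcHdr *hdr, const SketchProc *proc)
{
    assert(proc->pgxactoff >= 0 && proc->pgxactoff < SKETCH_MAX_PROCS);
    return hdr->xids[proc->pgxactoff];
}
```

Anything that used to read a per-backend PGXACT field would, in this shape, go through
the owning proc's pgxactoff into the shared array instead.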

2. Bring-up by phase

  • Compile/link — mechanical API-shape fixes.
  • Mock unit tests (make unittest-check) — errstart/errstart_cold mock split,
    new-GUC coverage lists, mock.mk link order; the mock suites pass.
  • initdb/bootstrap — catalog/BKI/planner regressions neither the compiler nor the
    mocks catch: BKI single-quote convention, genbki PGUID substitution, catalog
    header order, missing index DECLAREs, pg_proc.dat last-wins duplicates, row-identity
    wiring for UPDATE/DELETE.
  • Regression — answer-file reconciliation + the fixes below.

3. MPP/ORCA bug fixes (the dominant class)

PG14 added fields/types that ORCA's translator silently omitted or mishandled, causing
crashes or wrong results only under optimizer=on. Fixed:

  • UPDATE ModifyTable.updateColnosLists (segment SIGSEGV) + partitioned U/D fallback
  • Aggref.aggno/aggtransno — every aggregate returned the first one's value
  • SubscriptingRef.refrestype — "cache lookup failed for type 0"
  • GROUP BY DISTINCT ignored (37 vs 9 grouping-set rows) → planner fallback
  • range_agg anymultirange unresolved ("type 4537 is not a multirange type") → fallback
  • correlated EXISTS/scalar subplan in the target list (cdbllize crash + lost correlation)
  • CTAS/matview DISTRIBUTED BY (<aggregate>) (Motion-on-Motion, MIN/MAX planagg guard)
  • multi-DQA + FILTER, split-update target list, binary-dispatch JoinExpr join_using_alias
  • matview REFRESH … CONCURRENTLY (prefixed transient heap, whole-row .* refs)
  • COPY/sequence wire protocol (pq_getmessage maxlen), nested InitPlan across a Motion
  • HA: coordinator SyncRep QD exemption, pg_rewind segment gp_dbid preservation
  • assert-build crashers (syncrep ×2, AO-update slot, over-strict ORCA asserts, …)
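
The aggno symptom in the list above (every aggregate returning the first one's value)
can be reproduced in a few lines. This is a simplified sketch with hypothetical names
(SketchAggSlotRef, sketch_*); it numbers aggregates in order of appearance and omits
the sharing of slots between identical aggregates that the real planner also does.

```c
#include <assert.h>

/* Hypothetical stand-in for an Aggref: only the result-slot index matters here. */
typedef struct SketchAggSlotRef
{
    int aggno;      /* index into the per-aggregate result slots */
} SketchAggSlotRef;

/* Number aggregates densely, in order of appearance: 0, 1, 2, ... */
void sketch_number_aggrefs(SketchAggSlotRef *aggs, int naggs)
{
    assert(naggs >= 0);
    for (int i = 0; i < naggs; i++)
        aggs[i].aggno = i;
}

/* Fetch an aggregate's value from the result slots via its aggno. */
long sketch_agg_value(const long *slots, const SketchAggSlotRef *agg)
{
    return slots[agg->aggno];
}
```

If a translator leaves every aggno at zero, sketch_agg_value() reads slot 0 for all of
them, which matches the wrong-result pattern described above.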

4. Test reconciliation & the CI matrix

CI runs the same expected/*.out across four jobs — {JIT, non-JIT} × {ORCA, Postgres
planner}. Principles applied (and captured in the tooling): regenerate answer files only
from the failing job's CI result (local runs bake env-specific values like db name /
segment count); a shared base .out is used by ORCA too, so regenerating it to the
Postgres-planner output breaks ORCA; JIT-only EXPLAIN output (the Settings: line) must
not be baked into a file a non-JIT job compares. Representative fixes: explain
stabilized via SET jit = off + boot-default cost GUCs; table_functions/direct_dispatch
matchsubs; _optimizer.out splits for shared optimizer-dependent tests.
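
The shared-base rule is easier to see as a lookup. The sketch below is an assumption
about the selection logic, written from the description only (sketch_expected_file is a
hypothetical helper, not a pg_regress function): an ORCA job prefers
<test>_optimizer.out when it exists and otherwise falls back to the base file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the expected-output path a job would compare against.
 * Returns the snprintf() result (length that was or would have been written).
 */
int sketch_expected_file(char *buf, size_t buflen, const char *test,
                         bool optimizer_on, bool has_optimizer_variant)
{
    assert(buf != NULL && test != NULL && strlen(test) > 0);

    /* ORCA uses the _optimizer variant only when one exists. */
    if (optimizer_on && has_optimizer_variant)
        return snprintf(buf, buflen, "expected/%s_optimizer.out", test);

    /* Otherwise both optimizers share the base file. */
    return snprintf(buf, buflen, "expected/%s.out", test);
}
```

For a test with no _optimizer variant, both optimizer settings resolve to the same
path, so regenerating it from Postgres-planner output changes what ORCA compares too.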

5. New regression coverage

  • gp_pg14_merge_regress — lock-in test: one minimal reproducer per MPP/ORCA bug
    class above, so a revert re-introduces the crash/wrong-result. Runs under both optimizers.
  • gp_pg14_features — MPP/ORCA coverage for under-tested PG14 features (GROUP BY
    DISTINCT, recursive-CTE SEARCH/CYCLE, multirange, subscripting in SELECT/WHERE/JOIN,
    extract/date_bin, range_agg, NOT MATERIALIZED). This test caught the two new
    ORCA bugs (GROUP BY DISTINCT and range_agg), both fixed in this PR.

6. Tooling / docs

Seven reusable Claude Code skills under .claude/skills/ distilled from the campaign
(build, regress-tests, answer-file-regen, cluster-ops, debug, internals, pg-merge).

Verification

  • Compile + mock unit-test suites green.
  • JIT × ORCA and JIT × Postgres-optimizer regression jobs green; the deterministic
    opt=off/ORCA long tail is closed.
  • Both new tests pass under optimizer=off and optimizer=on.

Known remaining (not regressions)

  • isolation2 HA/fault-injection specs (fts_session_reset, segwalrep/*,
    pg_rewind_*) fail only in CI because the small gpdemo runs out of memory
    during the full ~250-spec schedule — OOM events time-correlate with the failures and
    all specs pass locally in isolation. CI cluster-capacity, not code.
  • unnest(anymultirange) was not carried into the branch (no C multirange_unnest /
    catalog entry) — the new test covers multirange via the constituent-range operators;
    porting it is a candidate follow-up.

dimoffon and others added 30 commits May 26, 2026 14:44
…ng dup

- Remove all PG13 i_rolname, rtypacl, inittypacl, initrtypacl, relacl,
  rrelacl, initrelacl, initrrelacl field assignments (replaced by PG14 pattern)
- Remove duplicate getopt_long call (PG13 option string)
- Remove orphaned PG14 args from binary_upgrade_set_type_oids_by_rel decl
- Fix binary_upgrade_set_type_oids_by_type_oid body: remove tyinfo refs,
  use pg_type_oid directly, simplify array type OID handling
- Fix binary_upgrade_set_type_oids_by_rel: use tblinfo->dobj.catId.oid

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The #if 0 block contained an unbalanced brace that the compiler
skipped, causing a depth-1 nesting that made all subsequent functions
appear nested.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ts section

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ard decls

- pg_dump.c: add missing for-loop close + PQclear after #if 0 block
- pg_dump.c: remove PG13 collectSecLabels/dumpDumpableObject forward decls
- pg_dump.c: remove PG13 collectSecLabels() call (PG14 is on-demand)
- pg_dump.c: fix getopt_long duplicate
- common.c: remove PG13 binary search in findObjectByCatalogId (PG14 uses hash)
- common.c: add forward declarations for buildIndexArray/DOCatalogIdCompare
- common.c: remove duplicate findPublicationByOid

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…g_dump

The #if 0 block was causing brace confusion between preprocessor and
compiler. Remove it completely and keep just the working code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cific code)

The merged version had deeply corrupted brace structure from interleaved
PG13/PG14 attrdef handling. Cloudberry's version has clean PG14 code
without any cloudberry-specific additions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t_function merge

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ffer stubs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…andler

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…clitems

- Fix broken sed-escaped simple_prompt calls in scripts/common.c
- Replace fe_utils/connect_utils.h include with GPDB-specific declarations
  in common.h (our connectDatabase has different PG13-era signature)
- Fix dumputils.c: raclitems -> revokeitems (variable name mismatch)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove PG13 if/else connect block from CloneArchive (PG14 connects above)
- Fix createdb.c to use GPDB connectMaintenanceDatabase individual-arg API

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PG14 scripts use ConnParams struct but GPDB connectMaintenanceDatabase
uses individual args. Add ConnParams typedef to common.h and a wrapper
function that unpacks ConnParams into individual args.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Some scripts pass ConnParams by pointer (function params), others by
value (local vars). Remove & when already a pointer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pers

Take cloudberry scripts/common.c and common.h which fully adopt PG14's
fe_utils/connect_utils.h ConnParams-based connection API.
Revert all connectDatabase_cparams/connectMaintenanceDatabase_cparams
wrapper calls back to standard PG14 connectDatabase/connectMaintenanceDatabase.
Restore ConnParams usage in createdb.c.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…arrayelem, fix exec_prepare_plan

- Remove PG13 duplicate query/plan members from PLpgSQL_expr
- Restore PLpgSQL_arrayelem struct (GPDB-specific, lost in merge)
- Remove extra keepplan arg from exec_prepare_plan calls (PG14 removed it)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…_single_test args

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…oesn't have it)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…id_le)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eeds uuid_le)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…(PG14)

PG14 split errstart into errstart (warm) and errstart_cold (cold path
for ERROR and above). Update EXPECT_EREPORT mock to expect errstart_cold
when LOG_LEVEL >= ERROR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PG14 uses errstart_cold for elevel >= ERROR. Update EXPECT_EREPORT
macros in all test files to set expectations on errstart_cold instead
of errstart when testing ERROR/FATAL paths.

Files fixed: dfmgr_test.c, session_state_test.c, postinit_test.c,
runaway_cleaner_test.c, gp_replication_test.c
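
A minimal sketch of the rule the mocks now follow, assuming hypothetical level
constants (SK_*) whose ordering mirrors the real elevel ordering; the helper name is
illustrative and is not the EXPECT_EREPORT macro itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical level constants; only their relative order matters here. */
enum { SK_LOG = 15, SK_WARNING = 19, SK_ERROR = 21, SK_FATAL = 22 };

/* Name of the start function a mock should expect for a given level. */
const char *sketch_expected_start_fn(int elevel)
{
    assert(elevel > 0);
    /* ERROR and above take the cold path; everything below stays on errstart. */
    return (elevel >= SK_ERROR) ? "errstart_cold" : "errstart";
}
```

A test that expects a WARNING keeps its errstart expectation; one that expects ERROR or
FATAL has to set it on errstart_cold.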

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dimoffon and others added 30 commits June 16, 2026 13:35
…uild)

extract_nodes_expression() asserted its node argument non-NULL, but its
walker -- like the sibling extract_nodes() -- already treats NULL as "no
nodes to extract" and returns an empty list. ORCA's Agg translation
(CTranslatorDXLToPlStmt) extracts T_Aggref from both a plan node's
targetlist and its qual to densely number aggno/aggtransno (PG14). The
qual is frequently NIL (e.g. SELECT count(*) with no HAVING), so the
assert crashed the QD with FailedAssertion("node", walkers.c:731) on bare
aggregates.

This stayed invisible because the assert campaign builds --disable-orca
and the ORCA jobs build without asserts; the assert-and-ORCA intersection
(the "jit tests with orca" job) was never exercised. Drop the over-strict
assert to match extract_nodes().
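
The intended contract, sketched on a toy tree (SketchNode and sketch_count_nodes are
hypothetical stand-ins, not the walkers.c code): a NULL input means "nothing to
extract" and is handled by the walker rather than rejected by an assert.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_AGGREF, SK_OTHER } SketchTag;

/* Toy expression node with at most two children. */
typedef struct SketchNode
{
    SketchTag          tag;
    struct SketchNode *left;
    struct SketchNode *right;
} SketchNode;

/* Count nodes with the wanted tag; a NULL subtree simply contributes zero. */
int sketch_count_nodes(const SketchNode *node, SketchTag wanted)
{
    if (node == NULL)
        return 0;       /* e.g. a NIL qual: no nodes to extract, not an error */

    return (node->tag == wanted ? 1 : 0)
        + sketch_count_nodes(node->left, wanted)
        + sketch_count_nodes(node->right, wanted);
}
```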

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…PG14)

PG14 removed postfix operators, so the `busy` view's WHERE clause
`0 = (t1.c1 % 2 + 10000)!` is now a syntax error. CREATE VIEW busy failed
and every later step cascaded: GRANT/SELECT/REVOKE/DROP on the missing
view all errored, and the cpu_usage>50 check returned f because no busy
workload ever ran. Replace the postfix factorial operator with the
retained factorial() function in both the input and expected-output
.source files. Same treatment as 36f871f gave gpcheckcat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…wers

These ORCA (optimizer=on) answer files embed the elog source location in
the error text. createplan.c and setrefs.c shifted (now :8633 and :3380),
so the committed expected files were stale (:8615, :3375 x12). The "jit
tests with orca" job only reached these tests after fa65034 stopped the
count(*) assert crash from aborting the schedule early. Error behavior is
unchanged -- only the (file:line) annotations are updated to match source.

The explain_optimizer.out column-width drift is left for the regen pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t heap

REFRESH MATERIALIZED VIEW CONCURRENTLY could fail with

  ERROR:  column _$newdata.type does not exist
  HINT:  Perhaps you meant to reference the column "_$newdata._$type".

while building the diff table in refresh_by_match_merge().

GPDB creates the transient "newdata" heap with make_new_heap_with_colname(...,
"_$"), which prefixes every column name with "_$" to avoid collisions with the
matview's own column names / query aliases (commit c654c50). The equality
quals of the FULL JOIN must therefore reference the transient heap's columns by
their prefixed names. The original fix did exactly that by emitting the left
(newdata) operand from the transient heap's descriptor (newattr) and only the
right (mv) operand from the matview's descriptor (attr).

The PG14 merge re-grafted this block onto upstream's reshaped code and took
upstream's operand expression for both sides, i.e. attr->attname, leaving
newattr declared but unused. That regressed the left operand to the unprefixed
matview name ("_$newdata".type) which does not exist on the transient heap
(its column is "_$type").

The bug is masked whenever the matview's relcache descriptor is still carrying
the in-place "_$"-prefixed names that make_new_heap_with_colname() left behind
(then attr->attname is also "_$type", so both operands happen to agree). It
surfaces once an intervening relcache invalidation reloads the matview to its
real names, which is why it only reproduced under some build/run timings (the
ORCA CI job) and not others.

Restore the left operand to newattr->attname so it always matches the transient
heap's actual (prefixed) column name. Verified by forcing the matview relcache
reload locally: the failing CREATE TEMP TABLE ... diff query now resolves and
REFRESH ... CONCURRENTLY succeeds; matview_ao passes under optimizer=on.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Under the Postgres planner, a query with multiple DISTINCT-qualified
aggregates where at least one carries a FILTER, e.g.

  SELECT sum(DISTINCT a) FILTER (WHERE a > 0),
         sum(DISTINCT b) FILTER (WHERE a > 0) FROM t;

failed at planning with

  ERROR:  variable not found in subplan target list (setrefs.c:3380)

(or "could not find pathkey item to sort" with a GROUP BY).

A DISTINCT aggregate's FILTER is enforced down in the TupleSplit node via
DQAExpr.agg_filter; fetch_multi_dqas_info() moves it there and clears
aggfilter so the Agg stages above never re-evaluate it.  It cleared the
filter only on the cost-list Aggref copies (agg_partial_costs /
agg_final_costs ->distinctAggrefs).  That was enough when those were the
same node objects the plan targetlists referenced, but in PG14
make_partial_grouping_target() builds separate flat-copies of the Aggrefs,
so the partial- and final-stage targetlists still carried the filter.
setrefs then tried to resolve the raw filter columns against a subplan
(the TupleSplit output) that no longer exposes them, hence the error.

This is the same flat-copy hazard already handled for agg_expr_id in
3cab355; that fix only covered the partial target and only propagated
the id.  Generalize it: clear aggfilter on the DISTINCT Aggrefs of BOTH the
partial and final plan targets (and keep stamping agg_expr_id on the
partial stage), matched by aggno.  copy_pathtarget() only shallow-copies
the expr list, so work on private copyObject()'d Aggrefs to avoid mutating
the nodes other candidate paths for this rel still share.

Verified: the failing forms now return correct results under optimizer=off
(and optimizer=on); gp_dqa no longer errors.  The executor side of the same
feature is fixed in the following commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ExecTupleSplit() emits one tuple per DISTINCT-qualified aggregate for each
input row, walking currentExprId across the DQAs.  When a DQA carries a
FILTER that rejects the current row it sets filter_out and advances
currentExprId so the do/while retries the next DQA.  But filter_out was
assigned only inside the `if (filter)` branch: a DQA with no FILTER
(agg_filter_array entry NULL) fell through, leaving filter_out at whatever
the previous (filtered-out) DQA set it to, and without advancing
currentExprId.  With a mix of filtered and unfiltered DQAs, e.g.

  SELECT sum(DISTINCT a) FILTER (WHERE a > 0), sum(DISTINCT b) FROM t;

any row failing the first DQA's filter then spun forever on the unfiltered
DQA -- an uninterruptible hang once the split feeds a Motion (statement
timeout cannot cancel a process blocked in motion IPC).

A DQA without a FILTER never filters its tuple out, so clear filter_out in
the else branch.  Latent since the feature landed (2020); only reachable
now that multi-DQA FILTER planning works again (preceding commit).
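
The loop shape, reduced to a sketch (hypothetical names; the real ExecTupleSplit()
works on tuple slots and expression states, not ints). A NULL entry in the filter
array stands for a DQA without a FILTER, as in the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A per-DQA row filter; NULL means the DQA has no FILTER clause. */
typedef bool (*sketch_row_filter)(int row_value);

/* Example filter for the sketch: keep rows with a positive value. */
bool sketch_is_positive(int row_value)
{
    return row_value > 0;
}

/*
 * Starting at DQA index "start", return the index of the next DQA that emits a
 * tuple for this row, or -1 if every remaining DQA filters the row out.
 */
int sketch_next_emitting_dqa(const sketch_row_filter *filters, int ndqa,
                             int start, int row_value)
{
    int  cur = start;
    bool filter_out = false;

    assert(start >= 0 && start < ndqa);
    do
    {
        sketch_row_filter filter = filters[cur];

        if (filter != NULL)
        {
            filter_out = !filter(row_value);
            if (filter_out)
                cur++;              /* rejected: retry with the next DQA */
        }
        else
            filter_out = false;     /* the fix: no FILTER never filters out */
    } while (filter_out && cur < ndqa);

    return filter_out ? -1 : cur;
}
```

Without the else branch, a rejected first DQA leaves filter_out true when the loop
reaches an unfiltered DQA, and nothing advances cur, so the loop never terminates.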

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
create_motion_plan() built the Motion node directly over the plan produced
for its subpath.  When that subpath itself plans to a Motion, make_motion()
trips Assert(!IsA(lefttree, Motion)) -- a Motion may not sit directly on a
Motion.

This arises when a path that already ends in a Motion is redistributed
again, e.g.

  CREATE TABLE t AS SELECT sum(amt) AS s FROM src DISTRIBUTED BY (s);

The single-row aggregate result is produced behind a gather/redistribute
Motion, and the new table's distribution policy then asks to redistribute it
by s, stacking a second Motion on the first.

Interpose a pass-through Result between the two Motions so they occupy
adjacent slices; setrefs rewires the Result's targetlist to reference the
lower Motion's output via OUTER_VAR.  Verified: such CTAS / CREATE
MATERIALIZED VIEW statements now build and return correct, correctly
distributed results.  (This only completes the fix together with the
distribution-key Aggref matching in the following commit; on its own the
aggregate case still stops earlier at the createplan.c distkey lookup.)
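
A sketch of the interposition on a toy plan tree. SketchPlan and the sketch_* helpers
are hypothetical; the real code builds Plan nodes in createplan.c and leaves targetlist
wiring to setrefs, which this sketch does not model. Nodes are malloc'ed and never
freed, for brevity.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { SK_MOTION, SK_RESULT, SK_AGG } SketchPlanTag;

/* Toy plan node with a single child. */
typedef struct SketchPlan
{
    SketchPlanTag      tag;
    struct SketchPlan *lefttree;
} SketchPlan;

/* Allocate a plan node of the given kind over the given child. */
SketchPlan *sketch_make_plan(SketchPlanTag tag, SketchPlan *lefttree)
{
    SketchPlan *plan = malloc(sizeof(SketchPlan));

    assert(plan != NULL);
    plan->tag = tag;
    plan->lefttree = lefttree;
    return plan;
}

/* Put a Motion over subplan, interposing a Result if subplan is itself a Motion. */
SketchPlan *sketch_add_motion(SketchPlan *subplan)
{
    if (subplan != NULL && subplan->tag == SK_MOTION)
        subplan = sketch_make_plan(SK_RESULT, subplan);

    /* Invariant the real make_motion() asserts: no Motion directly on a Motion. */
    assert(subplan == NULL || subplan->tag != SK_MOTION);
    return sketch_make_plan(SK_MOTION, subplan);
}
```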

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CREATE TABLE ... AS SELECT <aggregate> ... DISTRIBUTED BY (<aggregate>) (and
the matview equivalent) failed under the Postgres planner with

  ERROR:  could not find hash distribution key expressions in target list
          (createplan.c:8633)

create_motion_plan() -> cdbpathlocus_get_distkey_exprs() ->
cdbpullup_findEclassInTargetList() matches each distribution-key equivalence
member against the subplan targetlist with equal().  PG14 (dfd85ea) added
aggno/aggtransno to Aggref and _equalAggref() compares them;
preprocess_aggrefs() stamps them on the query's targetlist aggregates, but the
distribution-key expression is an un-numbered copy (aggno = -1).  equal()
therefore rejects two otherwise-identical aggregates purely on these physical
execution-slot numbers, so the hash key is "not found".  Non-aggregate
computed keys (e.g. DISTRIBUTED BY (amt*2)) were unaffected, making the
failure aggregate-specific.

aggno/aggtransno carry no semantic meaning for this match, so on the fallback
path normalize them to -1 on both sides before comparing.
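
The comparison can be sketched as follows. SketchAggref is a hypothetical three-field
stand-in for Aggref and the function OIDs in the usage are arbitrary; the point is only
that the slot numbers are neutralized on private copies before the field-by-field
compare, so neither input is modified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced Aggref. */
typedef struct SketchAggref
{
    unsigned aggfnoid;      /* which aggregate function */
    int      aggno;         /* execution slot, -1 if not yet numbered */
    int      aggtransno;    /* transition-state slot, -1 if not yet numbered */
} SketchAggref;

/* Strict field-by-field equality, in the spirit of equal(). */
bool sketch_aggref_equal(const SketchAggref *a, const SketchAggref *b)
{
    return a->aggfnoid == b->aggfnoid
        && a->aggno == b->aggno
        && a->aggtransno == b->aggtransno;
}

/* Equality that ignores the slot numbers, working on copies of both sides. */
bool sketch_aggref_equal_ignoring_slots(const SketchAggref *a, const SketchAggref *b)
{
    SketchAggref ca = *a;
    SketchAggref cb = *b;

    ca.aggno = cb.aggno = -1;
    ca.aggtransno = cb.aggtransno = -1;
    return sketch_aggref_equal(&ca, &cb);
}
```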

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
REFRESH MATERIALIZED VIEW CONCURRENTLY failed with

  ERROR:  operator does not exist: text pg_catalog.*= text

when the materialized view has a column whose name collides with one of the
refresh_by_match_merge() table aliases (newdata, newdata2, mv, diff) -- e.g.
the matview built from a table with a "newdata" column, which the regression
test matview.sql (mvtest_mv_foo) exercises on purpose.

Commit c654c50 hardened these queries two ways: it created the transient
"newdata" table with a "_$" column prefix, AND wrote the whole-row references
as "alias.*" rather than a bare "alias" so an alias can never be mistaken for
a same-named column.  The PG14 merge re-grafted the queries onto upstream's
shape, which renamed the aliases to "_$newdata"/"_$mv"/... and dropped the
".*".  That reintroduced the conflict: the transient table's prefixed column
"_$newdata" now matches the alias "_$newdata", so a bare "_$newdata" binds to
the (text) column instead of the whole row, and "_$newdata2 *= _$newdata"
looks for a nonexistent text *= text operator.

Restore the "alias.*" form on the whole-row references in both the
duplicate-check query and the diff FULL JOIN query, so they unambiguously
expand the range-table row (record), matching the record *= record operator.
Verified: REFRESH ... CONCURRENTLY succeeds on a matview with colliding
column names and on the ordinary case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CREATE TABLE / MATERIALIZED VIEW AS SELECT min(x)/max(x) ... DISTRIBUTED BY
(<that aggregate>) crashed under the Postgres planner with

  FailedAssertion("subroot->eq_classes == NIL", planagg.c)

build_minmaxagg_path() implements the MIN/MAX indexscan rewrite by
shallow-copying the PlannerInfo and running query_planner() on the clone.
That only works before query_planner() has built any equivalence classes,
join info or placeholders -- the clone would otherwise share, and the
subquery planning corrupt, root's lists; preprocess_minmax_aggregates() is
documented to run at exactly that early point and the clone asserts those
lists are empty.

In GPDB, a CREATE ... AS ... DISTRIBUTED BY (<col>) records the result
distribution as an equivalence class before grouping_planner reaches the
MIN/MAX preprocessing, so root->eq_classes is already non-empty and the
assert fires (a hard crash in assert builds; list corruption otherwise).

The rewrite is only an optimization, so add it to the existing reject list:
bail out when eq_classes/join_info_list/placeholder_list are already
populated, falling back to regular aggregation.  Verified: such CTAS/matview
statements now build correct results, and ordinary MIN/MAX queries (e.g.
min(unique1) FROM tenk1) still get the Result/InitPlan/Limit/Index Only Scan
optimization under optimizer=off.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
psql in PG14 (e9c91a0c5) shows a "Compression" column in \d+ output for
tables and materialized views.  Regenerate the materialized-view describe
blocks in matview.out to include it.  Cosmetic, optimizer=off path only
(matview_optimizer.out already matches under ORCA).

With the matview correctness fixes earlier in this branch (distribution-key
Aggref matching, the Motion-on-Motion Result interpose, the REFRESH ...
CONCURRENTLY .* whole-row references, and the MIN/MAX distribute-by guard),
the optimizer=off matview regression test now passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 (2f70fdb) removed the long-deprecated geometric "contained in"/
"contains" operators @ and ~; create_index.sql still used them, so under the
PG14 build those queries failed with "operator does not exist: box @ box" /
"polygon ~ polygon".  Replace them with the modern spellings upstream adopted
(@ -> <@, ~ -> @>); the GiST opclasses support these and the results are
unchanged (verified: home_base <@ box and f1 @> polygon return the same rows).

Regenerate the optimizer=off create_index.out for that rename plus the other
already-present PG14/GPDB cosmetic drift it was failing on: point_tbl's new
(Infinity,1e+300) row, the \d+ Compression column, merge/sort-key plan
shapes, and the REINDEX/CREATE INDEX CONCURRENTLY messages (the branch
downgrades REINDEX CONCURRENTLY to a non-concurrent reindex with a NOTICE).
Verified green via a fresh full-setup run (test_setup..create_index).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerate the base (optimizer=off / Postgres planner) expected output for 38
tests whose only differences are cosmetic PG14/GPDB drift that the merge never
reconciled (these upstream/core tests aren't in greenplum_schedule, so the
campaign's optimizer=on runs never exercised them):

  - psql \d+ now prints a Compression column (e9c91a0c5)
  - AT TIME ZONE deparses as "(x AT TIME ZONE z)" not "timezone(z,x)"
  - PG14 grammar/error-text: CREATE STATISTICS on expressions, EXTRACT as its
    own function, reworded "column definition list" messages, etc.
  - MPP plan shapes (Gather/Redistribute Motion, Partial/Finalize Aggregate,
    merge/sort keys) and slice renumbering
  - new upstream test queries (range_intersect_agg, added analyze steps, ...)

Verified there are no success->error transitions in any regenerated file (the
gate that previously surfaced the real gp_dqa/matview bugs), and spot-checked
a deterministic sample (stats_ext, domain, gporca, decode_expr, as_alias,
create_function_3) green under optimizer=off.  DISTRIBUTED-BY NOTICE/HINT and
"Distributed by:" lines (ignored by gpdiff via init_file) were stripped to
keep the diffs to real content.

Excluded, needing separate handling: explain / truncate_gp (run-to-run flaky:
masked memory width, AO segfile stats); join / portals / subselect (genuine
optimizer=off behavior changes -- could-not-devise-plan, backward-scan/cursor
-- that need per-test judgement, not blind regen).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-up to the optimizer=off baseline regen.  These three were held back
from that pass because a raw git-diff scan flagged apparent success->error
transitions, but investigation shows they are not bugs:

  - The flagged errors all sit inside --start_ignore/--end_ignore blocks
    (gpdiff drops them) that deliberately document GPDB limitations the tests
    already know about: a LATERAL join that yields "could not devise a query
    plan" (join.sql carries an explicit "-- FAIL with ERROR" comment),
    "backward scan is not supported", and SCROLL/"cursor can only scan
    forward".  Several others are error->error stale line-number drift (e.g.
    could-not-devise pathnode.c:485 -> :275), already present in the expected
    output.
  - The real gpdiff differences are cosmetic optimizer=off plan shapes
    (Index Scan vs Seq Scan, Merge Join vs Hash Join, Motion shapes).

The large git diffs are row-reordering of unordered result sets that gpdiff
sorts away (no GP_IGNORE-prefixed changes were touched).  Verified: join and
portals are green on re-run with full setup; subselect's only re-run delta is
a "subselect_tbl already exists" artifact of re-running into an existing db.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add reusable Claude Code skills under .claude/skills/ distilled from the
PG14 -> Greengage/GPDB merge, fix and regression-reconciliation work:

  greengage-build            build/install in the container; the stale-binary trap
  greengage-regress-tests    pg_regress/isolation2; optimizer on/off; setup deps
  greengage-answer-file-regen cosmetic regen + the success->error safety gate
  greengage-cluster-ops      gpdemo health, disk/progress monitoring, gprecoverseg
  greengage-debug            log/crash analysis, repro, instrumentation, gdb
  greengage-internals        MPP planner/executor, merge re-graft methodology,
                             the PG14 Aggref aggno/aggtransno bug class

Repoint CLAUDE.md's merge-workflow section at greengage-internals (the old
GG_PG_MERGE_SKILL.md reference was dangling).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Refer to the fork as GGDB (Greengage) throughout the new skills and the
CLAUDE.md fork-referring uses. Kept the one "Greenplum Database (GPDB)" on
line 7 since there GPDB abbreviates the upstream (Greenplum), not the fork.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
My prior cosmetic optimizer=off regen (6b6e7bf) rewrote several SHARED
expected files (no _optimizer variant) to the Postgres-planner output, which
broke the ORCA job because optimizer=on falls back to the same base .out.
Restore the shared base files to their committed (both-optimizer) output:
  as_alias, create_function_3, decode_expr, select_into, subselect, timestamptz
(this also re-greens as_alias/decode_expr under optimizer=off — their CI output
is optimizer-agnostic and matches the restored base).

Regenerate the three _optimizer expected files from the authoritative ORCA CI
results, capturing genuine improvements/changes my code fixes produced:
  - matview_optimizer.out: the distribute-by-aggregate matview now SUCCEEDS under
    ORCA (createplan Result-interpose + cdbpullup aggno + planagg guard fixed the
    shared path), so the old createplan.c "could not find hash distribution key"
    error is gone, not merely line-shifted.
  - gp_dqa_optimizer.out: the nodeTupleSplit filter-reset fix corrected multi-DQA
    + FILTER results under ORCA too (error rows -> correct data).
  - create_index_optimizer.out: PG14 geometric-operator rename (@-><@, ~->@>).

Safety gate (gpdiff-aware success->error scan) returns 0 across all nine files.
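
The gate's implementation is not shown in this PR text, so the following is only a
rough sketch of what "gpdiff-aware" could mean: count ERROR lines in an expected file
while skipping start_ignore/end_ignore blocks. The function name and the exact line
prefixes matched are assumptions. A gate built on it could compare the count for the
old and the regenerated file and flag any increase.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the line of length len starts with prefix. */
static bool sketch_starts_with(const char *line, size_t len, const char *prefix)
{
    size_t plen = strlen(prefix);

    return len >= plen && strncmp(line, prefix, plen) == 0;
}

/* Count lines beginning with "ERROR:" that are outside ignore blocks. */
int sketch_count_visible_errors(const char *text)
{
    int  count = 0;
    bool ignoring = false;

    assert(text != NULL);
    while (*text != '\0')
    {
        const char *eol = strchr(text, '\n');
        size_t      len = eol ? (size_t) (eol - text) : strlen(text);

        if (sketch_starts_with(text, len, "-- start_ignore") ||
            sketch_starts_with(text, len, "--start_ignore"))
            ignoring = true;
        else if (sketch_starts_with(text, len, "-- end_ignore") ||
                 sketch_starts_with(text, len, "--end_ignore"))
            ignoring = false;
        else if (!ignoring && sketch_starts_with(text, len, "ERROR:"))
            count++;

        text += len;            /* move to the newline or the terminator */
        if (*text == '\n')
            text++;
    }
    return count;
}
```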

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ate local regen)

The earlier cosmetic optimizer=off regen (6b6e7bf) captured local-environment-
specific output instead of the canonical CI output, so the JIT/Postgres-optimizer
job failed on:
  ao_locks              "1 segment" baked in instead of the normalized "n segments"
  subselect             a fresh db named 'jps2' baked into current_database()-folded
                        literals instead of 'regression'
  qp_orca_fallback      a stale plan shape (the Result-interpose now appears)
  qp_misc_jiras         local row-ordering + "1 segment" contamination
  qp_targeted_dispatch  same class of local contamination

Regenerate each base .out from the authoritative optimizer=off CI results. All five
have an _optimizer.out variant, so the ORCA path is unaffected. Verified:
  - qp_misc_jiras opt=off data-row SET is identical to qp_misc_jiras_optimizer.out
    (1993 rows, 0 set difference) — only plan shape / ordering differ, data is correct.
  - the two safety-gate ERROR hits (gp_configuration / backward-scan) sit inside
    -- start_ignore/-- end_ignore blocks (gpdiff-ignored), and were actually DROPPED
    by the earlier bad regen — this restores them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…changes

table_functions is a .source test; its base output/table_functions.source (the
optimizer=off expected) was left stale by the merge while the ORCA variant
(table_functions_optimizer.source) was already updated, so only the
JIT/Postgres-optimizer job failed it. Re-graft the upstream PG14 cosmetic changes
into the base:
  - "a column definition list is only allowed for functions returning \"record\""
    -> "... is redundant for a function with OUT parameters" (x2)
  - EXTRACT now reports column label / error as extract() rather than date_part()
    ("function pg_catalog.date_part(unknown, integer)" -> "...extract(...)").

The Unique/Sort/HashAggregate plan-shape drift is intentionally absorbed by the
file's own SORT_OR_HASH matchsub (lines 1-4), so no plan edit is needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
explain failed ONLY in the JIT CI jobs (JIT+ORCA, JIT+Postgres) while passing the
non-JIT jobs. Cause: the JIT jobs enable jit and lower the jit cost thresholds, so
GPDB's EXPLAIN (VERBOSE) emits a wide ` Settings: jit = 'on', jit_above_cost = 'N',
optimizer_jit_above_cost = 'N'` line. init_file ignores the line's CONTENT
(m/^ Settings:.*/), but its WIDTH still sets the explain_filter output column, which
shifts the column header/separator padding that atmsort does NOT normalize -> diff.
Baking the jit width would break the passing non-jit jobs, so a regen alone can't fix it.

Fix: pin the three jit GUCs to their boot defaults at the top of explain.sql
(jit=off, jit_above_cost=100000, optimizer_jit_above_cost=7500). EXPLAIN (SETTINGS)
only reports GUCs modified from their built-in default, so the jit entries drop out
and the Settings line becomes identical (` Settings: optimizer = 'off'` under
optimizer=off, absent under ORCA) across the jit and non-jit jobs. This matches the
test's own intent -- the JSON cases already strip jit "as they vary in test environment".
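
A minimal model of why pinning works, assuming the Settings line lists only GUCs whose value differs from the boot default. The function `settings_line`, the alphabetical ordering, and the JIT-job threshold value `'0'` are illustrative assumptions (the real CI values appear only as 'N' above); the three defaults are the ones named in this message.

```python
# Boot defaults named in this commit message; optimizer = 'on' is inferred from
# the Settings line being absent under ORCA.
BOOT_DEFAULTS = {
    "jit": "off",
    "jit_above_cost": "100000",
    "optimizer_jit_above_cost": "7500",
    "optimizer": "on",
}

# The pin added at the top of explain.sql, expressed as a dict.
PINNED = {k: BOOT_DEFAULTS[k] for k in ("jit", "jit_above_cost", "optimizer_jit_above_cost")}


def settings_line(session):
    """Render a Settings line from GUCs that differ from their boot default."""
    changed = [(k, v) for k, v in sorted(session.items()) if BOOT_DEFAULTS.get(k) != v]
    if not changed:
        return None  # no Settings line is emitted at all
    return " Settings: " + ", ".join(f"{k} = '{v}'" for k, v in changed)


# Hypothetical JIT job under optimizer=off, before and after the pin.
jit_job = {"jit": "on", "jit_above_cost": "0",
           "optimizer_jit_above_cost": "0", "optimizer": "off"}
pinned_jit_job = {**jit_job, **PINNED}
non_jit_job = {"optimizer": "off"}
```

With the pin applied, `settings_line(pinned_jit_job)` and `settings_line(non_jit_job)` render the same text, so the line's width no longer differs between the two job types.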

Regenerated both expected files from a faithful local run (int8_tbl + tenk1, the only
tables explain uses; plans are stats-insensitive). The base explain.out was also stale
and now picks up upstream drift it had missed (new explain_filter definition, the
"Async Capable" EXPLAIN field, the extra (buffers, format ...) queries). Verified: both
optimizer=off and optimizer=on re-run clean (0 regression.diffs); no local-env
contamination (db name / paths / hostname); success->error gate = 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-regen

Update greengage-answer-file-regen with the hard-won rules from the JIT/ORCA CI
triage that the original draft got wrong or missed:
  - Regenerate from the failing job's CI RESULT TARBALL, not a local gpdemo run
    (a local run bakes in the 'jps2' db name, "1 segment", and a local row order
    that CI rejects); local regen is only safe for fully-normalized tests like explain.
  - A shared base <t>.out with no <t>_optimizer.out is used by ORCA too -> a
    Postgres-planner regen of it breaks the ORCA jobs.
  - Don't bake JIT-only EXPLAIN output into a file a non-JIT job compares.
  - explain is NOT unfixable-flaky: pin the three jit GUCs to their boot defaults
    in explain.sql so EXPLAIN(SETTINGS) drops them, then regen (done 23db61b).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mestamptz

These three SHARED base files (no _optimizer.out) pass ORCA but fail optimizer=off
after b4275d4 reverted them to the ORCA-correct version (the shared-file
dilemma: a single base can't satisfy both planners). Split them: keep the
ORCA-passing output as <t>_optimizer.out, and regenerate the base <t>.out from the
optimizer=off CI result so each planner path gets its own answer. a13 (ORCA)
confirms the _optimizer side; success->error gate = 0 on the bases.
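
The shared-file dilemma follows from how the expected file is chosen, which can be modeled as below. This is a sketch of the rule as described in these messages, not the regression driver's actual code; `pick_expected` and the test name `t` are hypothetical.

```python
def pick_expected(test, optimizer_on, available):
    """Choose the expected file for a test under the given planner.

    ORCA runs prefer <test>_optimizer.out and fall back to the shared base
    <test>.out when no variant exists; Postgres-planner runs always use the base.
    """
    variant = f"{test}_optimizer.out"
    if optimizer_on and variant in available:
        return variant
    return f"{test}.out"


# Before the split: both planners compare against the same base file.
shared = {"t.out"}
# After the split: each planner path gets its own answer.
split = {"t.out", "t_optimizer.out"}
```

Before the split both planners resolve to `t.out`, so any regen for one planner changes what the other compares against; after the split the two paths are independent.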

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two remaining optimizer=off diffs after e3c9618:
  - PG14 plans the DISTINCT in two multiset_5 trees as Sort-over-HashAggregate
    instead of Unique-over-Sort. atmsort canonicalizes the text EXPLAIN to node
    types only (costs/keys stripped) and applies the SORT_OR_HASH matchsub to the
    RAW text BEFORE canonicalizing, so the matchsub never fires -> update the node
    types in the base source (Unique->Sort, child Sort->HashAggregate).
  - e3c9618 dropped the trailing space on the " extract " column header
    (atmsort does NOT normalize it); restore it.

Verified locally: the plan-shape and extract diffs are gone; the only residual
diff is an environment-specific pg_am row (heap2) absent from the local cluster
but present on CI (a14's diff never included it).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rollover)

direct_dispatch's `explain (costs off) insert ... values(now())` constant-folds
now()::date into the plan's `Hash Key: 'MM-DD-YYYY'::date` for direct dispatch, so
the optimizer=off base answer was stamped with whatever day it was regenerated
(06-16-2026) and fails every CI run on a later date. Add a date matchsub (the same
mechanism table_functions uses for SORT_OR_HASH) to the .sql and both expected
files so the date is normalized on both sides. Only the base plan shows the date;
the ORCA plan does not, but the echoed matchsub block must be present in the
_optimizer expected too.

Verified locally: optimizer=off and optimizer=on both pass (committed 06-16 date vs
today's 06-17 result, both normalized to MM-DD-YYYY).
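
The effect of the date matchsub can be illustrated with an equivalent substitution in Python. The real matchsub is a gpdiff directive embedded in the test file, so the regex and the `normalize_dates` helper here are stand-ins under that assumption, not the committed syntax.

```python
import re

# Matches a constant-folded date literal such as '06-16-2026'::date.
DATE_LITERAL = re.compile(r"'\d{2}-\d{2}-\d{4}'::date")


def normalize_dates(plan_text):
    """Replace every folded date literal with a fixed placeholder."""
    return DATE_LITERAL.sub("'MM-DD-YYYY'::date", plan_text)


committed = "   Hash Key: '06-16-2026'::date"
todays_run = "   Hash Key: '06-17-2026'::date"
```

Because the substitution is applied to both the expected and the actual output, `normalize_dates(committed)` and `normalize_dates(todays_run)` compare equal regardless of the day the test runs.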

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…P/ORCA fixes

New GGDB regression test guarding the MPP/ORCA-path bugs fixed during the
PostgreSQL 14 -> Greengage merge. Each section is a minimal reproducer that
crashes, hits a dispatch deserialize error, or returns wrong data on the
pre-fix code:
  1. distributed + partitioned UPDATE/DELETE (ORCA updateColnosLists SIGSEGV,
     partitioned U/D planner fallback)
  2. several different aggregates in one query (ORCA Aggref aggno/aggtransno --
     bug returned the first aggregate's value for all)
  3. correlated EXISTS + correlated scalar subquery in the SELECT list
     (AlternativeSubPlan cdbllize crash + lost correlation)
  4. CTAS / matview DISTRIBUTED BY (<aggregate>) (Motion-on-Motion Result
     interpose + MIN/MAX planagg eq_classes guard) and a concurrent matview
     refresh over a text column (refresh_by_match_merge alias.* / _$ / text=text)
  5. CREATE VIEW + CTAS over JOIN USING (binary-dispatch JoinExpr join_using_alias)
  6. jsonb/array subscript assignment (ORCA SubscriptingRef.refrestype)

Data-only (no EXPLAIN) so the single expected file is stable across the
optimizer x JIT CI matrix; ORCA output matches the base, so no _optimizer.out is
needed. Verified locally: passes under optimizer=off and optimizer=on (0 diffs).
Registered standalone in greenplum_schedule to avoid parallel-group contention
from its partition DDL + concurrent matview refresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14's GROUP BY DISTINCT (Query.groupDistinct) deduplicates the grouping sets
generated by overlapping ROLLUP/CUBE/GROUPING SETS. ORCA's query translator never
looked at the flag, so it silently emitted the full, duplicated set of grouping
sets -- e.g. `GROUP BY DISTINCT ROLLUP(a,b), ROLLUP(a,b)` returned 37 grouping-set
rows under optimizer=on versus the correct 9 under the Postgres planner.

ORCA does not implement grouping-set dedup, so raise the standard
Query2DXLUnsupportedFeature in CheckUnsupportedNodeTypes when groupDistinct is set;
the query then falls back to the Postgres planner, which handles it correctly.
GROUP BY DISTINCT had no regression coverage, so no existing test relied on the
buggy behavior.
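
For intuition, the dedup that GROUP BY DISTINCT performs can be sketched on the grouping sets themselves. This is a simplified model with hypothetical helpers (`rollup`, `combine`), not the planner's implementation; it counts grouping sets, whereas the 37 and 9 quoted above are result rows and depend on the data.

```python
from itertools import product


def rollup(*cols):
    """ROLLUP(a, b) expands to the prefixes (a, b), (a), ()."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def combine(*clauses, distinct=False):
    """Cross-multiply grouping clauses; optionally drop duplicate grouping sets."""
    combined = []
    for parts in product(*clauses):
        # A combined grouping set is the union of the columns of its parts.
        merged = tuple(sorted({col for part in parts for col in part}))
        combined.append(merged)
    if distinct:
        combined = list(dict.fromkeys(combined))  # order-preserving dedup
    return combined


plain = combine(rollup("a", "b"), rollup("a", "b"))
deduped = combine(rollup("a", "b"), rollup("a", "b"), distinct=True)
```

Here `plain` holds 9 grouping sets, many of them repeated, while `deduped` keeps only the 3 distinct ones: `(a, b)`, `(a)`, and `()`. Skipping the dedup step is what inflates the row count under ORCA.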

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New GGDB test exercising PG14 features that arrived in the merge with little or no
regression coverage, over distributed data and under both optimizers:
  - GROUP BY DISTINCT dedup of overlapping grouping sets (guards the ORCA fallback
    just added; ORCA previously returned the un-deduped 37 rows instead of 9)
  - recursive CTE SEARCH DEPTH FIRST and CYCLE detection over a distributed graph
  - multirange types: containment (@>), overlap (&&), bounds (lower/upper) and
    intersection (*), distributed

Data-only and ORDER BY-stable so a single expected file covers the optimizer x JIT
matrix. Verified under optimizer=off and optimizer=on (0 diffs). One gap surfaced
while writing it: unnest(anymultirange) (C multirange_unnest + its pg_proc entry)
was not carried into this branch, so the multirange section uses the
constituent-range operators instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ckfill)

Document the P2 finding: adding an _optimizer.out to a test that already passes
under ORCA against the shared base asserts nothing new (verified for
delete/insert_conflict/with -- their ORCA output either matches the base byte-for-
byte, is gpdiff-normalized noise, or has its plans in --start_ignore blocks). The
real ORCA coverage gaps are untested features (found by exercising them under both
optimizers), not missing _optimizer.out files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…G14)

range_agg(anyrange) returns the PG14 anymultirange polymorphic type. ORCA does not
resolve it to the concrete multirange type, so the anymultirange pseudo-type reaches
execution and the aggregate's finalfn errors with "type 4537 is not a multirange
type" (multirangetypes.c:550) under optimizer=on, while the Postgres planner returns
the correct multirange.

ORCA doesn't implement this resolution, so detect any Aggref whose result type is a
multirange in CheckUnsupportedNodeTypes and raise the standard unsupported-feature
fallback to the planner. Adds a gpdb::IsMultirangeType wrapper (type_is_multirange).
range_intersect_agg(anyrange)->anyrange is unaffected (no multirange result) and
keeps using ORCA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ORCA coverage

Add coverage for more PG14 features over MPP, run under both optimizers, found by
probing for ORCA divergences:
  - subscripting (SubscriptingRef) in the SELECT list, WHERE, and a join key
    (the refrestype class beyond the UPDATE path) -- ORCA handles these correctly
  - extract() PG14 numeric return + date_bin() over distributed timestamptz
    (timezone/datestyle pinned for portable rendering)
  - range_agg() -> anymultirange, which now falls back to the planner under ORCA
    (locks in a235ac4) and returns the correct multirange
  - WITH ... NOT MATERIALIZED inlining over distributed CTEs

ORCA output matches the base for all of them (correct native handling or fallback),
so a single expected file suffices. Verified under optimizer=off and optimizer=on
(0 diffs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>