AI merge up to PG14#2686
Draft
dimoffon wants to merge 2379 commits into
…ng dup
- Remove all PG13 i_rolname, rtypacl, inittypacl, initrtypacl, relacl, rrelacl, initrelacl, initrrelacl field assignments (replaced by PG14 pattern)
- Remove duplicate getopt_long call (PG13 option string)
- Remove orphaned PG14 args from binary_upgrade_set_type_oids_by_rel decl
- Fix binary_upgrade_set_type_oids_by_type_oid body: remove tyinfo refs, use pg_type_oid directly, simplify array type OID handling
- Fix binary_upgrade_set_type_oids_by_rel: use tblinfo->dobj.catId.oid
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The #if 0 block contained an unbalanced brace that the compiler skipped, causing a depth-1 nesting that made all subsequent functions appear nested. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ts section Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ard decls
- pg_dump.c: add missing for-loop close + PQclear after #if 0 block
- pg_dump.c: remove PG13 collectSecLabels/dumpDumpableObject forward decls
- pg_dump.c: remove PG13 collectSecLabels() call (PG14 is on-demand)
- pg_dump.c: fix getopt_long duplicate
- common.c: remove PG13 binary search in findObjectByCatalogId (PG14 uses hash)
- common.c: add forward declarations for buildIndexArray/DOCatalogIdCompare
- common.c: remove duplicate findPublicationByOid
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…g_dump The #if 0 block was causing brace confusion between preprocessor and compiler. Remove it completely and keep just the working code. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cific code) The merged version had deeply corrupted brace structure from interleaved PG13/PG14 attrdef handling. Cloudberry's version has clean PG14 code without any cloudberry-specific additions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t_function merge Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ffer stubs Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…andler Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…clitems
- Fix broken sed-escaped simple_prompt calls in scripts/common.c
- Replace fe_utils/connect_utils.h include with GPDB-specific declarations in common.h (our connectDatabase has a different, PG13-era signature)
- Fix dumputils.c: raclitems -> revokeitems (variable name mismatch)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove PG13 if/else connect block from CloneArchive (PG14 connects above)
- Fix createdb.c to use GPDB connectMaintenanceDatabase individual-arg API
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PG14 scripts use ConnParams struct but GPDB connectMaintenanceDatabase uses individual args. Add ConnParams typedef to common.h and a wrapper function that unpacks ConnParams into individual args. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Some scripts pass ConnParams by pointer (function params), others by value (local vars). Remove & when already a pointer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pers Take cloudberry scripts/common.c and common.h which fully adopt PG14's fe_utils/connect_utils.h ConnParams-based connection API. Revert all connectDatabase_cparams/connectMaintenanceDatabase_cparams wrapper calls back to standard PG14 connectDatabase/connectMaintenanceDatabase. Restore ConnParams usage in createdb.c. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…arrayelem, fix exec_prepare_plan
- Remove PG13 duplicate query/plan members from PLpgSQL_expr
- Restore PLpgSQL_arrayelem struct (GPDB-specific, lost in merge)
- Remove extra keepplan arg from exec_prepare_plan calls (PG14 removed it)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…_single_test args Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…oesn't have it) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…id_le) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eeds uuid_le) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…(PG14) PG14 split errstart into errstart (warm) and errstart_cold (cold path for ERROR and above). Update EXPECT_EREPORT mock to expect errstart_cold when LOG_LEVEL >= ERROR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PG14 uses errstart_cold for elevel >= ERROR. Update EXPECT_EREPORT macros in all test files to set expectations on errstart_cold instead of errstart when testing ERROR/FATAL paths. Files fixed: dfmgr_test.c, session_state_test.c, postinit_test.c, runaway_cleaner_test.c, gp_replication_test.c Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
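The selection rule described in the two commits above can be sketched as follows. This is a minimal model rather than the actual EXPECT_EREPORT macro: the `SKETCH_*` severity values and the helper names are illustrative assumptions, and the real levels live in utils/elog.h.

```c
#include <assert.h>
#include <string.h>

/* Illustrative severity values (assumed); the real ones come from utils/elog.h. */
enum
{
    SKETCH_LOG = 15,
    SKETCH_WARNING = 19,
    SKETCH_ERROR = 21,
    SKETCH_FATAL = 22
};

/*
 * Name of the entry point a mock should set its expectation on:
 * errstart_cold for ERROR and above, errstart otherwise.
 */
const char *sketch_expected_errstart(int elevel)
{
    return (elevel >= SKETCH_ERROR) ? "errstart_cold" : "errstart";
}

/* Convenience wrapper: nonzero when the cold entry point is expected. */
int sketch_expects_cold(int elevel)
{
    return strcmp(sketch_expected_errstart(elevel), "errstart_cold") == 0;
}
```

A test that exercises a WARNING path would keep its expectation on errstart, while ERROR/FATAL paths move to errstart_cold.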
…uild)
extract_nodes_expression() asserted its node argument non-NULL, but its
walker -- like the sibling extract_nodes() -- already treats NULL as "no
nodes to extract" and returns an empty list. ORCA's Agg translation
(CTranslatorDXLToPlStmt) extracts T_Aggref from both a plan node's
targetlist and its qual to densely number aggno/aggtransno (PG14). The
qual is frequently NIL (e.g. SELECT count(*) with no HAVING), so the
assert crashed the QD with FailedAssertion("node", walkers.c:731) on bare
aggregates.
This stayed invisible because the assert campaign builds --disable-orca
and the ORCA jobs build without asserts; the assert-and-ORCA intersection
(the "jit tests with orca" job) was never exercised. Drop the over-strict
assert to match extract_nodes().
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
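The NULL-tolerant behavior this commit relies on can be sketched with a toy walker. The `SketchNode` type and function names below are hypothetical stand-ins, not the walkers.c code; the only point is that a NULL subtree (such as a NIL qual) contributes no nodes instead of tripping an assert.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_SKETCH_VAR, T_SKETCH_AGGREF, T_SKETCH_OPEXPR } SketchTag;

typedef struct SketchNode
{
    SketchTag tag;
    struct SketchNode *left;
    struct SketchNode *right;
} SketchNode;

/* Count nodes carrying the wanted tag; a NULL subtree means "nothing to extract". */
int sketch_count_nodes(const SketchNode *node, SketchTag wanted)
{
    if (node == NULL)
        return 0;
    return (node->tag == wanted ? 1 : 0)
        + sketch_count_nodes(node->left, wanted)
        + sketch_count_nodes(node->right, wanted);
}

/* Entry point used for both targetlist and qual; note there is no non-NULL assert. */
int sketch_count_in_tlist_and_qual(const SketchNode *tlist,
                                   const SketchNode *qual,
                                   SketchTag wanted)
{
    return sketch_count_nodes(tlist, wanted) + sketch_count_nodes(qual, wanted);
}
```

With a bare `count(*)` the qual argument is NULL, and the call simply returns the count found in the targetlist.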
…PG14) PG14 removed postfix operators, so the `busy` view's WHERE clause `0 = (t1.c1 % 2 + 10000)!` is now a syntax error. CREATE VIEW busy failed and every later step cascaded: GRANT/SELECT/REVOKE/DROP on the missing view all errored, and the cpu_usage>50 check returned f because no busy workload ever ran. Replace the postfix factorial operator with the retained factorial() function in both the input and expected-output .source files. Same treatment as 36f871f gave gpcheckcat. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…wers These ORCA (optimizer=on) answer files embed the elog source location in the error text. createplan.c and setrefs.c shifted (now :8633 and :3380), so the committed expected files were stale (:8615, :3375 x12). The "jit tests with orca" job only reached these tests after fa65034 stopped the count(*) assert crash from aborting the schedule early. Error behavior is unchanged -- only the (file:line) annotations are updated to match source. The explain_optimizer.out column-width drift is left for the regen pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t heap
REFRESH MATERIALIZED VIEW CONCURRENTLY could fail with
ERROR: column _$newdata.type does not exist
HINT: Perhaps you meant to reference the column "_$newdata._$type".
while building the diff table in refresh_by_match_merge().
GPDB creates the transient "newdata" heap with make_new_heap_with_colname(..., "_$"), which prefixes every column name with "_$" to avoid collisions with the matview's own column names / query aliases (commit c654c50). The equality quals of the FULL JOIN must therefore reference the transient heap's columns by their prefixed names. The original fix did exactly that by emitting the left (newdata) operand from the transient heap's descriptor (newattr) and only the right (mv) operand from the matview's descriptor (attr).
The PG14 merge re-grafted this block onto upstream's reshaped code and took upstream's operand expression for both sides, i.e. attr->attname, leaving newattr declared but unused. That regressed the left operand to the unprefixed matview name ("_$newdata".type) which does not exist on the transient heap (its column is "_$type").
The bug is masked whenever the matview's relcache descriptor is still carrying the in-place "_$"-prefixed names that make_new_heap_with_colname() left behind (then attr->attname is also "_$type", so both operands happen to agree). It surfaces once an intervening relcache invalidation reloads the matview to its real names, which is why it only reproduced under some build/run timings (the ORCA CI job) and not others.
Restore the left operand to newattr->attname so it always matches the transient heap's actual (prefixed) column name.
Verified by forcing the matview relcache reload locally: the failing CREATE TEMP TABLE ... diff query now resolves and REFRESH ... CONCURRENTLY succeeds; matview_ao passes under optimizer=on.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Under the Postgres planner, a query with multiple DISTINCT-qualified
aggregates where at least one carries a FILTER, e.g.
SELECT sum(DISTINCT a) FILTER (WHERE a > 0),
sum(DISTINCT b) FILTER (WHERE a > 0) FROM t;
failed at planning with
ERROR: variable not found in subplan target list (setrefs.c:3380)
(or "could not find pathkey item to sort" with a GROUP BY).
A DISTINCT aggregate's FILTER is enforced down in the TupleSplit node via
DQAExpr.agg_filter; fetch_multi_dqas_info() moves it there and clears
aggfilter so the Agg stages above never re-evaluate it. It cleared the
filter only on the cost-list Aggref copies (agg_partial_costs /
agg_final_costs ->distinctAggrefs). That was enough when those were the
same node objects the plan targetlists referenced, but in PG14
make_partial_grouping_target() builds separate flat-copies of the Aggrefs,
so the partial- and final-stage targetlists still carried the filter.
setrefs then tried to resolve the raw filter columns against a subplan
(the TupleSplit output) that no longer exposes them, hence the error.
This is the same flat-copy hazard already handled for agg_expr_id in
3cab355; that fix only covered the partial target and only propagated
the id. Generalize it: clear aggfilter on the DISTINCT Aggrefs of BOTH the
partial and final plan targets (and keep stamping agg_expr_id on the
partial stage), matched by aggno. copy_pathtarget() only shallow-copies
the expr list, so work on private copyObject()'d Aggrefs to avoid mutating
the nodes other candidate paths for this rel still share.
Verified: the failing forms now return correct results under optimizer=off
(and optimizer=on); gp_dqa no longer errors. The executor side of the same
feature is fixed in the following commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ExecTupleSplit() emits one tuple per DISTINCT-qualified aggregate for each input row, walking currentExprId across the DQAs. When a DQA carries a FILTER that rejects the current row it sets filter_out and advances currentExprId so the do/while retries the next DQA. But filter_out was assigned only inside the `if (filter)` branch: a DQA with no FILTER (agg_filter_array entry NULL) fell through, leaving filter_out at whatever the previous (filtered-out) DQA set it to, and without advancing currentExprId. With a mix of filtered and unfiltered DQAs, e.g. SELECT sum(DISTINCT a) FILTER (WHERE a > 0), sum(DISTINCT b) FROM t; any row failing the first DQA's filter then spun forever on the unfiltered DQA -- an uninterruptible hang once the split feeds a Motion (statement timeout cannot cancel a process blocked in motion IPC). A DQA without a FILTER never filters its tuple out, so clear filter_out in the else branch. Latent since the feature landed (2020); only reachable now that multi-DQA FILTER planning works again (preceding commit). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
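The corrected control flow can be sketched with a simplified model. This is not the nodeTupleSplit.c code: `DqaFilter`, `sketch_next_emitting_dqa` and the integer row are assumptions made for illustration, and `*current` stands in for currentExprId. The part that mirrors the fix is the else branch, which gives filter_out a definite value for a DQA without a FILTER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A DQA's FILTER predicate over a one-column row; NULL means "no FILTER". */
typedef bool (*DqaFilter)(int a);

bool sketch_a_is_positive(int a)
{
    return a > 0;
}

/*
 * Return the index of the next DQA that emits a tuple for this row, or -1
 * once every DQA has been visited. *current is advanced past each DQA that
 * is either filtered out or emitted.
 */
int sketch_next_emitting_dqa(const DqaFilter *filters, int ndqa, int a, int *current)
{
    bool filter_out;

    do
    {
        if (*current >= ndqa)
            return -1;

        if (filters[*current] != NULL)
            filter_out = !filters[*current](a);
        else
            filter_out = false; /* the fix: an unfiltered DQA never drops its tuple */

        if (filter_out)
            (*current)++;
    } while (filter_out);

    return (*current)++;
}
```

For a row that fails the first DQA's filter, the loop skips DQA 0, emits for the unfiltered DQA 1, and then reports exhaustion instead of retrying forever.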
create_motion_plan() built the Motion node directly over the plan produced for its subpath. When that subpath itself plans to a Motion, make_motion() trips Assert(!IsA(lefttree, Motion)) -- a Motion may not sit directly on a Motion. This arises when a path that already ends in a Motion is redistributed again, e.g. CREATE TABLE t AS SELECT sum(amt) AS s FROM src DISTRIBUTED BY (s); The single-row aggregate result is produced behind a gather/redistribute Motion, and the new table's distribution policy then asks to redistribute it by s, stacking a second Motion on the first. Interpose a pass-through Result between the two Motions so they occupy adjacent slices; setrefs rewires the Result's targetlist to reference the lower Motion's output via OUTER_VAR. Verified: such CTAS / CREATE MATERIALIZED VIEW statements now build and return correct, correctly distributed results. (This only completes the fix together with the distribution-key Aggref matching in the following commit; on its own the aggregate case still stops earlier at the createplan.c distkey lookup.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
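The interposition can be sketched on a toy plan tree. The `SkPlan` type and `sk_*` functions are hypothetical; they only model the invariant that make_motion() asserts and the pass-through Result placed between two Motions, not the targetlist rewiring that setrefs performs.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { SK_SCAN, SK_AGG, SK_RESULT, SK_MOTION } SkTag;

typedef struct SkPlan
{
    SkTag tag;
    struct SkPlan *lefttree;
} SkPlan;

SkPlan *sk_make_node(SkTag tag, SkPlan *lefttree)
{
    SkPlan *plan = malloc(sizeof(SkPlan));

    assert(plan != NULL);
    plan->tag = tag;
    plan->lefttree = lefttree;
    return plan;
}

/* Models the invariant: a Motion may not sit directly on a Motion. */
SkPlan *sk_make_motion(SkPlan *lefttree)
{
    assert(lefttree == NULL || lefttree->tag != SK_MOTION);
    return sk_make_node(SK_MOTION, lefttree);
}

/* If the subplan is already a Motion, interpose a pass-through Result first. */
SkPlan *sk_create_motion_plan(SkPlan *subplan)
{
    if (subplan != NULL && subplan->tag == SK_MOTION)
        subplan = sk_make_node(SK_RESULT, subplan);
    return sk_make_motion(subplan);
}
```

Redistributing an aggregate that already sits behind a Motion then yields Motion -> Result -> Motion -> Agg rather than an assertion failure.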
CREATE TABLE ... AS SELECT <aggregate> ... DISTRIBUTED BY (<aggregate>) (and
the matview equivalent) failed under the Postgres planner with
ERROR: could not find hash distribution key expressions in target list
(createplan.c:8633)
create_motion_plan() -> cdbpathlocus_get_distkey_exprs() ->
cdbpullup_findEclassInTargetList() matches each distribution-key equivalence
member against the subplan targetlist with equal(). PG14 (dfd85ea) added
aggno/aggtransno to Aggref and _equalAggref() compares them;
preprocess_aggrefs() stamps them on the query's targetlist aggregates, but the
distribution-key expression is an un-numbered copy (aggno = -1). equal()
therefore rejects two otherwise-identical aggregates purely on these physical
execution-slot numbers, so the hash key is "not found". Non-aggregate
computed keys (e.g. DISTRIBUTED BY (amt*2)) were unaffected, making the
failure aggregate-specific.
aggno/aggtransno carry no semantic meaning for this match, so on the fallback
path normalize them to -1 on both sides before comparing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
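The fallback comparison can be sketched like this. `SkAggref` is a hypothetical, heavily reduced stand-in for Aggref (the real node has many more fields), and the function names are assumptions; the sketch only shows why a strict field-by-field compare rejects the pair and how normalizing both copies to -1 makes them match.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical stand-in for Aggref. */
typedef struct SkAggref
{
    unsigned int aggfnoid;      /* which aggregate function */
    int          argattno;      /* which column it aggregates */
    int          aggno;         /* execution-slot number, -1 if unnumbered */
    int          aggtransno;    /* transition-state slot, -1 if unnumbered */
} SkAggref;

/* Strict comparison, in the spirit of equal(): every field must agree. */
bool sk_equal_aggref(const SkAggref *a, const SkAggref *b)
{
    return a->aggfnoid == b->aggfnoid
        && a->argattno == b->argattno
        && a->aggno == b->aggno
        && a->aggtransno == b->aggtransno;
}

/* Fallback: compare private copies with the slot numbers normalized to -1. */
bool sk_equal_aggref_ignoring_slots(const SkAggref *a, const SkAggref *b)
{
    SkAggref ca = *a;
    SkAggref cb = *b;

    ca.aggno = cb.aggno = -1;
    ca.aggtransno = cb.aggtransno = -1;
    return sk_equal_aggref(&ca, &cb);
}
```

Working on copies keeps the numbered targetlist entry untouched, so later Agg stages still see the slot numbers they were assigned.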
REFRESH MATERIALIZED VIEW CONCURRENTLY failed with ERROR: operator does not exist: text pg_catalog.*= text when the materialized view has a column whose name collides with one of the refresh_by_match_merge() table aliases (newdata, newdata2, mv, diff) -- e.g. the matview built from a table with a "newdata" column, which the regression test matview.sql (mvtest_mv_foo) exercises on purpose. Commit c654c50 hardened these queries two ways: it created the transient "newdata" table with a "_$" column prefix, AND wrote the whole-row references as "alias.*" rather than a bare "alias" so an alias can never be mistaken for a same-named column. The PG14 merge re-grafted the queries onto upstream's shape, which renamed the aliases to "_$newdata"/"_$mv"/... and dropped the ".*". That reintroduced the conflict: the transient table's prefixed column "_$newdata" now matches the alias "_$newdata", so a bare "_$newdata" binds to the (text) column instead of the whole row, and "_$newdata2 *= _$newdata" looks for a nonexistent text *= text operator. Restore the "alias.*" form on the whole-row references in both the duplicate-check query and the diff FULL JOIN query, so they unambiguously expand the range-table row (record), matching the record *= record operator. Verified: REFRESH ... CONCURRENTLY succeeds on a matview with colliding column names and on the ordinary case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CREATE TABLE / MATERIALIZED VIEW AS SELECT min(x)/max(x) ... DISTRIBUTED BY
(<that aggregate>) crashed under the Postgres planner with
FailedAssertion("subroot->eq_classes == NIL", planagg.c)
build_minmaxagg_path() implements the MIN/MAX indexscan rewrite by
shallow-copying the PlannerInfo and running query_planner() on the clone.
That only works before query_planner() has built any equivalence classes,
join info or placeholders -- the clone would otherwise share, and the
subquery planning corrupt, root's lists; preprocess_minmax_aggregates() is
documented to run at exactly that early point and the clone asserts those
lists are empty.
In GPDB, a CREATE ... AS ... DISTRIBUTED BY (<col>) records the result
distribution as an equivalence class before grouping_planner reaches the
MIN/MAX preprocessing, so root->eq_classes is already non-empty and the
assert fires (a hard crash in assert builds; list corruption otherwise).
The rewrite is only an optimization, so add it to the existing reject list:
bail out when eq_classes/join_info_list/placeholder_list are already
populated, falling back to regular aggregation. Verified: such CTAS/matview
statements now build correct results, and ordinary MIN/MAX queries (e.g.
min(unique1) FROM tenk1) still get the Result/InitPlan/Limit/Index Only Scan
optimization under optimizer=off.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
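The added reject condition amounts to a precondition check before cloning the planner state. A minimal sketch, assuming hypothetical `SkList` and `SkPlannerInfo` types in which an empty list is a NULL pointer (as NIL is in PostgreSQL):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SkList { int length; } SkList;

/* Hypothetical subset of PlannerInfo; an empty list is NULL. */
typedef struct SkPlannerInfo
{
    SkList *eq_classes;
    SkList *join_info_list;
    SkList *placeholder_list;
} SkPlannerInfo;

/*
 * The MIN/MAX rewrite shallow-copies the planner state, so it is only safe
 * while these lists are still empty. Otherwise fall back to regular aggregation.
 */
bool sk_minmax_rewrite_allowed(const SkPlannerInfo *root)
{
    if (root->eq_classes != NULL ||
        root->join_info_list != NULL ||
        root->placeholder_list != NULL)
        return false;
    return true;
}
```

Because the rewrite is only an optimization, returning false costs plan quality at most, never correctness.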
psql in PG14 (e9c91a0c5) shows a "Compression" column in \d+ output for tables and materialized views. Regenerate the materialized-view describe blocks in matview.out to include it. Cosmetic, optimizer=off path only (matview_optimizer.out already matches under ORCA). With the matview correctness fixes earlier in this branch (distribution-key Aggref matching, the Motion-on-Motion Result interpose, the REFRESH ... CONCURRENTLY .* whole-row references, and the MIN/MAX distribute-by guard), the optimizer=off matview regression test now passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 (2f70fdb) removed the long-deprecated geometric "contained in"/ "contains" operators @ and ~; create_index.sql still used them, so under the PG14 build those queries failed with "operator does not exist: box @ box" / "polygon ~ polygon". Replace them with the modern spellings upstream adopted (@ -> <@, ~ -> @>); the GiST opclasses support these and the results are unchanged (verified: home_base <@ box and f1 @> polygon return the same rows). Regenerate the optimizer=off create_index.out for that rename plus the other already-present PG14/GPDB cosmetic drift it was failing on: point_tbl's new (Infinity,1e+300) row, the \d+ Compression column, merge/sort-key plan shapes, and the REINDEX/CREATE INDEX CONCURRENTLY messages (the branch downgrades REINDEX CONCURRENTLY to a non-concurrent reindex with a NOTICE). Verified green via a fresh full-setup run (test_setup..create_index). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerate the base (optimizer=off / Postgres planner) expected output for 38
tests whose only differences are cosmetic PG14/GPDB drift that the merge never
reconciled (these upstream/core tests aren't in greenplum_schedule, so the
campaign's optimizer=on runs never exercised them):
- psql \d+ now prints a Compression column (e9c91a0c5)
- AT TIME ZONE deparses as "(x AT TIME ZONE z)" not "timezone(z,x)"
- PG14 grammar/error-text: CREATE STATISTICS on expressions, EXTRACT as its
own function, reworded "column definition list" messages, etc.
- MPP plan shapes (Gather/Redistribute Motion, Partial/Finalize Aggregate,
merge/sort keys) and slice renumbering
- new upstream test queries (range_intersect_agg, added analyze steps, ...)
Verified there are no success->error transitions in any regenerated file (the
gate that previously surfaced the real gp_dqa/matview bugs), and spot-checked
a deterministic sample (stats_ext, domain, gporca, decode_expr, as_alias,
create_function_3) green under optimizer=off. DISTRIBUTED-BY NOTICE/HINT and
"Distributed by:" lines (ignored by gpdiff via init_file) were stripped to
keep the diffs to real content.
Excluded, needing separate handling: explain / truncate_gp (run-to-run flaky:
masked memory width, AO segfile stats); join / portals / subselect (genuine
optimizer=off behavior changes -- could-not-devise-plan, backward-scan/cursor
-- that need per-test judgement, not blind regen).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-up to the optimizer=off baseline regen. These three were held back
from that pass because a raw git-diff scan flagged apparent success->error
transitions, but investigation shows they are not bugs:
- The flagged errors all sit inside --start_ignore/--end_ignore blocks
(gpdiff drops them) that deliberately document GPDB limitations the tests
already know about: a LATERAL join that yields "could not devise a query
plan" (join.sql carries an explicit "-- FAIL with ERROR" comment),
"backward scan is not supported", and SCROLL/"cursor can only scan
forward". Several others are error->error stale line-number drift (e.g.
could-not-devise pathnode.c:485 -> :275), already present in the expected
output.
- The real gpdiff differences are cosmetic optimizer=off plan shapes
(Index Scan vs Seq Scan, Merge Join vs Hash Join, Motion shapes).
The large git diffs are row-reordering of unordered result sets that gpdiff
sorts away (no GP_IGNORE-prefixed changes were touched). Verified: join and
portals are green on re-run with full setup; subselect's only re-run delta is
a "subselect_tbl already exists" artifact of re-running into an existing db.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add reusable Claude Code skills under .claude/skills/ distilled from the
PG14 -> Greengage/GPDB merge, fix and regression-reconciliation work:
  greengage-build              build/install in the container; the stale-binary trap
  greengage-regress-tests      pg_regress/isolation2; optimizer on/off; setup deps
  greengage-answer-file-regen  cosmetic regen + the success->error safety gate
  greengage-cluster-ops        gpdemo health, disk/progress monitoring, gprecoverseg
  greengage-debug              log/crash analysis, repro, instrumentation, gdb
  greengage-internals          MPP planner/executor, merge re-graft methodology,
                               the PG14 Aggref aggno/aggtransno bug class
Repoint CLAUDE.md's merge-workflow section at greengage-internals (the old
GG_PG_MERGE_SKILL.md reference was dangling).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Refer to the fork as GGDB (Greengage) throughout the new skills and the CLAUDE.md fork-referring uses. Kept the one "Greenplum Database (GPDB)" on line 7 since there GPDB abbreviates the upstream (Greenplum), not the fork. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
My prior cosmetic optimizer=off regen (6b6e7bf) rewrote several SHARED expected files (no _optimizer variant) to the Postgres-planner output, which broke the ORCA job because optimizer=on falls back to the same base .out.
Restore the shared base files to their committed (both-optimizer) output: as_alias, create_function_3, decode_expr, select_into, subselect, timestamptz (this also re-greens as_alias/decode_expr under optimizer=off -- their CI output is optimizer-agnostic and matches the restored base).
Regenerate the three _optimizer expected files from the authoritative ORCA CI results, capturing genuine improvements/changes my code fixes produced:
- matview_optimizer.out: the distribute-by-aggregate matview now SUCCEEDS under ORCA (createplan Result-interpose + cdbpullup aggno + planagg guard fixed the shared path), so the old createplan.c "could not find hash distribution key" error is gone, not merely line-shifted.
- gp_dqa_optimizer.out: the nodeTupleSplit filter-reset fix corrected multi-DQA + FILTER results under ORCA too (error rows -> correct data).
- create_index_optimizer.out: PG14 geometric-operator rename (@ -> <@, ~ -> @>).
Safety gate (gpdiff-aware success->error scan) returns 0 across all nine files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ate local regen)
The earlier cosmetic optimizer=off regen (6b6e7bf) captured local-environment-specific output instead of the canonical CI output, so the JIT/Postgres-optimizer job failed on:
- ao_locks: "1 segment" baked in instead of the normalized "n segments"
- subselect: a fresh db named 'jps2' baked into current_database()-folded literals instead of 'regression'
- qp_orca_fallback: a stale plan shape (the Result-interpose now appears)
- qp_misc_jiras: local row-ordering + "1 segment" contamination
- qp_targeted_dispatch: same class of local contamination
Regenerate each base .out from the authoritative optimizer=off CI results. All five have an _optimizer.out variant, so the ORCA path is unaffected.
Verified:
- qp_misc_jiras opt=off data-row SET is identical to qp_misc_jiras_optimizer.out (1993 rows, 0 set difference) -- only plan shape / ordering differ, data is correct.
- the two safety-gate ERROR hits (gp_configuration / backward-scan) sit inside -- start_ignore/-- end_ignore blocks (gpdiff-ignored), and were actually DROPPED by the earlier bad regen -- this restores them.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…changes
table_functions is a .source test; its base output/table_functions.source (the
optimizer=off expected) was left stale by the merge while the ORCA variant
(table_functions_optimizer.source) was already updated, so only the
JIT/Postgres-optimizer job failed it. Re-graft the upstream PG14 cosmetic changes
into the base:
- "a column definition list is only allowed for functions returning \"record\""
-> "... is redundant for a function with OUT parameters" (x2)
- EXTRACT now reports column label / error as extract() rather than date_part()
("function pg_catalog.date_part(unknown, integer)" -> "...extract(...)").
The Unique/Sort/HashAggregate plan-shape drift is intentionally absorbed by the
file's own SORT_OR_HASH matchsub (lines 1-4), so no plan edit is needed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
explain failed ONLY in the JIT CI jobs (JIT+ORCA, JIT+Postgres) while passing the non-JIT jobs. Cause: the JIT jobs enable jit and lower the jit cost thresholds, so GPDB's EXPLAIN (VERBOSE) emits a wide ` Settings: jit = 'on', jit_above_cost = 'N', optimizer_jit_above_cost = 'N'` line. init_file ignores the line's CONTENT (m/^ Settings:.*/), but its WIDTH still sets the explain_filter output column, which shifts the column header/separator padding that atmsort does NOT normalize -> diff. Baking the jit width would break the passing non-jit jobs, so a regen alone can't fix it. Fix: pin the three jit GUCs to their boot defaults at the top of explain.sql (jit=off, jit_above_cost=100000, optimizer_jit_above_cost=7500). EXPLAIN (SETTINGS) only reports GUCs modified from their built-in default, so the jit entries drop out and the Settings line becomes identical (` Settings: optimizer = 'off'` under optimizer=off, absent under ORCA) across the jit and non-jit jobs. This matches the test's own intent -- the JSON cases already strip jit "as they vary in test environment". Regenerated both expected files from a faithful local run (int8_tbl + tenk1, the only tables explain uses; plans are stats-insensitive). The base explain.out was also stale and now picks up upstream drift it had missed (new explain_filter definition, the "Async Capable" EXPLAIN field, the extra (buffers, format ...) queries). Verified: both optimizer=off and optimizer=on re-run clean (0 regression.diffs); no local-env contamination (db name / paths / hostname); success->error gate = 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
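Why pinning the GUCs makes the Settings line identical across jobs can be sketched with a toy formatter. `SkGuc` and `sk_format_settings` are hypothetical names, the `'0'` thresholds stand in for the unspecified `'N'` values of the JIT jobs, and the boot default of `optimizer` is assumed to be `on`; the real logic lives in the EXPLAIN code, not here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct SkGuc
{
    const char *name;
    const char *value;          /* current setting */
    const char *boot_default;   /* built-in default */
} SkGuc;

/*
 * Build a Settings line listing only GUCs whose value differs from the
 * built-in default. Returns the number of GUCs reported; buf is left empty
 * when nothing was modified.
 */
int sk_format_settings(const SkGuc *gucs, int ngucs, char *buf, size_t buflen)
{
    int     reported = 0;
    size_t  used = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (int i = 0; i < ngucs; i++)
    {
        int w;

        if (strcmp(gucs[i].value, gucs[i].boot_default) == 0)
            continue;           /* unmodified GUCs are not reported */

        w = snprintf(buf + used, buflen - used, "%s%s = '%s'",
                     reported == 0 ? " Settings: " : ", ",
                     gucs[i].name, gucs[i].value);
        if (w < 0 || (size_t) w >= buflen - used)
        {
            buf[used] = '\0';   /* drop the partial entry and stop */
            break;
        }
        used += (size_t) w;
        reported++;
    }
    return reported;
}
```

With the three jit GUCs pinned to their defaults, only `optimizer = 'off'` survives, so the line has the same width in the JIT and non-JIT jobs.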
…-regen
Update greengage-answer-file-regen with the hard-won rules from the JIT/ORCA CI
triage that the original draft got wrong or missed:
- Regenerate from the failing job's CI RESULT TARBALL, not a local gpdemo run
(local bakes 'jps2' db names, "1 segment", local row-order that CI rejects);
local regen is only safe for fully-normalized tests like explain.
- A shared base <t>.out with no <t>_optimizer.out is used by ORCA too -> a
Postgres-planner regen of it breaks the ORCA jobs.
- Don't bake JIT-only EXPLAIN output into a file a non-JIT job compares.
- explain is NOT unfixable-flaky: pin the three jit GUCs to their boot defaults
in explain.sql so EXPLAIN(SETTINGS) drops them, then regen (done 23db61b).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mestamptz These three SHARED base files (no _optimizer.out) pass ORCA but fail optimizer=off after b4275d4 reverted them to the ORCA-correct version (the shared-file dilemma: a single base can't satisfy both planners). Split them: keep the ORCA-passing output as <t>_optimizer.out, and regenerate the base <t>.out from the optimizer=off CI result so each planner path gets its own answer. a13 (ORCA) confirms the _optimizer side; success->error gate = 0 on the bases. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two remaining optimizer=off diffs after e3c9618:
- PG14 plans the DISTINCT in two multiset_5 trees as Sort-over-HashAggregate instead of Unique-over-Sort. atmsort canonicalizes the text EXPLAIN to node types only (costs/keys stripped) and applies the SORT_OR_HASH matchsub to the RAW text BEFORE canonicalizing, so the matchsub never fires -> update the node types in the base source (Unique->Sort, child Sort->HashAggregate).
- e3c9618 dropped the trailing space on the " extract " column header (atmsort does NOT normalize it); restore it.
Verified locally: the plan-shape and extract diffs are gone; the only residual diff is an environment-specific pg_am row (heap2) absent from the local cluster but present on CI (a14's diff never included it).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rollover) direct_dispatch's `explain (costs off) insert ... values(now())` constant-folds now()::date into the plan's `Hash Key: 'MM-DD-YYYY'::date` for direct dispatch, so the optimizer=off base answer was stamped with whatever day it was regenerated (06-16-2026) and fails every CI run on a later date. Add a date matchsub (the same mechanism table_functions uses for SORT_OR_HASH) to the .sql and both expected files so the date is normalized on both sides. Only the base plan shows the date; the ORCA plan does not, but the echoed matchsub block must be present in the _optimizer expected too. Verified locally: optimizer=off and optimizer=on both pass (committed 06-16 date vs today's 06-17 result, both normalized to MM-DD-YYYY). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
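The effect of the date matchsub can be modeled in C. The real mechanism is a gpdiff substitution declared in the test file; the `sk_mask_dates` helper below is a hypothetical equivalent that rewrites any `dd-dd-dddd` digit pattern to the literal `MM-DD-YYYY`, so that the expected and actual plan text compare equal regardless of the day they were produced.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Does the string start with a dd-dd-dddd pattern? */
static int sk_is_date_at(const char *s)
{
    static const char pattern[] = "dd-dd-dddd";

    for (size_t i = 0; pattern[i] != '\0'; i++)
    {
        if (s[i] == '\0')
            return 0;
        if (pattern[i] == 'd')
        {
            if (!isdigit((unsigned char) s[i]))
                return 0;
        }
        else if (s[i] != '-')
            return 0;
    }
    return 1;
}

/* Replace every dd-dd-dddd occurrence in place with MM-DD-YYYY (same length). */
void sk_mask_dates(char *text)
{
    for (char *p = text; *p != '\0'; p++)
    {
        if (sk_is_date_at(p))
        {
            memcpy(p, "MM-DD-YYYY", 10);
            p += 9;             /* the loop increment skips the last masked char */
        }
    }
}
```

Applying the mask to both the committed answer and a later run makes the two `Hash Key` lines identical.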
…P/ORCA fixes
New GGDB regression test guarding the MPP/ORCA-path bugs fixed during the
PostgreSQL 14 -> Greengage merge. Each section is a minimal reproducer that
crashes, hits a dispatch deserialize error, or returns wrong data on the
pre-fix code:
1. distributed + partitioned UPDATE/DELETE (ORCA updateColnosLists SIGSEGV,
partitioned U/D planner fallback)
2. several different aggregates in one query (ORCA Aggref aggno/aggtransno --
bug returned the first aggregate's value for all)
3. correlated EXISTS + correlated scalar subquery in the SELECT list
(AlternativeSubPlan cdbllize crash + lost correlation)
4. CTAS / matview DISTRIBUTED BY (<aggregate>) (Motion-on-Motion Result
interpose + MIN/MAX planagg eq_classes guard) and a concurrent matview
refresh over a text column (refresh_by_match_merge alias.* / _$ / text=text)
5. CREATE VIEW + CTAS over JOIN USING (binary-dispatch JoinExpr join_using_alias)
6. jsonb/array subscript assignment (ORCA SubscriptingRef.refrestype)
Data-only (no EXPLAIN) so the single expected file is stable across the
optimizer x JIT CI matrix; ORCA output matches the base, so no _optimizer.out is
needed. Verified locally: passes under optimizer=off and optimizer=on (0 diffs).
Registered standalone in greenplum_schedule to avoid parallel-group contention
from its partition DDL + concurrent matview refresh.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14's GROUP BY DISTINCT (Query.groupDistinct) deduplicates the grouping sets generated by overlapping ROLLUP/CUBE/GROUPING SETS. ORCA's query translator never looked at the flag, so it silently emitted the full, duplicated set of grouping sets -- e.g. `GROUP BY DISTINCT ROLLUP(a,b), ROLLUP(a,b)` returned 37 grouping-set rows under optimizer=on versus the correct 9 under the Postgres planner. ORCA does not implement grouping-set dedup, so raise the standard Query2DXLUnsupportedFeature in CheckUnsupportedNodeTypes when groupDistinct is set; the query then falls back to the Postgres planner, which handles it correctly. GROUP BY DISTINCT had no regression coverage, so no existing test relied on the buggy behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
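The dedup that ORCA was skipping can be sketched with bitmasks. Multiple grouping items in one GROUP BY combine as a cross product of their grouping sets; GROUP BY DISTINCT then drops repeated sets. The function name and the bitmask encoding are assumptions for illustration, and the sketch counts grouping sets only, not result rows (the 37 versus 9 row counts above depend on the data).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Combine two grouping-set lists (each set is a bitmask of columns) as a
 * cross product, taking the union of each pair. When distinct is nonzero,
 * sets already emitted are skipped. out must hold nl * nr entries.
 * Returns the number of sets written.
 */
size_t sk_combine_grouping_sets(const unsigned int *lhs, size_t nl,
                                const unsigned int *rhs, size_t nr,
                                int distinct, unsigned int *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
    {
        for (size_t j = 0; j < nr; j++)
        {
            unsigned int set = lhs[i] | rhs[j];
            int seen = 0;

            if (distinct)
            {
                for (size_t k = 0; k < n; k++)
                {
                    if (out[k] == set)
                    {
                        seen = 1;
                        break;
                    }
                }
            }
            if (!seen)
                out[n++] = set;
        }
    }
    return n;
}
```

For `ROLLUP(a,b), ROLLUP(a,b)` each item expands to the sets (a,b), (a) and (), so the plain cross product has nine sets, of which only three are distinct.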
New GGDB test exercising PG14 features that arrived in the merge with little or no
regression coverage, over distributed data and under both optimizers:
- GROUP BY DISTINCT dedup of overlapping grouping sets (guards the ORCA fallback
just added; ORCA previously returned the un-deduped 37 rows instead of 9)
- recursive CTE SEARCH DEPTH FIRST and CYCLE detection over a distributed graph
- multirange types: containment (@>), overlap (&&), bounds (lower/upper) and
intersection (*), distributed
Data-only and ORDER BY-stable so a single expected file covers the optimizer x JIT
matrix. Verified under optimizer=off and optimizer=on (0 diffs). Notes a gap found
while writing it: unnest(anymultirange) (C multirange_unnest + its pg_proc entry)
was not carried into this branch, so the multirange section uses the constituent-
range operators instead.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ckfill) Document the P2 finding: adding an _optimizer.out to a test that already passes under ORCA against the shared base asserts nothing new (verified for delete/insert_conflict/with -- their ORCA output either matches the base byte-for- byte, is gpdiff-normalized noise, or has its plans in --start_ignore blocks). The real ORCA coverage gaps are untested features (found by exercising them under both optimizers), not missing _optimizer.out files. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
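The reasoning behind this finding can be sketched as follows. The model assumes that an optimizer=on job compares against `<test>_optimizer.out` when that file exists and against the shared base `<test>.out` otherwise; `expected_file`, `strip_ignored`, and `is_redundant` are hypothetical names, and the ignore-block handling is a simplified stand-in for gpdiff, not its actual rules.

```python
import tempfile
from pathlib import Path


def expected_file(expected_dir, test, optimizer_on):
    """Pick the answer file a job compares against (assumed resolution rule)."""
    base = Path(expected_dir) / f"{test}.out"
    orca = Path(expected_dir) / f"{test}_optimizer.out"
    return orca if optimizer_on and orca.exists() else base


def strip_ignored(text):
    """Drop lines inside start_ignore/end_ignore blocks (simplified stand-in for gpdiff)."""
    kept, ignoring = [], False
    for line in text.splitlines():
        marker = line.replace(" ", "").lower()
        if marker.startswith("--start_ignore"):
            ignoring = True
        elif marker.startswith("--end_ignore"):
            ignoring = False
        elif not ignoring:
            kept.append(line)
    return kept


def is_redundant(base_text, orca_text):
    """True when an ORCA-specific answer file would assert nothing beyond the base."""
    return strip_ignored(base_text) == strip_ignored(orca_text)


# Same results, different plans hidden inside an ignore block.
base_out = "-- start_ignore\nSeq Scan on t\n-- end_ignore\n a \n---\n 1\n"
orca_out = "-- start_ignore\nDynamic Seq Scan on t\n-- end_ignore\n a \n---\n 1\n"

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "delete.out").write_text(base_out)
    before = expected_file(d, "delete", optimizer_on=True).name
    (Path(d) / "delete_optimizer.out").write_text(orca_out)
    after = expected_file(d, "delete", optimizer_on=True).name

redundant = is_redundant(base_out, orca_out)
print(before, after, redundant)
```

Under these assumptions the ORCA job already compares against the base file, so adding an `_optimizer.out` whose unignored content is identical changes which file is read but not what is asserted.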
…G14) range_agg(anyrange) returns the PG14 anymultirange polymorphic type. ORCA does not resolve it to the concrete multirange type, so the anymultirange pseudo-type reaches execution and the aggregate's finalfn errors with "type 4537 is not a multirange type" (multirangetypes.c:550) under optimizer=on, while the Postgres planner returns the correct multirange. ORCA doesn't implement this resolution, so detect any Aggref whose result type is a multirange in CheckUnsupportedNodeTypes and raise the standard unsupported-feature fallback to the planner. Adds a gpdb::IsMultirangeType wrapper (type_is_multirange). range_intersect_agg(anyrange)->anyrange is unaffected (no multirange result) and keeps using ORCA. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ORCA coverage
Add coverage for more PG14 features over MPP, run under both optimizers, found by
probing for ORCA divergences:
- subscripting (SubscriptingRef) in the SELECT list, WHERE, and a join key
(the refrestype class beyond the UPDATE path) -- ORCA handles these correctly
- extract() PG14 numeric return + date_bin() over distributed timestamptz
(timezone/datestyle pinned for portable rendering)
- range_agg() -> anymultirange, which now falls back to the planner under ORCA
(locks in a235ac4) and returns the correct multirange
- WITH ... NOT MATERIALIZED inlining over distributed CTEs
ORCA output matches the base for all of them (correct native handling or fallback),
so a single expected file suffices. Verified under optimizer=off and optimizer=on
(0 diffs).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Merge PostgreSQL 14 into Greengage
Summary
Brings the upstream PostgreSQL 14 commit range `d259afa736..e1c1c30f635` into the
Greengage MPP fork and carries it to a green regression matrix. The raw merge
(`8597e91cda3`) resolved ~700 conflict files; 356 follow-up commits then take the
tree through every bring-up phase (compile → mock unit tests → initdb/bootstrap →
the JIT × {ORCA, Postgres-planner} regression matrix), fixing each distinct class of
breakage, and add new regression coverage plus reusable tooling. Scale layered on the
raw merge: 636 files, +58k/−39k.
1. Merge & conflict resolution
Resolved semantically (adopt the upstream API shape, then re-graft the GGDB/MPP logic):
- `PROC_HDR` arrays; GGDB distributed-snapshot and reader/writer XID sharing re-grafted onto
`ProcGlobal->xids[pgxactoff]`.
- `relkind` → `objtype` enum in CTAS / RefreshMatView / IntoClause.
- `copy.c` split (copyfrom/copyto): kept GGDB's monolithic `copy.c`, re-grafted
only the protocol change.
- `bare_label_keyword` / `BareColLabel` superseding GGDB's `ColLabelNoAs`.
- collisions), `errstart`/`errstart_cold`, long-lived `WaitEventSet`, `GlobalVis*` horizon API, bulk
`pg_attribute` insert + `attcompression`.
2. Bring-up by phase
- (`make unittest-check`): `errstart`/`errstart_cold` mock split, new-GUC coverage lists,
`mock.mk` link order; the mock suites pass.
- mocks catch: BKI single-quote convention, genbki `PGUID` substitution, catalog
header order, missing index DECLAREs, `pg_proc.dat` last-wins duplicates, row-identity
wiring for UPDATE/DELETE.
3. MPP/ORCA bug fixes (the dominant class)
PG14 added fields/types that ORCA's translator silently omitted or mishandled, causing
crashes or wrong results only under `optimizer=on`. Fixed:
- `ModifyTable.updateColnosLists` (segment SIGSEGV) + partitioned U/D fallback
- `Aggref.aggno`/`aggtransno`: every aggregate returned the first one's value
- `SubscriptingRef.refrestype`: "cache lookup failed for type 0"
- `GROUP BY DISTINCT` ignored (37 vs 9 grouping-set rows) → planner fallback
- `range_agg` `anymultirange` unresolved ("type 4537 is not a multirange type") → fallback
- `DISTRIBUTED BY (<aggregate>)` (Motion-on-Motion, MIN/MAX planagg guard)
- `join_using_alias`
- `REFRESH … CONCURRENTLY` (prefixed transient heap, whole-row `.*` refs)
- `pq_getmessage` maxlen), nested InitPlan across a Motion
- `pg_rewind` segment `gp_dbid` preservation
4. Test reconciliation & the CI matrix
CI runs the same `expected/*.out` across four jobs: {JIT, non-JIT} × {ORCA, Postgres
planner}. Principles applied (and captured in the tooling): regenerate answer files only
from the failing job's CI result (local runs bake env-specific values like db name /
segment count); a shared base `.out` is used by ORCA too, so regenerating it to the
Postgres-planner output breaks ORCA; JIT-only EXPLAIN output (the `Settings:` line) must
not be baked into a file a non-JIT job compares. Representative fixes: `explain`
stabilized via `SET jit = off` + boot-default cost GUCs; `table_functions`/`direct_dispatch`
matchsubs; `_optimizer.out` splits for shared optimizer-dependent tests.
5. New regression coverage
- `gp_pg14_merge_regress`: lock-in test with one minimal reproducer per MPP/ORCA bug
class above, so reverting a fix makes the test reproduce the crash or wrong result. Runs
under both optimizers.
- `gp_pg14_features`: MPP/ORCA coverage for under-tested PG14 features (GROUP BY
DISTINCT, recursive-CTE SEARCH/CYCLE, multirange, subscripting in SELECT/WHERE/JOIN,
`extract`/`date_bin`, `range_agg`, NOT MATERIALIZED). This test caught the two new
ORCA bugs (GROUP BY DISTINCT and `range_agg`), both fixed in this PR.
6. Tooling / docs
Seven reusable Claude Code skills under `.claude/skills/`, distilled from the campaign
(build, regress-tests, answer-file-regen, cluster-ops, debug, internals, pg-merge).
Verification
- opt=off/ORCA long tail is closed.
- `optimizer=off` and `optimizer=on`.
Known remaining (not regressions)
- (`fts_session_reset`, `segwalrep/*`, `pg_rewind_*`) fail only in CI because the small
gpdemo runs out of memory during the full ~250-spec schedule: OOM events time-correlate
with the failures and all specs pass locally in isolation. This is CI cluster capacity,
not code.
- `unnest(anymultirange)` was not carried into the branch (no C `multirange_unnest` /
catalog entry); the new test covers multirange via the constituent-range operators, and
porting it is a candidate follow-up.