
pg_dump: --create-empty-files-for-excluded-data (directory format)#47

Open: NikolayS wants to merge 2 commits into master from pg-dump-empty-excluded-data

@NikolayS

Review/test of Kirk Wolak's patch (kirkw/postgres @ pg-dump-empty-excluded-data), cherry-picked onto our fork's master for a clean 2-commit diff. Authorship preserved.

What it does

Adds --create-empty-files-for-excluded-data to pg_dump. For directory-format dumps, tables whose data is excluded via --exclude-table-data[-and-children] still get a NNNN.dat file written containing only the COPY end marker (\.\n\n\n) instead of being omitted entirely.

Motivation: enables a parallel-dump workflow for huge tables — dump the schema with placeholder data files, then have an external tool fill those .dat files (e.g. split COPY streams across workers) without any merge step. pg_restore then loads them like any normal directory-format data file.
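The "fill" step of that workflow could look roughly like the sketch below. It is not part of the patch. It assumes the dump was written uncompressed so that the data file is plain COPY text, that rows arrive already encoded in COPY text format, and that a filled file is simply the rows followed by the same end marker. The function name `fill_placeholder` and the file name `0001.dat` are hypothetical.

```python
import os
import tempfile

# COPY end marker as described above: backslash, dot, then three newlines.
COPY_END_MARKER = b"\\.\n\n\n"


def fill_placeholder(dat_path, copy_lines):
    """Overwrite a placeholder .dat file with COPY text rows plus the end marker.

    copy_lines: iterable of bytes, one row each, already in COPY text format
    (tab-separated columns, no trailing newline). Hypothetical helper.
    """
    # Refuse to touch anything that is not an untouched placeholder.
    with open(dat_path, "rb") as f:
        if f.read() != COPY_END_MARKER:
            raise ValueError(f"{dat_path} is not an untouched placeholder")

    # Write to a temporary file first, then swap it into place.
    tmp_path = dat_path + ".tmp"
    with open(tmp_path, "wb") as f:
        for line in copy_lines:
            f.write(line + b"\n")
        f.write(COPY_END_MARKER)
    os.replace(tmp_path, dat_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "0001.dat")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(COPY_END_MARKER)
        fill_placeholder(path, [b"1\talice", b"2\tbob"])
        with open(path, "rb") as f:
            print(f.read())
```

Writing to a temporary file and renaming keeps a half-written data file from ever sitting under the name pg_restore will read; a real tool splitting COPY streams across workers would need its own coordination on top of this.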

Commits

  • Add --create-empty-files-for-excluded-data for directory-format pg_dump.
  • Restore pg_restore coverage in pg_dump-only TAP test.

Guard rails (all verified)

The option is rejected with a clear error unless all of the following hold:

  • it is combined with --exclude-table-data or --exclude-table-data-and-children
  • the output is directory format (not custom, plain, or tar)
  • table data is dumped as COPY (not --inserts, --column-inserts, or --rows-per-insert)
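As a reading aid, the three checks can be modelled as a small decision function. This is a Python sketch of the rules listed above, not the C code in the patch. The function name, parameter names, error strings, and the order of the checks are all assumptions made for illustration.

```python
def check_empty_files_option(fmt, has_exclude_table_data, uses_inserts):
    """Return an error string if the option must be rejected, else None.

    fmt: one of "directory", "custom", "plain", "tar".
    has_exclude_table_data: True if --exclude-table-data or
        --exclude-table-data-and-children was given.
    uses_inserts: True if --inserts, --column-inserts, or
        --rows-per-insert was given.
    All messages below are illustrative, not the patch's actual wording.
    """
    if not has_exclude_table_data:
        return "requires --exclude-table-data or --exclude-table-data-and-children"
    if fmt != "directory":
        return "requires directory format"
    if uses_inserts:
        return "cannot be combined with INSERT-style data output"
    return None


if __name__ == "__main__":
    print(check_empty_files_option("directory", True, False))
    print(check_empty_files_option("custom", True, False))
```

Only one combination passes: directory format, at least one data exclusion, and COPY output.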

Testing done (this review)

Built on macOS (meson, PG 19beta1). Manual end-to-end verification of every assertion in the new TAP test t/012_pg_dump_empty_excluded_data.pl:

  • ✅ Dump succeeds; TOC lists TABLE DATA public skip_data; two .dat files produced.
  • ✅ Excluded table's .dat is exactly 5c2e 0a0a 0a (\.\n\n\n) — byte-identical to how dumpTableData_copy() ends a genuinely empty table.
  • ✅ Included table's .dat non-empty; restore round-trips (keep=2 rows, skip=0 rows).
  • Parallel-dump workflow: externally overwriting the placeholder .dat with real COPY data and re-running pg_restore loads all rows correctly.
  • ✅ All three guard-rail errors fire as expected.
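The byte-level check on the placeholder can be reproduced with a few lines of Python. This is a minimal sketch that works on bytes already read from an uncompressed data file; the helper names `classify_dat` and `hex_pairs` are hypothetical, and treating any file that merely ends with the marker as "data" is a simplification.

```python
# COPY end marker: backslash, dot, then three newlines (5 bytes).
COPY_END_MARKER = b"\\.\n\n\n"


def classify_dat(data: bytes) -> str:
    """Classify the contents of a directory-format data file (simplified)."""
    if data == COPY_END_MARKER:
        return "placeholder"  # only the end marker, no rows
    if data.endswith(COPY_END_MARKER):
        return "data"         # some rows followed by the end marker
    return "invalid"          # end marker missing


def hex_pairs(data: bytes) -> str:
    """Hex dump grouped two bytes at a time from the left (Python 3.8+)."""
    return data.hex(" ", -2)


if __name__ == "__main__":
    print(len(COPY_END_MARKER))
    print(hex_pairs(COPY_END_MARKER))
    print(classify_dat(COPY_END_MARKER))
    print(classify_dat(b"1\tkeep\n2\tkeep\n" + COPY_END_MARKER))
```

The grouped hex form matches the 5c2e 0a0a 0a dump quoted in the checklist above.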

Note: the formal meson test TAP harness could not run on this Mac because SIP strips DYLD_LIBRARY_PATH when initdb is launched via /usr/bin/python3 (a known PG-on-macOS quirk, unrelated to this patch). Assertions were instead reproduced manually against the freshly built binaries.

Review notes / nits (non-blocking)

  • Option uses getopt value 26; 23/24 are unused — 23 would have been the natural next number.
  • dumpTableData_empty() is also wired into the INSERT branch of dumpTableData(), but that path is unreachable since the option rejects all INSERT modes up front. Harmless/defensive.

This is the pg_dump-only half of Kirk's work (the pg_restore split-file-loading side is a separate future patch).

wolakk and others added 2 commits June 16, 2026 09:37
When used with --exclude-table-data in -Fd output, still emit TABLE DATA
TOC entries and placeholder .dat files containing a COPY end marker for
excluded tables.  Restrict the option to directory format and COPY output;
document it and add a TAP test covering validation and dump contents.

Co-authored-by: Cursor <cursoragent@cursor.com>

Stock pg_restore handles placeholder COPY end-marker files correctly;
the earlier restore failure was from mixed build artifacts after
branch switching without make clean.

Co-authored-by: Cursor <cursoragent@cursor.com>