
SPOC-482: Add replay queue spill-to-disk for large transactions#402

Open
danolivo wants to merge 3 commits into main from spoc-482

Conversation

@danolivo
Contributor

@danolivo danolivo commented Mar 30, 2026

When the apply worker's replay queue exceeds spock.exception_replay_queue_size (in MB), subsequent entries are written to a temporary BufFile on disk instead of accumulating in memory. This prevents OOM on the subscriber when replaying large transactions with exception handling enabled.

Design

  • In-memory queue entries live in ApplyReplayContext and hold libpq-allocated data buffers. apply_replay_queue_reset (called from handle_commit) walks the list to PQfreemem them before calling MemoryContextReset.
  • Spilled entries are relocated from ApplyReplayContext to TopMemoryContext so they survive the MemoryContextReset in handle_commit. The caller frees them explicitly after replication_handler returns.
  • Added from_pq field to ApplyReplayEntry to distinguish libpq-allocated buffers (PQfreemem) from palloc'd spill-read buffers (pfree).
  • When the replay queue is exhausted mid-transaction (exception occurred before COMMIT was received), the inner loop does continue instead of break, seamlessly transitioning to stream reading — matching the pre-spill behavior and ensuring handle_commit runs with a clean transaction state.

Changes

  • apply_replay_queue_append_entry: appends to in-memory list or spills to disk; returns a possibly-relocated entry pointer so the caller can detect spilling.
  • apply_replay_queue_next_entry: drains in-memory list first, then reads from spill file; sets apply_replay_mode = false when exhausted.
  • apply_replay_spill_write_entry / apply_replay_spill_read_entry: binary format [int32 len][data] via BufFile.
  • apply_replay_queue_start_replay: seeks spill file back to start for re-read after exception.
  • apply_replay_queue_reset: unified cleanup — walks list for PQfreemem, resets all state, closes spill file, resets context.
  • Regression tests for all three exception_behaviour modes (transdiscard, discard, sub_disable), both without and with a primary key conflict on the last record of a 50k-row COPY.

@danolivo danolivo self-assigned this Mar 30, 2026
@danolivo danolivo added the enhancement New feature or request label Mar 30, 2026
@coderabbitai

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Reintroduces configurable in-memory replay-queue semantics with on-disk spill support, implements BufFile-based spill/read and explicit replay-mode state in the apply path, updates a GUC and an exported variable comment, and adds a new regression test plus related test-config and test SQL changes.

Changes

  • Apply & replay implementation (src/spock_apply.c): Adds from_pq flag, explicit apply_replay_mode, BufFile spill/read routines, spill-aware queue append/reset/free logic, and per-entry lifetime management; changes the apply loop to select streaming vs. queued entries.
  • GUC & exported var (src/spock.c, include/spock.h): Removes the deprecation comment from spock_replay_queue_size; changes the spock.exception_replay_queue_size help text and semantics to describe an in-memory limit with optional spill-to-disk (0 = disable spilling/unlimited).
  • Test runner & config (Makefile, tests/regress/regress-postgresql.conf): Adds spill_transaction to REGRESS; disables autovacuum in the test config and adjusts an example log-level comment.
  • Regression SQL tests (tests/regress/sql/primary_key.sql, tests/regress/sql/spill_transaction.sql): Adds a polling block to primary_key.sql that waits for the subscription to be disabled; introduces spill_transaction.sql to force spill-to-disk, exercise large COPY transactions, and validate behavior across exception modes (transdiscard, discard, sub_disable).

Poem

🐇 I nudged a queue that grew too tall,
To a BufFile burrow I sent each brawl,
Spilled rows snoozed on disk, then hopped back near,
Replay chose memory or file without fear,
A tiny hop — replication cheer!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (Passed): The PR title clearly and specifically summarizes the main change: adding replay queue spill-to-disk functionality for large transactions, which is the primary objective and most significant technical contribution across all modified files.
  • Description check (Passed): The PR description comprehensively covers the changeset, detailing the design decisions, implementation approach, memory management strategy, and regression test coverage for the spill-to-disk feature across all exception handling modes.
  • Docstring Coverage (Passed): Docstring coverage is 92.31%, which is sufficient; the required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/spock_apply.c (1)

2955-3044: ⚠️ Potential issue | 🔴 Critical

Run spill-entry cleanup in both replay and error paths.

apply_replay_queue_next_entry() can return a TopMemoryContext entry from apply_replay_spill_read_entry(), but the free here only keys off entry_spilled, which is set only for first-pass stream entries. Replay entries from disk therefore survive both this block and apply_replay_queue_reset(), and a failing first-pass spilled entry also leaks because replication_handler() throws before Line 3043. That keeps spill-backed buffers alive until worker exit and can reintroduce the memory pressure this feature is supposed to remove.

♻️ Suggested fix
-					replication_handler(msg);
-
-					/*
-					 * Free spilled entries explicitly: their structs live in
-					 * TopMemoryContext (not ApplyReplayContext), so they are
-					 * not cleaned up by apply_replay_queue_reset.
-					 *
-					 * In-memory entries (both first-pass and replay) live in
-					 * ApplyReplayContext and are freed by
-					 * MemoryContextReset inside apply_replay_queue_reset,
-					 * called from handle_commit.
-					 */
-					if (entry_spilled)
-						apply_replay_entry_free(entry);
+					PG_TRY();
+					{
+						replication_handler(msg);
+					}
+					PG_CATCH();
+					{
+						if (entry_spilled || (!queue_append && !entry->from_pq))
+							apply_replay_entry_free(entry);
+						PG_RE_THROW();
+					}
+					PG_END_TRY();
+
+					/*
+					 * Free entries whose storage lives outside
+					 * ApplyReplayContext.
+					 */
+					if (entry_spilled || (!queue_append && !entry->from_pq))
+						apply_replay_entry_free(entry);

Also applies to: 3994-4042

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2955 - 3044, The replayed entry from
apply_replay_queue_next_entry() can be a spilled TopMemoryContext entry (from
apply_replay_spill_read_entry()) but entry_spilled is only set for first-pass
stream appends, so spilled replay entries and spilled entries when
replication_handler() throws are never freed; ensure spilled entries are always
freed regardless of origin and on errors by treating spilled status based on
where the entry came from and by wrapping replication_handler(msg) in an
exception-safe cleanup: set entry_spilled whenever
apply_replay_queue_next_entry() returned a spilled entry (or detect that from
the entry), call apply_replay_entry_free(entry) after replication_handler(msg)
as currently done, and also free the spilled entry in a PG_TRY/PG_CATCH (or
equivalent error-path) before rethrowing so that both replay and error paths
release TopMemoryContext-backed entries; update any logic in functions
apply_replay_queue_append_entry, apply_replay_queue_next_entry,
replication_handler, and the local use of entry_spilled to reflect this
invariant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Line 137: Change apply_replay_bytes from int to an unsigned wider type
(recommended Size or uint64) and update all related uses: in the increment where
apply_replay_bytes += msg->len (referencing msg->len) cast/ensure both operands
use the wider type, update the comparison in the spill/check logic (the branch
that resets apply_replay_bytes when it exceeds the spill threshold) to compare
using the wider type (with any necessary casts), and change the
diagnostic/format string that prints apply_replay_bytes from "%d" to "%zu" (or
the appropriate specifier for the chosen type). Ensure all assignments,
increments, and prints use the new type consistently to avoid overflow and type
warnings.

In `@tests/regress/sql/spill_transaction.sql`:
- Around line 159-165: The test races because disabling is asynchronous; modify
the sequence around the SELECT from spock.subscription and the call to
spock.sub_enable('test_subscription', true) so the test first polls/waits until
the subscription row reflects the disabled state (i.e., sub_enabled = false /
sub_disabled = true) before deleting the conflicting row and calling
spock.sub_enable; use a simple retry loop with a short sleep that queries
spock.subscription for sub_name = 'test_subscription' and proceeds only once the
row shows disabled to avoid the apply-worker flip racing the re-enable.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c14dad2-8228-4bce-b745-5db36660b35f

📥 Commits

Reviewing files that changed from the base of the PR and between a116148 and 640d20c.

⛔ Files ignored due to path filters (2)
  • tests/regress/expected/primary_key.out is excluded by !**/*.out
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • Makefile
  • include/spock.h
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/regress-postgresql.conf
  • tests/regress/sql/primary_key.sql
  • tests/regress/sql/spill_transaction.sql

@danolivo danolivo force-pushed the spoc-482 branch 3 times, most recently from 2463ad4 to 17e331e on March 31, 2026 at 09:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 2957-2960: The loop that calls apply_replay_queue_next_entry() and
processes entries fails to free TopMemoryContext-allocated spill entries
returned by apply_replay_spill_read_entry() on both success and error, leaking
memory if replication_handler() throws or when stream-pointer changes; update
the processing loop (around apply_replay_queue_next_entry and where
apply_replay_spill_read_entry is used) to always free those replay entries from
TopMemoryContext after they are consumed — i.e., after replication_handler()
succeeds or in the error/exception path before continue/return — and centralize
the free logic so both the normal success path and any early-exit/error paths
free the entry (refer to functions apply_replay_queue_next_entry(),
apply_replay_spill_read_entry(), replication_handler() and the
ApplyReplayContext handling).

In `@src/spock.c`:
- Around line 1155-1163: The GUC DefineCustomIntVariable call for
spock_replay_queue_size currently sets the minimum to -1 which creates an
undocumented "unlimited" alias; either change the minimum back to 0 to prevent
the undocumented value or update the help string to explicitly document -1 as a
compatibility alias; locate the DefineCustomIntVariable invocation (symbol:
DefineCustomIntVariable and variable spock_replay_queue_size) and implement one
of the two fixes: (A) set the lower bound argument from -1 to 0 and adjust any
tests/comments, or (B) keep -1 but extend the help text to mention that -1 is a
compatibility alias for unlimited memory and ensure any validation/logic that
checks spock_replay_queue_size treats -1 identically to 0.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29266b9f-2ff7-4e43-83cd-2d63e42ab8c4

📥 Commits

Reviewing files that changed from the base of the PR and between b1b402c and 2463ad4.

⛔ Files ignored due to path filters (2)
  • tests/regress/expected/primary_key.out is excluded by !**/*.out
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • Makefile
  • include/spock.h
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/regress-postgresql.conf
  • tests/regress/sql/primary_key.sql
  • tests/regress/sql/spill_transaction.sql
✅ Files skipped from review due to trivial changes (2)
  • include/spock.h
  • tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (2)
  • Makefile
  • tests/regress/regress-postgresql.conf


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/spock.c (1)

1155-1163: ⚠️ Potential issue | 🟡 Minor

Document -1 explicitly or restore lower bound to 0.

Lines 1156-1159 document 0 as the disable/unlimited mode, but Line 1162 allows -1; with spill activation gated by spock_replay_queue_size > 0 (src/spock_apply.c:4152-4170), -1 is also effectively “disable spill.” This leaves an undocumented alias.

Suggested fix (pick one policy)
-							-1,
+							0,

or keep -1 and update the help text to explicitly state -1 is accepted as an alias for unlimited/no spill.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock.c` around lines 1155 - 1163, The help text for
DefineCustomIntVariable("spock.exception_replay_queue_size", ...) claims "Set to
0 to disable spilling" but the call allows -1 as a valid value (min = -1) which
is also treated as "disable"; either restore the lower bound to 0 by changing
the min argument from -1 to 0 so only 0 means unlimited, or keep -1 allowed but
update the help string to explicitly mention that -1 is an accepted alias for
unlimited/no spill; update the DefineCustomIntVariable invocation and the
user-facing help text accordingly and ensure any references/logic in
spock_apply.c (spock_replay_queue_size checks) remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/regress/sql/spill_transaction.sql`:
- Around line 178-183: The loop currently checks an unqualified scalar: (SELECT
status FROM spock.sub_show_status()), which can return ambiguous results if
multiple subscriptions exist; change the subquery to explicitly target the test
subscription by either calling a subscription-specific accessor or filtering the
set for 'test_subscription' (e.g. SELECT status FROM spock.sub_show_status()
WHERE name = 'test_subscription' or use
spock.sub_show_status('test_subscription') if that overload exists) so the WHILE
condition only polls the test_subscription row.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8692e76e-fad4-46ee-9719-3e8390d796fd

📥 Commits

Reviewing files that changed from the base of the PR and between 2463ad4 and 17e331e.

⛔ Files ignored due to path filters (2)
  • tests/regress/expected/primary_key.out is excluded by !**/*.out
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • Makefile
  • include/spock.h
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/regress-postgresql.conf
  • tests/regress/sql/primary_key.sql
  • tests/regress/sql/spill_transaction.sql
✅ Files skipped from review due to trivial changes (3)
  • tests/regress/regress-postgresql.conf
  • tests/regress/sql/primary_key.sql
  • include/spock.h
🚧 Files skipped from review as they are similar to previous changes (2)
  • Makefile
  • src/spock_apply.c


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
src/spock_apply.c (2)

2999-3046: ⚠️ Potential issue | 🔴 Critical

Snapshot the cleanup decision before dispatch, and run it on both success and error.

Line 1079 now resets ApplyReplayContext inside handle_commit(), so the post-call !entry->from_pq check can dereference a freed in-memory COMMIT entry. Separately, if replication_handler() errors on the current spilled stream entry, cleanup only exists on the success path and the TopMemoryContext copy leaks. Compute a local entry_needs_free before calling the handler and use a local PG_TRY/PG_CATCH so both paths free safely.

Suggested patch
-				bool		entry_spilled = false;
+				bool		entry_needs_free = false;
@@
 				else
 				{
 					entry = apply_replay_queue_next_entry();
 					if (entry == NULL)
 						continue;
+					entry_needs_free = !entry->from_pq;
 					queue_append = false;
 				}
@@
 					if (queue_append)
 					{
 						ApplyReplayEntry *orig = entry;
 
 						entry = apply_replay_queue_append_entry(entry, msg);
-						entry_spilled = (entry != orig);
+						entry_needs_free = (entry != orig);
@@
-						if (entry_spilled)
+						if (entry_needs_free)
 							msg = &entry->copydata;
 					}
@@
-					replication_handler(msg);
+					PG_TRY();
+					{
+						replication_handler(msg);
+					}
+					PG_CATCH();
+					{
+						if (entry_needs_free)
+							apply_replay_entry_free(entry);
+						PG_RE_THROW();
+					}
+					PG_END_TRY();
@@
-					if (entry_spilled || !entry->from_pq)
+					if (entry_needs_free)
 						apply_replay_entry_free(entry);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2999 - 3046, The cleanup decision for
spilled/in-memory entries must be snapshotted before calling replication_handler
and performed on both success and error; compute a local bool (e.g.
entry_needs_free) from entry_spilled || !entry->from_pq immediately after
apply_replay_queue_append_entry (before any call that may reset
ApplyReplayContext in handle_commit), then wrap the call to
replication_handler(msg) inside a PG_TRY/PG_CATCH and in both the PG_TRY end and
the PG_CATCH ensure apply_replay_entry_free(entry) is called when
entry_needs_free is true; this avoids dereferencing a freed in-memory COMMIT
entry after handle_commit resets ApplyReplayContext and prevents leaking
TopMemoryContext copies on errors.

136-137: ⚠️ Potential issue | 🟠 Major

Widen apply_replay_bytes before the threshold math overflows.

apply_replay_bytes is still an int, but spock.exception_replay_queue_size is allowed up to MAX_KILOBYTES / 1024 in src/spock.c:1155-1167, and 0 now means “unlimited”. The addition at Line 4156 and the accumulation at Line 4214 can therefore overflow long before a valid configured limit, or just wrap indefinitely in the unlimited case. Please move this accounting to Size/uint64 and update the comparison/log formatting consistently.

Also applies to: 4155-4156, 4213-4215

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 136 - 137, The apply_replay_bytes counter is
an int and can overflow/wrap when compared against
spock.exception_replay_queue_size (which can be large or "0" for unlimited);
change apply_replay_bytes to use a wider unsigned type (Size or uint64) and
update all arithmetic and comparisons that reference apply_replay_bytes (notably
in the addition site in the replay-queue enqueue path and the accumulation in
the dequeue/cleanup path where apply_replay_bytes is adjusted) to use the new
type, ensure the unlimited case (0) is handled correctly, and update any
logging/format strings that print apply_replay_bytes to use the appropriate
format specifier for the wider type so comparisons and logs are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 4154-4205: The spill setup/write can fail after the current
message was consumed but before it is reachable from the in-memory queue, which
would cause replay to start from an incomplete queue; wrap the
BufFileCreateTemp()/apply_replay_spill_write_entry() sequence in an
exception-safe block (PG_TRY/PG_CATCH) so that on any error you either (a) abort
and rethrow (so normal replay is bypassed) or (b) restore state so the current
entry remains reachable (do not set apply_replay_spilling, do not free entry,
and do not change apply_replay_spill_count/apply_replay_bytes) until the spill
completes successfully; only after apply_replay_spill_write_entry() returns
successfully set apply_replay_spilling, update
apply_replay_spill_count/apply_replay_bytes as needed, move/alloc the
ApplyReplayEntry into TopMemoryContext and pfree(entry) and then return mc_entry
— ensuring apply_replay_head and the in-memory queue are not left missing the
current entry on spill failure.

In `@tests/regress/sql/spill_transaction.sql`:
- Around line 115-117: The regression uses spock.exception_log and
spock.resolutions rows from earlier scenarios causing flaky checks; before each
conflict block (or immediately before the SELECTs shown that read
spock.exception_log and spock.resolutions) either TRUNCATE those diagnostic
tables or add a scenario-specific filter (e.g. a WHERE clause referencing a
scenario id/timestamp/command_counter range) and stable ORDER BY clauses so the
SELECTs (the queries reading spock.exception_log and the final SELECT from
spock.resolutions) only return rows for the current conflict case and have
deterministic ordering.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 17eafc4e-e099-4b10-a4f7-3a4d4dc49f26

📥 Commits

Reviewing files that changed from the base of the PR and between 17e331e and 0d610ea.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (3)
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spock.c


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
src/spock_apply.c (2)

2957-2960: ⚠️ Potential issue | 🟠 Major

Free TopMemoryContext replay entries on the exception path too.

The cleanup only runs after replication_handler(msg) returns. If that call throws on the current spilled/replayed entry, the struct is already outside ApplyReplayContext, so apply_replay_queue_reset() will not reclaim it and the payload stays resident for the rest of the worker lifetime.

Suggested fix
-				bool		entry_spilled = false;
+				bool		entry_needs_free = false;
@@
 				else
 				{
 					entry = apply_replay_queue_next_entry();
 					if (entry == NULL)
 						continue;
+					entry_needs_free = !entry->from_pq;
 					queue_append = false;
 				}
@@
 						ApplyReplayEntry *orig = entry;
 
 						entry = apply_replay_queue_append_entry(entry, msg);
-						entry_spilled = (entry != orig);
+						entry_needs_free = (entry != orig);
@@
-						if (entry_spilled)
+						if (entry_needs_free)
 							msg = &entry->copydata;
 					}
@@
-					replication_handler(msg);
+					PG_TRY();
+					{
+						replication_handler(msg);
+					}
+					PG_CATCH();
+					{
+						if (entry_needs_free)
+							apply_replay_entry_free(entry);
+						PG_RE_THROW();
+					}
+					PG_END_TRY();
@@
-					if (entry_spilled || !entry->from_pq)
+					if (entry_needs_free)
 						apply_replay_entry_free(entry);

Also applies to: 3002-3046

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2957 - 2960, The current code takes an entry
from apply_replay_queue_next_entry() and processes it with
replication_handler(msg) but if replication_handler throws the spilled/replayed
entry remains allocated in TopMemoryContext because apply_replay_queue_reset()
only runs after normal return; add cleanup in the exception path to free or
reset the current replay entry and restore TopMemoryContext so the payload is
reclaimed. Concretely, wrap the replication_handler(msg) call with error
handling that, on exception, calls the same cleanup that
apply_replay_queue_reset() would for the active entry (e.g., free/release the
entry returned by apply_replay_queue_next_entry() and reset or switch out of
ApplyReplayContext/TopMemoryContext) — apply this change for both places where
an entry is pulled and processed (the block using apply_replay_queue_next_entry
and the similar block later around replication_handler).

4161-4206: ⚠️ Potential issue | 🔴 Critical

Don’t enter replay from a spill transition that never made the current record durable.

By the time Line 4190 runs, the current w record is already consumed from libpq, but it is neither linked into apply_replay_head nor guaranteed to be replayable until the create/write/relocate sequence completes. Any ERROR in that window sends the outer catch into apply_replay_queue_start_replay() with an incomplete queue, so the worker resumes at the next stream record and can silently drop the current change. Treat the whole spill-transition sequence as one exception-safe unit: either keep the current entry reachable until it succeeds, or rethrow and force a worker restart instead of normal replay.

Also applies to: 3233-3265

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 4161 - 4206, The spill-transition can fail
after the libpq record is consumed but before it is safely persisted, so either
preserve the current entry until the spill completes or force a restart on
error; to fix, allocate/copy the ApplyReplayEntry into TopMemoryContext (create
mc_entry via MemoryContextSwitchTo(TopMemoryContext) + memcpy of entry) before
calling apply_replay_spill_write_entry / creating apply_replay_spill_file, then
call apply_replay_spill_write_entry(msg->len, msg->data) and only pfree(entry)
after the write succeeds; alternatively wrap the entire spill sequence
(BufFileCreateTemp, apply_replay_spill_write_entry, memcpy/pfree) in a
PG_TRY/PG_CATCH and rethrow on error so apply_replay_queue_start_replay cannot
run with an incomplete queue — update code paths referencing
apply_replay_spilling, apply_replay_spill_file, apply_replay_spill_write_entry,
ApplyReplayEntry, apply_replay_head, and entry accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3d2c197-63c7-4516-b236-70e1ce7a0e2f

📥 Commits

Reviewing files that changed from the base of the PR and between 17e331e and 8a92161.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (3)
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spock.c


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/spock_apply.c (2)

137-137: Consider widening apply_replay_bytes to Size for consistency.

While the current int type works given the spill threshold check, using Size (or int64) would align with PostgreSQL's memory accounting conventions and provide defensive headroom. The variable accumulates msg->len values until spilling activates, so its maximum is bounded by spock_replay_queue_size (converted to bytes) plus one message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` at line 137, Change the type of apply_replay_bytes from
int to Size to match PostgreSQL memory accounting and avoid overflow when
accumulating msg->len values; update its declaration (apply_replay_bytes) to
"Size", ensure any comparisons against spock_replay_queue_size and additions of
msg->len use the Size type (cast msg->len if necessary), and adjust any related
format/print or helper calls that assume an int to accept Size values.

4109-4111: Temporary debug logging is acceptable for initial release.

The XXX comments indicate awareness that these DEBUG1 statements are for validation during the stabilization period. The conditional logging every 100 actions helps reduce volume while still providing observability into spill behavior.

Consider creating a follow-up task to demote these to DEBUG2 or remove them once the spill-to-disk feature is proven stable in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 4109 - 4111, The temporary DEBUG1 logging
around the spill-to-disk replay (the elog(DEBUG1, "SPOCK: replay queue next
entry from memory: len=%d", entry->copydata.len) site in spock_apply.c) should
be tracked for future demotion/removal; create a follow-up task (ticket or TODO)
that records: verify spill-to-disk stability in production, then either demote
this elog to DEBUG2 or remove it, and update the source comment (replace the XXX
with the ticket ID and a short note) so reviewers can find and close the task
later.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0d91ac5-2bbc-4e68-b51a-338b4604f66d

📥 Commits

Reviewing files that changed from the base of the PR and between 17e331e and 8a92161.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (3)
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spock.c

@codacy-production

codacy-production bot commented Mar 31, 2026

Up to standards ✅

🟢 Issues: 0 new issues (View in Codacy)
🟢 Metrics: Duplication 0 (View in Codacy)

TIP: This summary will be updated as you push new changes.


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
src/spock_apply.c (2)

2911-2914: ⚠️ Potential issue | 🟠 Major

Free spill-backed entries when replication_handler() throws.

entry_spilled / spill-read entries are only released on the success path. If replication_handler() raises before Line 3060, the TopMemoryContext copy becomes unreachable; after the initial exception the worker keeps running and enters replay mode, so each failed large transaction can leak the very buffer this spill path was meant to cap.

Patch
-				bool		entry_spilled = false;
+				bool		entry_needs_free = false;
@@
 				else
 				{
 					entry = apply_replay_queue_next_entry();
 					if (entry == NULL)
 						continue;
+					entry_needs_free = !entry->from_pq;
 					queue_append = false;
 				}
@@
 						ApplyReplayEntry *orig = entry;
 
 						entry = apply_replay_queue_append_entry(entry, msg);
-						entry_spilled = (entry != orig);
+						entry_needs_free = (entry != orig);
@@
-						if (entry_spilled)
+						if (entry_needs_free)
 							msg = &entry->copydata;
 					}
@@
-					replication_handler(msg);
+					PG_TRY();
+					{
+						replication_handler(msg);
+					}
+					PG_CATCH();
+					{
+						if (entry_needs_free)
+							apply_replay_entry_free(entry);
+						PG_RE_THROW();
+					}
+					PG_END_TRY();
@@
-					if (entry_spilled || !entry->from_pq)
+					if (entry_needs_free)
 						apply_replay_entry_free(entry);

Also applies to: 3014-3028, 3046-3061

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2911 - 2914, The code currently only frees
spill-backed ApplyReplayEntry objects (tracked by entry_spilled) on the success
path, so if replication_handler() throws the spilled buffer remains leaked;
update the control flow around replication_handler() in the replay loop to
always release any spill-backed resources on error (use PG_TRY/PG_CATCH or a
cleanup block) — ensure that when replication_handler() raises you check
entry_spilled and free the spill-read contents and msg, and free or pfree the
ApplyReplayEntry copy that was allocated in TopMemoryContext before rethrowing
or switching to the replay path; locate the variables ApplyReplayEntry,
entry_spilled, replication_handler(), and msg to add the cleanup so both the
normal and exception paths free the spill-backed entries.

136-137: ⚠️ Potential issue | 🟡 Minor

Use a wider type for apply_replay_bytes.

The GUC allows thresholds up to roughly 2GB, so apply_replay_bytes + msg->len can wrap int before the spill check runs. That can postpone spilling on exactly the large transactions this feature targets.

Patch
-static int	apply_replay_bytes = 0;
+static Size	apply_replay_bytes = 0;
@@
-	if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
-		apply_replay_bytes + msg->len > spock_replay_queue_size * 1024L * 1024L)
+	if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
+		apply_replay_bytes + (Size) msg->len >
+			(Size) spock_replay_queue_size * 1024 * 1024)
@@
-			 "in-memory %d bytes exceeds %d MB limit",
+			 "in-memory %zu bytes exceeds %d MB limit",
@@
-		apply_replay_bytes += msg->len;
+		apply_replay_bytes += (Size) msg->len;

Also applies to: 4176-4195, 4229-4230
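The wrap this finding guards against can be demonstrated standalone. The functions below are illustrative, not Spock code: over_limit_narrow() mimics the pre-patch int arithmetic (routed through uint32_t, since signed int overflow is undefined behavior in C), and over_limit_wide() mimics the widened check.

```c
/*
 * Standalone demonstration (NOT Spock code) of int wrap in a byte-count
 * accumulator versus the widened comparison.
 */
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Pre-patch shape: the int-sized sum wraps before the comparison runs. */
static int
over_limit_narrow(int queued, int msglen, long long threshold)
{
	int32_t		sum = (int32_t) ((uint32_t) queued + (uint32_t) msglen);

	return sum > threshold;
}

/* Patched shape: widen both operands before adding. */
static int
over_limit_wide(int queued, int msglen, long long threshold)
{
	return (long long) queued + (long long) msglen > threshold;
}
```

With a queue already near INT_MAX bytes, the narrow form wraps negative and never triggers spilling, while the wide form spills as intended — the exact failure mode described above.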

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 136 - 137, The variable apply_replay_bytes
must be widened to a 64-bit type to avoid integer wrap when adding large
msg->len values; change the declaration of apply_replay_bytes from int to a
64-bit integer (e.g. int64_t/long long/pg_int64 consistent with project types),
update all other occurrences that assume it is an int (including the other
declarations/usages referenced around apply_replay_bytes and the spots that add
msg->len or compare against the GUC threshold), ensure initializations and
comparisons use the new type (and add casts where msg->len is smaller) so the
spill check uses 64-bit arithmetic throughout.
🧹 Nitpick comments (1)
tests/regress/sql/primary_key.sql (1)

134-142: Scope this wait to test_subscription.

This loop still polls the unqualified spock.sub_show_status() form. Since the same script already uses test_subscription at Line 171, using the named overload here too keeps the scalar subquery deterministic if another subscription exists.

Patch
-  WHILE (SELECT status FROM spock.sub_show_status()) <> 'disabled' LOOP
+  WHILE (SELECT status FROM spock.sub_show_status('test_subscription')) <> 'disabled' LOOP
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/regress/sql/primary_key.sql` around lines 134 - 142, The polling loop
uses the unqualified spock.sub_show_status() which can return non-deterministic
results if multiple subscriptions exist; change the scalar subquery to call the
named overload with the test_subscription identifier (i.e., use
spock.sub_show_status(test_subscription)) so the WHILE condition becomes
deterministic and consistent with the later usage of test_subscription.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6dd04287-f0de-412b-ac79-5ff618505ec7

📥 Commits

Reviewing files that changed from the base of the PR and between 17e331e and 80602f2.

⛔ Files ignored due to path filters (2)
  • tests/regress/expected/primary_key.out is excluded by !**/*.out
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • Makefile
  • include/spock.h
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/regress-postgresql.conf
  • tests/regress/sql/primary_key.sql
  • tests/regress/sql/spill_transaction.sql
✅ Files skipped from review due to trivial changes (2)
  • tests/regress/regress-postgresql.conf
  • include/spock.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • Makefile


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/spockbench.yml (1)

50-54: ulimit -c unlimited is a no-op in this step.

That limit only affects the current shell and its children, and this step does not start any long-lived workload after setting it. It won't carry into the later docker compose up / docker run steps, so I'd either drop it or move it into the same step as the process that should inherit it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/spockbench.yml around lines 50 - 54, The "Enable core
dumps on host" step sets "ulimit -c unlimited" which only affects that shell and
not later steps (so it's a no-op for the containers); either remove the ulimit
line or move it into the same step that launches the processes that need core
dumps (e.g., the step that runs "docker compose up" / "docker run"), or
alternatively use Docker's ulimit support (pass --ulimit core=-1 to docker run
or the ulimits section in docker-compose) so the started containers actually
inherit an unlimited core size; update the workflow accordingly referencing the
step named "Enable core dumps on host" and the "ulimit -c unlimited" command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/spockbench.yml:
- Around line 169-177: docker exec on stopped containers (using "$REG_CT_NAME"
and "$TAP_CT_NAME") fails silently and prevents extraction of core backtraces;
instead, run a new temporary container that mounts the host core directories and
workspace and runs the gdb extraction. Replace the docker exec blocks that
reference /tmp/regress-cores and /tmp/tap-cores with docker run --rm mounting
/tmp/regress-cores:/cores (or /tmp/tap-cores for the TAP step) and
"${GITHUB_WORKSPACE}":/workspace, use the same image used elsewhere for Spock
(the SPOCK image variable used in this workflow), and run the existing bash -c
block there to produce backtraces.txt in the workspace; keep the gdb commands
and output redirection (to
${GITHUB_WORKSPACE}/tests/regress/regression_output/backtraces.txt and the TAP
equivalent) but remove the || true that hides failures.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 65001453-9c04-44ac-a07e-80134244f8bf

📥 Commits

Reviewing files that changed from the base of the PR and between ef7887a and ef6c6a4.

📒 Files selected for processing (2)
  • .github/workflows/spockbench.yml
  • tests/regress/sql/spill_transaction.sql

@danolivo force-pushed the spoc-482 branch 2 times, most recently from 9df3149 to 67eab35 on April 1, 2026 at 11:19

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/spock_apply.c (2)

4257-4262: Missing error check for BufFileSeek.

BufFileSeek() returns 0 on success and non-zero on failure. If the seek fails (e.g., file corruption), subsequent reads will return incorrect data. Consider checking the return value:

🛡️ Suggested fix
 	if (apply_replay_spilling)
 	{
+		int seek_result;
+
 		Assert(apply_replay_spill_file != NULL);
-		BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET);
+		seek_result = BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET);
+		if (seek_result != 0)
+			elog(ERROR, "SPOCK %s: failed to seek replay spill file to start",
+				 MySubscription->name);
 		apply_replay_spill_read = 0;
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 4257 - 4262, The BufFileSeek call inside the
apply_replay_spilling block (when apply_replay_spilling is true and
apply_replay_spill_file is asserted) currently ignores the return value; update
the code around BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET) to capture
its return, check for non-zero (failure), and handle the error (e.g., log the
failure via ereport/elog or processLogger equivalent and abort/return an error
path) before proceeding to set apply_replay_spill_read = 0; ensure you reference
apply_replay_spill_file, BufFileSeek, and apply_replay_spill_read when
implementing the check so failed seeks cannot lead to invalid reads.

3044-3047: Entry freeing on success path looks correct, but error path may leak.

The need_free logic correctly frees spill-read entries after replication_handler() succeeds. However, if replication_handler() throws an exception, the current entry won't be freed before the error propagates to the outer PG_CATCH. For spilled entries in TopMemoryContext, this leaks memory until worker restart.

Consider wrapping the call in a local PG_TRY/PG_CATCH to free the entry on error:

🛡️ Suggested fix
-					replication_handler(msg);
+					PG_TRY();
+					{
+						replication_handler(msg);
+					}
+					PG_CATCH();
+					{
+						if (need_free)
+							apply_replay_entry_free(entry);
+						PG_RE_THROW();
+					}
+					PG_END_TRY();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 3044 - 3047, The call to
replication_handler(entry) can throw and leak spilled entries when need_free is
true because apply_replay_entry_free(entry) is only called on the success path;
wrap the replication_handler(...) invocation in a local PG_TRY / PG_CATCH block
so that in the PG_CATCH path you call apply_replay_entry_free(entry) when
need_free is true (and then rethrow with PG_RE_THROW), ensuring spilled
TopMemoryContext entries are freed even on exceptions; reference
replication_handler, need_free, apply_replay_entry_free, entry, PG_TRY/PG_CATCH
and PG_RE_THROW when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/spock_apply.c`:
- Around line 4257-4262: The BufFileSeek call inside the apply_replay_spilling
block (when apply_replay_spilling is true and apply_replay_spill_file is
asserted) currently ignores the return value; update the code around
BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET) to capture its return,
check for non-zero (failure), and handle the error (e.g., log the failure via
ereport/elog or processLogger equivalent and abort/return an error path) before
proceeding to set apply_replay_spill_read = 0; ensure you reference
apply_replay_spill_file, BufFileSeek, and apply_replay_spill_read when
implementing the check so failed seeks cannot lead to invalid reads.
- Around line 3044-3047: The call to replication_handler(entry) can throw and
leak spilled entries when need_free is true because
apply_replay_entry_free(entry) is only called on the success path; wrap the
replication_handler(...) invocation in a local PG_TRY / PG_CATCH block so that
in the PG_CATCH path you call apply_replay_entry_free(entry) when need_free is
true (and then rethrow with PG_RE_THROW), ensuring spilled TopMemoryContext
entries are freed even on exceptions; reference replication_handler, need_free,
apply_replay_entry_free, entry, PG_TRY/PG_CATCH and PG_RE_THROW when making the
change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dc8bbaff-e757-4355-a9c5-88f4b0600d93

📥 Commits

Reviewing files that changed from the base of the PR and between 9df3149 and 67eab35.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (2)
  • src/spock_apply.c
  • tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/regress/sql/spill_transaction.sql


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 4256-4263: The seek of the spill file using
BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET) is not checking
BufFileSeek's return value (non-zero indicates error) so a failed seek can cause
incorrect reads and silent corruption; update the block that touches
apply_replay_spill_file to check BufFileSeek's return, and on non-zero handle
the error (e.g., ereport(ERROR, ...) or appropriate cleanup/exit) before
proceeding and only set apply_replay_spill_read = 0 when the seek succeeded to
avoid reading from an unknown position.

In `@tests/regress/sql/spill_transaction.sql`:
- Around line 155-171: The sync_event() call is executed on the subscriber
connection but must be called on the provider to capture the provider WAL LSN
after TRUNCATE; update the script so that after TRUNCATE test_spill and before
switching to the subscriber connection you call SELECT spock.sync_event() AS
sync_lsn (on the provider), set the :sync_lsn variable, then switch to the
subscriber and call CALL spock.wait_for_sync_event(NULL, 'test_provider',
:'sync_lsn', 30); also remove the redundant "\c :subscriber_dsn" that
re-connects to the subscriber when already on it; ensure the sequence around
TRUNCATE, SELECT spock.sync_event(), connection switch, and CALL
spock.wait_for_sync_event() follows that order.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dc0f1d6c-1ecf-4594-baff-a5c6d25a8567

📥 Commits

Reviewing files that changed from the base of the PR and between 9df3149 and 67eab35.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (2)
  • src/spock_apply.c
  • tests/regress/sql/spill_transaction.sql


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/spock_apply.c (2)

4168-4169: ⚠️ Potential issue | 🟡 Minor

Integer overflow still possible in apply_replay_bytes check.

The comparison apply_replay_bytes + msg->len > spock_replay_queue_size * 1024L * 1024L performs the addition on int types before comparing to long. If apply_replay_bytes is near INT_MAX and msg->len is non-trivial, the addition could overflow to negative, making the check always false and never triggering spilling.

While individual messages are bounded by MaxAllocSize, repeated accumulation without spilling could theoretically reach this limit on systems with large spock_replay_queue_size (max is MAX_KILOBYTES/1024 MB per the GUC, which is ~2TB).

🛡️ Proposed fix
-	if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
-		apply_replay_bytes + msg->len > spock_replay_queue_size * 1024L * 1024L)
+	if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
+		(Size)apply_replay_bytes + (Size)msg->len > (Size)spock_replay_queue_size * 1024L * 1024L)

Or change apply_replay_bytes to Size as suggested in the past review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 4168 - 4169, The check using
apply_replay_bytes + msg->len can overflow because the addition happens in an
int-sized type; change the logic to perform the comparison in a sufficiently
wide unsigned/Size type (or change apply_replay_bytes to Size) so the addition
cannot wrap. Concretely, update the condition that references
apply_replay_spilling, apply_replay_bytes, msg->len and spock_replay_queue_size
to cast operands to Size (or use a Size-typed apply_replay_bytes) and compute
target_bytes = (Size) spock_replay_queue_size * 1024 * 1024, then compare
(apply_replay_bytes + (Size) msg->len) > target_bytes to avoid integer overflow
and ensure correct spilling behavior.

3044-3047: ⚠️ Potential issue | 🟡 Minor

Entry leak on exception from replication_handler().

If replication_handler(msg) throws an exception, apply_replay_entry_free(entry) is never called for entries where need_free is true. These are TopMemoryContext-allocated entries that survive the outer PG_CATCH cleanup. Over multiple retried exceptions in a large transaction, this could accumulate significant memory.

Consider wrapping the call in a PG_TRY block:

🛡️ Proposed fix
-					replication_handler(msg);
-
-					if (need_free)
-						apply_replay_entry_free(entry);
+					PG_TRY();
+					{
+						replication_handler(msg);
+					}
+					PG_FINALLY();
+					{
+						if (need_free)
+							apply_replay_entry_free(entry);
+					}
+					PG_END_TRY();

Or simpler, capture entry before the call and free in the outer PG_CATCH.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 3044 - 3047, replication_handler(msg) can
throw and skip freeing TopMemoryContext-allocated entries, leaking memory; wrap
the call so entry is always freed when need_free is true by either enclosing
replication_handler(msg) in a PG_TRY/PG_CATCH and calling
apply_replay_entry_free(entry) in the CATCH, or capture the entry before the
call and ensure apply_replay_entry_free(entry) is invoked in the outer PG_CATCH;
reference replication_handler, apply_replay_entry_free, need_free and entry so
the fix locates the right spot and guarantees cleanup even on exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 4176-4179: The creation of the temp spill file via
BufFileCreateTemp() can throw an ERROR and leave the current replay entry
consumed but not enqueued or spilled, so if that happens the outer PG_CATCH will
call apply_replay_queue_start_replay() and replay from an incomplete queue;
update the code around BufFileCreateTemp()/apply_replay_spill_file and the
apply_replay_spilling flag to document this limitation with a brief comment
explaining the failure mode (BufFileCreateTemp may ERROR on disk/perm issues),
the resulting state (entry consumed but not persisted), and the expected
behavior (worker restart/outer PG_CATCH will handle it), referencing
BufFileCreateTemp, apply_replay_spill_file, apply_replay_spilling, PG_CATCH, and
apply_replay_queue_start_replay in the comment.

---

Duplicate comments:
In `@src/spock_apply.c`:
- Around line 4168-4169: The check using apply_replay_bytes + msg->len can
overflow because the addition happens in an int-sized type; change the logic to
perform the comparison in a sufficiently wide unsigned/Size type (or change
apply_replay_bytes to Size) so the addition cannot wrap. Concretely, update the
condition that references apply_replay_spilling, apply_replay_bytes, msg->len
and spock_replay_queue_size to cast operands to Size (or use a Size-typed
apply_replay_bytes) and compute target_bytes = (Size) spock_replay_queue_size *
1024 * 1024, then compare (apply_replay_bytes + (Size) msg->len) > target_bytes
to avoid integer overflow and ensure correct spilling behavior.
- Around lines 3044-3047: replication_handler(msg) can throw and skip freeing
TopMemoryContext-allocated entries, leaking memory. Ensure the entry is always
freed when need_free is true: either wrap replication_handler(msg) in
PG_TRY/PG_CATCH and call apply_replay_entry_free(entry) in the CATCH branch,
or capture the entry before the call and free it from the outer PG_CATCH, so
cleanup is guaranteed even on exceptions.
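In PostgreSQL-internals terms, the PG_TRY variant of the fix could be shaped as below (a sketch only, not compilable standalone; names follow the review comment):

```c
/* Sketch: guarantee the TopMemoryContext-allocated entry is freed even
 * when replication_handler() throws. */
if (need_free)
{
    PG_TRY();
    {
        replication_handler(msg);
    }
    PG_CATCH();
    {
        apply_replay_entry_free(entry);
        PG_RE_THROW();
    }
    PG_END_TRY();
    apply_replay_entry_free(entry);
}
else
    replication_handler(msg);
```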

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3189cb4b-047d-46a7-8778-90535c8da725

📥 Commits

Reviewing files that changed from the base of the PR and between 9df3149 and deb5299.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (2)
  • src/spock_apply.c
  • tests/regress/sql/spill_transaction.sql
✅ Files skipped from review due to trivial changes (1)
  • tests/regress/sql/spill_transaction.sql

danolivo added 3 commits April 3, 2026 10:08
When the replay queue exceeds spock.exception_replay_queue_size (in MB),
subsequent entries are written to a temporary BufFile on disk instead of
accumulating in memory.  This prevents OOM on the subscriber when
replaying large transactions with exception handling enabled.

Key design decisions:

- In-memory queue entries (ApplyReplayContext) hold libpq-allocated data
  buffers.  apply_replay_queue_reset walks the list to PQfreemem them
  before calling MemoryContextReset.

- Spilled entries are relocated to TopMemoryContext so they survive the
  MemoryContextReset in handle_commit.  The caller frees them explicitly
  after replication_handler returns.

- When the replay queue is exhausted mid-transaction (exception before
  COMMIT was received), the loop seamlessly transitions to stream reading
  via continue instead of break, matching the pre-spill behavior.

- Added from_pq field to ApplyReplayEntry to distinguish libpq-allocated
  buffers (PQfreemem) from palloc'd spill-read buffers (pfree).

Includes regression tests covering all three exception_behaviour modes
(transdiscard, discard, sub_disable) both without conflict and with a
primary key conflict on the last record.
- Fix memory leak: spill-read entries during replay were allocated in
  TopMemoryContext but never freed because entry_spilled was only set on
  the first-pass path. Add !entry->from_pq to the free condition so
  replay-path spill-read entries are freed after processing.

- Set GUC minimum for spock.exception_replay_queue_size from -1 to 0,
  removing an undocumented value that silently behaved the same as 0.

- Use explicit subscription name in spill_transaction test when polling
  sub_show_status to avoid ambiguity with multiple subscriptions.
Refactor apply_replay_queue_append_entry() to return a bool indicating
whether the caller must free the entry, instead of requiring the caller
to compare pointers and track an entry_spilled flag. The function now
updates entry and msg pointers through out-parameters.

This fixes a use-after-free: on a COMMIT message, replication_handler()
calls handle_commit → apply_replay_queue_reset → MemoryContextReset,
which destroys in-memory entries. The old code then accessed
entry->from_pq on the freed memory. The new design captures the free
decision before replication_handler runs.

Also update spill_transaction regression test: move TRUNCATE before
subscriber configuration in SUB_DISABLE section to avoid replicating
TRUNCATE while exception_behaviour is active, drop command_counter
from exception_log queries for stable output, and add sanity checks.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/regress/sql/spill_transaction.sql (1)

13-19: Start this script from a synced replication position.

This file begins by truncating local diagnostic tables and flipping subscriber-wide GUCs, but it never first drains any tail apply work from the previous regression. A quick sync_event() / wait_for_sync_event() barrier here would make the first scenario deterministic and prevent late inserts from repopulating spock.exception_log after the truncate.

Suggested barrier
+ \c :provider_dsn
+ SELECT spock.sync_event() as sync_lsn
+ \gset
+
  \c :subscriber_dsn
+ CALL spock.wait_for_sync_event(NULL, 'test_provider', :'sync_lsn', 30);
  TRUNCATE spock.resolutions;
  TRUNCATE spock.exception_log;
  ALTER SYSTEM SET spock.save_resolutions = on;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/regress/sql/spill_transaction.sql` around lines 13-19: before
truncating the local diagnostic tables, add a replication drain barrier by
calling sync_event() on the provider and then wait_for_sync_event() on the
subscriber, so all prior tail apply work has been applied. Place the barrier
immediately before the TRUNCATE spock.resolutions; TRUNCATE
spock.exception_log; and GUC changes so late apply inserts cannot repopulate
spock.exception_log after the truncate. Use the existing
sync_event()/wait_for_sync_event() helpers to create a deterministic
start-of-test state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/regress/sql/spill_transaction.sql`:
- Around lines 24-26: the test triggers a collision on the unique index
payload_idx instead of the primary key. Remove the secondary unique index
payload_idx from the CREATE TABLE block, pre-insert a local row with id =
50000 and a non-conflicting payload value (e.g. payload = 1) into test_spill
before running the provider rows so the conflict occurs on the PK, and then
adjust the DISCARD-mode comment and the aggregate/result expectations: the
last provider row is the one skipped, and the pre-inserted local PK row is
preserved.

---

Nitpick comments:
In `@tests/regress/sql/spill_transaction.sql`:
- Around lines 13-19: before truncating the local diagnostic tables, add a
replication drain barrier by calling sync_event() and then
wait_for_sync_event() so the subscriber has applied all prior tail work;
place the barrier immediately before the TRUNCATE spock.resolutions; TRUNCATE
spock.exception_log; and GUC changes so late apply inserts cannot repopulate
spock.exception_log after the truncate. Use the existing
sync_event()/wait_for_sync_event() helpers to create a deterministic
start-of-test state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 08bbf23a-27a8-4c5c-a00e-0ac4b84c60bd

📥 Commits

Reviewing files that changed from the base of the PR and between 4102871 and e0953d1.

⛔ Files ignored due to path filters (2)
  • tests/regress/expected/primary_key.out is excluded by !**/*.out
  • tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • Makefile
  • include/spock.h
  • src/spock.c
  • src/spock_apply.c
  • tests/regress/regress-postgresql.conf
  • tests/regress/sql/primary_key.sql
  • tests/regress/sql/spill_transaction.sql
✅ Files skipped from review due to trivial changes (3)
  • include/spock.h
  • tests/regress/regress-postgresql.conf
  • tests/regress/sql/primary_key.sql
🚧 Files skipped from review as they are similar to previous changes (3)
  • Makefile
  • src/spock.c
  • src/spock_apply.c

@mason-sharp
Member

mason-sharp commented Apr 3, 2026

Replacing my previous message with an updated patch.

EDIT: updated description, it was not accurate.

queue_memory_usage2.patch


Labels

enhancement New feature or request


3 participants