Conversation
📝 Walkthrough

Reintroduces configurable in-memory replay-queue semantics with on-disk spill support, implements BufFile-based spill/read and explicit replay-mode state in the apply path, updates a GUC and an exported variable comment, and adds a new regression test plus related test-config and test SQL changes.
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/spock_apply.c (1)
2955-3044: ⚠️ Potential issue | 🔴 Critical — Run spill-entry cleanup in both replay and error paths.

`apply_replay_queue_next_entry()` can return a TopMemoryContext entry from `apply_replay_spill_read_entry()`, but the free here only keys off `entry_spilled`, which is set only for first-pass stream entries. Replay entries from disk therefore survive both this block and `apply_replay_queue_reset()`, and a failing first-pass spilled entry also leaks because `replication_handler()` throws before Line 3043. That keeps spill-backed buffers alive until worker exit and can reintroduce the memory pressure this feature is supposed to remove.

♻️ Suggested fix

```diff
-    replication_handler(msg);
-
-    /*
-     * Free spilled entries explicitly: their structs live in
-     * TopMemoryContext (not ApplyReplayContext), so they are
-     * not cleaned up by apply_replay_queue_reset.
-     *
-     * In-memory entries (both first-pass and replay) live in
-     * ApplyReplayContext and are freed by
-     * MemoryContextReset inside apply_replay_queue_reset,
-     * called from handle_commit.
-     */
-    if (entry_spilled)
-        apply_replay_entry_free(entry);
+    PG_TRY();
+    {
+        replication_handler(msg);
+    }
+    PG_CATCH();
+    {
+        if (entry_spilled || (!queue_append && !entry->from_pq))
+            apply_replay_entry_free(entry);
+        PG_RE_THROW();
+    }
+    PG_END_TRY();
+
+    /*
+     * Free entries whose storage lives outside
+     * ApplyReplayContext.
+     */
+    if (entry_spilled || (!queue_append && !entry->from_pq))
+        apply_replay_entry_free(entry);
```

Also applies to: 3994-4042
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 2955 - 3044, The replayed entry from apply_replay_queue_next_entry() can be a spilled TopMemoryContext entry (from apply_replay_spill_read_entry()) but entry_spilled is only set for first-pass stream appends, so spilled replay entries and spilled entries when replication_handler() throws are never freed; ensure spilled entries are always freed regardless of origin and on errors by treating spilled status based on where the entry came from and by wrapping replication_handler(msg) in an exception-safe cleanup: set entry_spilled whenever apply_replay_queue_next_entry() returned a spilled entry (or detect that from the entry), call apply_replay_entry_free(entry) after replication_handler(msg) as currently done, and also free the spilled entry in a PG_TRY/PG_CATCH (or equivalent error-path) before rethrowing so that both replay and error paths release TopMemoryContext-backed entries; update any logic in functions apply_replay_queue_append_entry, apply_replay_queue_next_entry, replication_handler, and the local use of entry_spilled to reflect this invariant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Line 137: Change apply_replay_bytes from int to an unsigned wider type
(recommended Size or uint64) and update all related uses: in the increment where
apply_replay_bytes += msg->len (referencing msg->len) cast/ensure both operands
use the wider type, update the comparison in the spill/check logic (the branch
that resets apply_replay_bytes when it exceeds the spill threshold) to compare
using the wider type (with any necessary casts), and change the
diagnostic/format string that prints apply_replay_bytes from "%d" to "%zu" (or
the appropriate specifier for the chosen type). Ensure all assignments,
increments, and prints use the new type consistently to avoid overflow and type
warnings.
In `@tests/regress/sql/spill_transaction.sql`:
- Around line 159-165: The test races because disabling is asynchronous; modify
the sequence around the SELECT from spock.subscription and the call to
spock.sub_enable('test_subscription', true) so the test first polls/waits until
the subscription row reflects the disabled state (i.e., sub_enabled = false /
sub_disabled = true) before deleting the conflicting row and calling
spock.sub_enable; use a simple retry loop with a short sleep that queries
spock.subscription for sub_name = 'test_subscription' and proceeds only once the
row shows disabled to avoid the apply-worker flip racing the re-enable.
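The wait-until-disabled step described above can be sketched as a PL/pgSQL retry loop. The `spock.subscription` column names and the subscription name follow the prompt's wording; the iteration count and sleep interval are arbitrary:

```sql
-- Wait until the apply worker has actually flipped the subscription
-- off before deleting the conflicting row and re-enabling it.
DO $$
DECLARE
    enabled boolean := true;
BEGIN
    FOR i IN 1..100 LOOP
        SELECT sub_enabled INTO enabled
          FROM spock.subscription
         WHERE sub_name = 'test_subscription';
        EXIT WHEN NOT enabled;
        PERFORM pg_sleep(0.1);
    END LOOP;
    IF enabled THEN
        RAISE EXCEPTION 'subscription never reached disabled state';
    END IF;
END;
$$;
-- ...delete the conflicting row, then:
-- SELECT spock.sub_enable('test_subscription', true);
```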
---
Outside diff comments:
In `@src/spock_apply.c`:
- Around line 2955-3044: The replayed entry from apply_replay_queue_next_entry()
can be a spilled TopMemoryContext entry (from apply_replay_spill_read_entry())
but entry_spilled is only set for first-pass stream appends, so spilled replay
entries and spilled entries when replication_handler() throws are never freed;
ensure spilled entries are always freed regardless of origin and on errors by
treating spilled status based on where the entry came from and by wrapping
replication_handler(msg) in an exception-safe cleanup: set entry_spilled
whenever apply_replay_queue_next_entry() returned a spilled entry (or detect
that from the entry), call apply_replay_entry_free(entry) after
replication_handler(msg) as currently done, and also free the spilled entry in a
PG_TRY/PG_CATCH (or equivalent error-path) before rethrowing so that both replay
and error paths release TopMemoryContext-backed entries; update any logic in
functions apply_replay_queue_append_entry, apply_replay_queue_next_entry,
replication_handler, and the local use of entry_spilled to reflect this
invariant.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1c14dad2-8228-4bce-b745-5db36660b35f
⛔ Files ignored due to path filters (2)
tests/regress/expected/primary_key.out is excluded by !**/*.out
tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
Makefile, include/spock.h, src/spock.c, src/spock_apply.c, tests/regress/regress-postgresql.conf, tests/regress/sql/primary_key.sql, tests/regress/sql/spill_transaction.sql
Force-pushed: 2463ad4 to 17e331e (Compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Around line 2957-2960: The loop that calls apply_replay_queue_next_entry() and
processes entries fails to free TopMemoryContext-allocated spill entries
returned by apply_replay_spill_read_entry() on both success and error, leaking
memory if replication_handler() throws or when stream-pointer changes; update
the processing loop (around apply_replay_queue_next_entry and where
apply_replay_spill_read_entry is used) to always free those replay entries from
TopMemoryContext after they are consumed — i.e., after replication_handler()
succeeds or in the error/exception path before continue/return — and centralize
the free logic so both the normal success path and any early-exit/error paths
free the entry (refer to functions apply_replay_queue_next_entry(),
apply_replay_spill_read_entry(), replication_handler() and the
ApplyReplayContext handling).
In `@src/spock.c`:
- Around line 1155-1163: The GUC DefineCustomIntVariable call for
spock_replay_queue_size currently sets the minimum to -1 which creates an
undocumented "unlimited" alias; either change the minimum back to 0 to prevent
the undocumented value or update the help string to explicitly document -1 as a
compatibility alias; locate the DefineCustomIntVariable invocation (symbol:
DefineCustomIntVariable and variable spock_replay_queue_size) and implement one
of the two fixes: (A) set the lower bound argument from -1 to 0 and adjust any
tests/comments, or (B) keep -1 but extend the help text to mention that -1 is a
compatibility alias for unlimited memory and ensure any validation/logic that
checks spock_replay_queue_size treats -1 identically to 0.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 29266b9f-2ff7-4e43-83cd-2d63e42ab8c4
⛔ Files ignored due to path filters (2)
tests/regress/expected/primary_key.out is excluded by !**/*.out
tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
Makefile, include/spock.h, src/spock.c, src/spock_apply.c, tests/regress/regress-postgresql.conf, tests/regress/sql/primary_key.sql, tests/regress/sql/spill_transaction.sql
✅ Files skipped from review due to trivial changes (2)
- include/spock.h
- tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (2)
- Makefile
- tests/regress/regress-postgresql.conf
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/spock.c (1)
1155-1163: ⚠️ Potential issue | 🟡 Minor — Document `-1` explicitly or restore lower bound to `0`.

Lines 1156-1159 document `0` as the disable/unlimited mode, but Line 1162 allows `-1`; with spill activation gated by `spock_replay_queue_size > 0` (src/spock_apply.c:4152-4170), `-1` is also effectively “disable spill.” This leaves an undocumented alias.

Suggested fix (pick one policy)

```diff
-    -1,
+    0,
```

or keep `-1` and update the help text to explicitly state `-1` is accepted as an alias for unlimited/no spill.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock.c` around lines 1155 - 1163, The help text for DefineCustomIntVariable("spock.exception_replay_queue_size", ...) claims "Set to 0 to disable spilling" but the call allows -1 as a valid value (min = -1) which is also treated as "disable"; either restore the lower bound to 0 by changing the min argument from -1 to 0 so only 0 means unlimited, or keep -1 allowed but update the help string to explicitly mention that -1 is an accepted alias for unlimited/no spill; update the DefineCustomIntVariable invocation and the user-facing help text accordingly and ensure any references/logic in spock_apply.c (spock_replay_queue_size checks) remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/regress/sql/spill_transaction.sql`:
- Around line 178-183: The loop currently checks an unqualified scalar: (SELECT
status FROM spock.sub_show_status()), which can return ambiguous results if
multiple subscriptions exist; change the subquery to explicitly target the test
subscription by either calling a subscription-specific accessor or filtering the
set for 'test_subscription' (e.g. SELECT status FROM spock.sub_show_status()
WHERE name = 'test_subscription' or use
spock.sub_show_status('test_subscription') if that overload exists) so the WHILE
condition only polls the test_subscription row.
---
Duplicate comments:
In `@src/spock.c`:
- Around line 1155-1163: The help text for
DefineCustomIntVariable("spock.exception_replay_queue_size", ...) claims "Set to
0 to disable spilling" but the call allows -1 as a valid value (min = -1) which
is also treated as "disable"; either restore the lower bound to 0 by changing
the min argument from -1 to 0 so only 0 means unlimited, or keep -1 allowed but
update the help string to explicitly mention that -1 is an accepted alias for
unlimited/no spill; update the DefineCustomIntVariable invocation and the
user-facing help text accordingly and ensure any references/logic in
spock_apply.c (spock_replay_queue_size checks) remain consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8692e76e-fad4-46ee-9719-3e8390d796fd
⛔ Files ignored due to path filters (2)
tests/regress/expected/primary_key.out is excluded by !**/*.out
tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (7)
Makefile, include/spock.h, src/spock.c, src/spock_apply.c, tests/regress/regress-postgresql.conf, tests/regress/sql/primary_key.sql, tests/regress/sql/spill_transaction.sql
✅ Files skipped from review due to trivial changes (3)
- tests/regress/regress-postgresql.conf
- tests/regress/sql/primary_key.sql
- include/spock.h
🚧 Files skipped from review as they are similar to previous changes (2)
- Makefile
- src/spock_apply.c
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/spock_apply.c (2)
2999-3046: ⚠️ Potential issue | 🔴 Critical — Snapshot the cleanup decision before dispatch, and run it on both success and error.

Line 1079 now resets `ApplyReplayContext` inside `handle_commit()`, so the post-call `!entry->from_pq` check can dereference a freed in-memory COMMIT entry. Separately, if `replication_handler()` errors on the current spilled stream entry, cleanup only exists on the success path and the TopMemoryContext copy leaks. Compute a local `entry_needs_free` before calling the handler and use a local `PG_TRY`/`PG_CATCH` so both paths free safely.

Suggested patch

```diff
-    bool        entry_spilled = false;
+    bool        entry_needs_free = false;
@@
     else
     {
         entry = apply_replay_queue_next_entry();
         if (entry == NULL)
             continue;
+        entry_needs_free = !entry->from_pq;
         queue_append = false;
     }
@@
     if (queue_append)
     {
         ApplyReplayEntry *orig = entry;

         entry = apply_replay_queue_append_entry(entry, msg);
-        entry_spilled = (entry != orig);
+        entry_needs_free = (entry != orig);
@@
-        if (entry_spilled)
+        if (entry_needs_free)
             msg = &entry->copydata;
     }
@@
-    replication_handler(msg);
+    PG_TRY();
+    {
+        replication_handler(msg);
+    }
+    PG_CATCH();
+    {
+        if (entry_needs_free)
+            apply_replay_entry_free(entry);
+        PG_RE_THROW();
+    }
+    PG_END_TRY();
@@
-    if (entry_spilled || !entry->from_pq)
+    if (entry_needs_free)
         apply_replay_entry_free(entry);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 2999 - 3046, The cleanup decision for spilled/in-memory entries must be snapshotted before calling replication_handler and performed on both success and error; compute a local bool (e.g. entry_needs_free) from entry_spilled || !entry->from_pq immediately after apply_replay_queue_append_entry (before any call that may reset ApplyReplayContext in handle_commit), then wrap the call to replication_handler(msg) inside a PG_TRY/PG_CATCH and in both the PG_TRY end and the PG_CATCH ensure apply_replay_entry_free(entry) is called when entry_needs_free is true; this avoids dereferencing a freed in-memory COMMIT entry after handle_commit resets ApplyReplayContext and prevents leaking TopMemoryContext copies on errors.
136-137: ⚠️ Potential issue | 🟠 Major — Widen `apply_replay_bytes` before the threshold math overflows.

`apply_replay_bytes` is still an `int`, but `spock.exception_replay_queue_size` is allowed up to `MAX_KILOBYTES / 1024` in src/spock.c:1155-1167, and `0` now means “unlimited”. The addition at Line 4156 and the accumulation at Line 4214 can therefore overflow long before a valid configured limit, or just wrap indefinitely in the unlimited case. Please move this accounting to `Size`/`uint64` and update the comparison/log formatting consistently.

Also applies to: 4155-4156, 4213-4215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 136 - 137, The apply_replay_bytes counter is an int and can overflow/wrap when compared against spock.exception_replay_queue_size (which can be large or "0" for unlimited); change apply_replay_bytes to use a wider unsigned type (Size or uint64) and update all arithmetic and comparisons that reference apply_replay_bytes (notably in the addition site in the replay-queue enqueue path and the accumulation in the dequeue/cleanup path where apply_replay_bytes is adjusted) to use the new type, ensure the unlimited case (0) is handled correctly, and update any logging/format strings that print apply_replay_bytes to use the appropriate format specifier for the wider type so comparisons and logs are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Around line 4154-4205: The spill setup/write can fail after the current
message was consumed but before it is reachable from the in-memory queue, which
would cause replay to start from an incomplete queue; wrap the
BufFileCreateTemp()/apply_replay_spill_write_entry() sequence in an
exception-safe block (PG_TRY/PG_CATCH) so that on any error you either (a) abort
and rethrow (so normal replay is bypassed) or (b) restore state so the current
entry remains reachable (do not set apply_replay_spilling, do not free entry,
and do not change apply_replay_spill_count/apply_replay_bytes) until the spill
completes successfully; only after apply_replay_spill_write_entry() returns
successfully set apply_replay_spilling, update
apply_replay_spill_count/apply_replay_bytes as needed, move/alloc the
ApplyReplayEntry into TopMemoryContext and pfree(entry) and then return mc_entry
— ensuring apply_replay_head and the in-memory queue are not left missing the
current entry on spill failure.
In `@tests/regress/sql/spill_transaction.sql`:
- Around line 115-117: The regression uses spock.exception_log and
spock.resolutions rows from earlier scenarios causing flaky checks; before each
conflict block (or immediately before the SELECTs shown that read
spock.exception_log and spock.resolutions) either TRUNCATE those diagnostic
tables or add a scenario-specific filter (e.g. a WHERE clause referencing a
scenario id/timestamp/command_counter range) and stable ORDER BY clauses so the
SELECTs (the queries reading spock.exception_log and the final SELECT from
spock.resolutions) only return rows for the current conflict case and have
deterministic ordering.
---
Duplicate comments:
In `@src/spock_apply.c`:
- Around line 2999-3046: The cleanup decision for spilled/in-memory entries must
be snapshotted before calling replication_handler and performed on both success
and error; compute a local bool (e.g. entry_needs_free) from entry_spilled ||
!entry->from_pq immediately after apply_replay_queue_append_entry (before any
call that may reset ApplyReplayContext in handle_commit), then wrap the call to
replication_handler(msg) inside a PG_TRY/PG_CATCH and in both the PG_TRY end and
the PG_CATCH ensure apply_replay_entry_free(entry) is called when
entry_needs_free is true; this avoids dereferencing a freed in-memory COMMIT
entry after handle_commit resets ApplyReplayContext and prevents leaking
TopMemoryContext copies on errors.
- Around line 136-137: The apply_replay_bytes counter is an int and can
overflow/wrap when compared against spock.exception_replay_queue_size (which can
be large or "0" for unlimited); change apply_replay_bytes to use a wider
unsigned type (Size or uint64) and update all arithmetic and comparisons that
reference apply_replay_bytes (notably in the addition site in the replay-queue
enqueue path and the accumulation in the dequeue/cleanup path where
apply_replay_bytes is adjusted) to use the new type, ensure the unlimited case
(0) is handled correctly, and update any logging/format strings that print
apply_replay_bytes to use the appropriate format specifier for the wider type so
comparisons and logs are consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 17eafc4e-e099-4b10-a4f7-3a4d4dc49f26
⛔ Files ignored due to path filters (1)
tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (3)
src/spock.c, src/spock_apply.c, tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spock.c
♻️ Duplicate comments (2)
src/spock_apply.c (2)
2957-2960: ⚠️ Potential issue | 🟠 Major — Free `TopMemoryContext` replay entries on the exception path too.

The cleanup only runs after `replication_handler(msg)` returns. If that call throws on the current spilled/replayed entry, the struct is already outside `ApplyReplayContext`, so `apply_replay_queue_reset()` will not reclaim it and the payload stays resident for the rest of the worker lifetime.

Suggested fix

```diff
-    bool        entry_spilled = false;
+    bool        entry_needs_free = false;
@@
     else
     {
         entry = apply_replay_queue_next_entry();
         if (entry == NULL)
             continue;
+        entry_needs_free = !entry->from_pq;
         queue_append = false;
     }
@@
         ApplyReplayEntry *orig = entry;

         entry = apply_replay_queue_append_entry(entry, msg);
-        entry_spilled = (entry != orig);
+        entry_needs_free = (entry != orig);
@@
-        if (entry_spilled)
+        if (entry_needs_free)
             msg = &entry->copydata;
     }
@@
-    replication_handler(msg);
+    PG_TRY();
+    {
+        replication_handler(msg);
+    }
+    PG_CATCH();
+    {
+        if (entry_needs_free)
+            apply_replay_entry_free(entry);
+        PG_RE_THROW();
+    }
+    PG_END_TRY();
@@
-    if (entry_spilled || !entry->from_pq)
+    if (entry_needs_free)
         apply_replay_entry_free(entry);
```

Also applies to: 3002-3046
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 2957 - 2960, The current code takes an entry from apply_replay_queue_next_entry() and processes it with replication_handler(msg) but if replication_handler throws the spilled/replayed entry remains allocated in TopMemoryContext because apply_replay_queue_reset() only runs after normal return; add cleanup in the exception path to free or reset the current replay entry and restore TopMemoryContext so the payload is reclaimed. Concretely, wrap the replication_handler(msg) call with error handling that, on exception, calls the same cleanup that apply_replay_queue_reset() would for the active entry (e.g., free/release the entry returned by apply_replay_queue_next_entry() and reset or switch out of ApplyReplayContext/TopMemoryContext) — apply this change for both places where an entry is pulled and processed (the block using apply_replay_queue_next_entry and the similar block later around replication_handler).
4161-4206: ⚠️ Potential issue | 🔴 Critical — Don’t enter replay from a spill transition that never made the current record durable.

By the time Line 4190 runs, the current `w` record is already consumed from libpq, but it is neither linked into `apply_replay_head` nor guaranteed to be replayable until the create/write/relocate sequence completes. Any `ERROR` in that window sends the outer catch into `apply_replay_queue_start_replay()` with an incomplete queue, so the worker resumes at the next stream record and can silently drop the current change. Treat the whole spill-transition sequence as one exception-safe unit: either keep the current entry reachable until it succeeds, or rethrow and force a worker restart instead of normal replay.

Also applies to: 3233-3265
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 4161 - 4206, The spill-transition can fail after the libpq record is consumed but before it is safely persisted, so either preserve the current entry until the spill completes or force a restart on error; to fix, allocate/copy the ApplyReplayEntry into TopMemoryContext (create mc_entry via MemoryContextSwitchTo(TopMemoryContext) + memcpy of entry) before calling apply_replay_spill_write_entry / creating apply_replay_spill_file, then call apply_replay_spill_write_entry(msg->len, msg->data) and only pfree(entry) after the write succeeds; alternatively wrap the entire spill sequence (BufFileCreateTemp, apply_replay_spill_write_entry, memcpy/pfree) in a PG_TRY/PG_CATCH and rethrow on error so apply_replay_queue_start_replay cannot run with an incomplete queue — update code paths referencing apply_replay_spilling, apply_replay_spill_file, apply_replay_spill_write_entry, ApplyReplayEntry, apply_replay_head, and entry accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/spock_apply.c`:
- Around line 2957-2960: The current code takes an entry from
apply_replay_queue_next_entry() and processes it with replication_handler(msg)
but if replication_handler throws the spilled/replayed entry remains allocated
in TopMemoryContext because apply_replay_queue_reset() only runs after normal
return; add cleanup in the exception path to free or reset the current replay
entry and restore TopMemoryContext so the payload is reclaimed. Concretely, wrap
the replication_handler(msg) call with error handling that, on exception, calls
the same cleanup that apply_replay_queue_reset() would for the active entry
(e.g., free/release the entry returned by apply_replay_queue_next_entry() and
reset or switch out of ApplyReplayContext/TopMemoryContext) — apply this change
for both places where an entry is pulled and processed (the block using
apply_replay_queue_next_entry and the similar block later around
replication_handler).
- Around line 4161-4206: The spill-transition can fail after the libpq record is
consumed but before it is safely persisted, so either preserve the current entry
until the spill completes or force a restart on error; to fix, allocate/copy the
ApplyReplayEntry into TopMemoryContext (create mc_entry via
MemoryContextSwitchTo(TopMemoryContext) + memcpy of entry) before calling
apply_replay_spill_write_entry / creating apply_replay_spill_file, then call
apply_replay_spill_write_entry(msg->len, msg->data) and only pfree(entry) after
the write succeeds; alternatively wrap the entire spill sequence
(BufFileCreateTemp, apply_replay_spill_write_entry, memcpy/pfree) in a
PG_TRY/PG_CATCH and rethrow on error so apply_replay_queue_start_replay cannot
run with an incomplete queue — update code paths referencing
apply_replay_spilling, apply_replay_spill_file, apply_replay_spill_write_entry,
ApplyReplayEntry, apply_replay_head, and entry accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b3d2c197-63c7-4516-b236-70e1ce7a0e2f
⛔ Files ignored due to path filters (1)
tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (3)
src/spock.c, src/spock_apply.c, tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spock.c
🧹 Nitpick comments (2)
src/spock_apply.c (2)
137-137: Consider widening `apply_replay_bytes` to `Size` for consistency.

While the current `int` type works given the spill threshold check, using `Size` (or `int64`) would align with PostgreSQL's memory accounting conventions and provide defensive headroom. The variable accumulates `msg->len` values until spilling activates, so its maximum is bounded by `spock_replay_queue_size` plus one message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` at line 137, Change the type of apply_replay_bytes from int to Size to match PostgreSQL memory accounting and avoid overflow when accumulating msg->len values; update its declaration (apply_replay_bytes) to "Size", ensure any comparisons against spock_replay_queue_size and additions of msg->len use the Size type (cast msg->len if necessary), and adjust any related format/print or helper calls that assume an int to accept Size values.
4109-4111: Temporary debug logging is acceptable for initial release.

The `XXX` comments indicate awareness that these `DEBUG1` statements are for validation during the stabilization period. The conditional logging every 100 actions helps reduce volume while still providing observability into spill behavior.

Consider creating a follow-up task to demote these to `DEBUG2` or remove them once the spill-to-disk feature is proven stable in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 4109 - 4111, The temporary DEBUG1 logging around the spill-to-disk replay (the elog(DEBUG1, "SPOCK: replay queue next entry from memory: len=%d", entry->copydata.len) site in spock_apply.c) should be tracked for future demotion/removal; create a follow-up task (ticket or TODO) that records: verify spill-to-disk stability in production, then either demote this elog to DEBUG2 or remove it, and update the source comment (replace the XXX with the ticket ID and a short note) so reviewers can find and close the task later.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/spock_apply.c`:
- Line 137: Change the type of apply_replay_bytes from int to Size to match
PostgreSQL memory accounting and avoid overflow when accumulating msg->len
values; update its declaration (apply_replay_bytes) to "Size", ensure any
comparisons against spock_replay_queue_size and additions of msg->len use the
Size type (cast msg->len if necessary), and adjust any related format/print or
helper calls that assume an int to accept Size values.
- Around line 4109-4111: The temporary DEBUG1 logging around the spill-to-disk
replay (the elog(DEBUG1, "SPOCK: replay queue next entry from memory: len=%d",
entry->copydata.len) site in spock_apply.c) should be tracked for future
demotion/removal; create a follow-up task (ticket or TODO) that records: verify
spill-to-disk stability in production, then either demote this elog to DEBUG2 or
remove it, and update the source comment (replace the XXX with the ticket ID and
a short note) so reviewers can find and close the task later.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a0d91ac5-2bbc-4e68-b51a-338b4604f66d
⛔ Files ignored due to path filters (1)
tests/regress/expected/spill_transaction.out is excluded by !**/*.out
📒 Files selected for processing (3)
src/spock.c, src/spock_apply.c, tests/regress/sql/spill_transaction.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spock.c
Up to standards ✅ 🟢 Issues

| Metric | Results |
|---|---|
| Duplication | 0 |

TIP: This summary will be updated as you push new changes.
♻️ Duplicate comments (2)
src/spock_apply.c (2)
2911-2914: ⚠️ Potential issue | 🟠 Major — Free spill-backed entries when `replication_handler()` throws.

`entry_spilled` / spill-read entries are only released on the success path. If `replication_handler()` raises before Line 3060, the `TopMemoryContext` copy becomes unreachable; on the initial exception path the worker keeps running into replay mode, so each failed large transaction can leak the very buffer this spill path was meant to cap.

Patch

```diff
-    bool        entry_spilled = false;
+    bool        entry_needs_free = false;
@@
     else
     {
         entry = apply_replay_queue_next_entry();
         if (entry == NULL)
             continue;
+        entry_needs_free = !entry->from_pq;
         queue_append = false;
     }
@@
         ApplyReplayEntry *orig = entry;

         entry = apply_replay_queue_append_entry(entry, msg);
-        entry_spilled = (entry != orig);
+        entry_needs_free = (entry != orig);
@@
-        if (entry_spilled)
+        if (entry_needs_free)
             msg = &entry->copydata;
     }
@@
-    replication_handler(msg);
+    PG_TRY();
+    {
+        replication_handler(msg);
+    }
+    PG_CATCH();
+    {
+        if (entry_needs_free)
+            apply_replay_entry_free(entry);
+        PG_RE_THROW();
+    }
+    PG_END_TRY();
@@
-    if (entry_spilled || !entry->from_pq)
+    if (entry_needs_free)
         apply_replay_entry_free(entry);
```

Also applies to: 3014-3028, 3046-3061
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 2911 - 2914, The code currently only frees spill-backed ApplyReplayEntry objects (tracked by entry_spilled) on the success path, so if replication_handler() throws the spilled buffer remains leaked; update the control flow around replication_handler() in the replay loop to always release any spill-backed resources on error (use PG_TRY/PG_CATCH or a cleanup block) — ensure that when replication_handler() raises you check entry_spilled and free the spill-read contents and msg, and free or pfree the ApplyReplayEntry copy that was allocated in TopMemoryContext before rethrowing or switching to the replay path; locate the variables ApplyReplayEntry, entry_spilled, replication_handler(), and msg to add the cleanup so both the normal and exception paths free the spill-backed entries.
136-137: ⚠️ Potential issue | 🟡 Minor

Use a wider type for `apply_replay_bytes`.

The GUC allows thresholds up to roughly 2GB, so `apply_replay_bytes + msg->len` can wrap `int` before the spill check runs. That can postpone spilling on exactly the large transactions this feature targets.

Patch

```diff
-static int apply_replay_bytes = 0;
+static Size apply_replay_bytes = 0;
@@
-    if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
-        apply_replay_bytes + msg->len > spock_replay_queue_size * 1024L * 1024L)
+    if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
+        apply_replay_bytes + (Size) msg->len >
+        (Size) spock_replay_queue_size * 1024 * 1024)
@@
-         "in-memory %d bytes exceeds %d MB limit",
+         "in-memory %zu bytes exceeds %d MB limit",
@@
-    apply_replay_bytes += msg->len;
+    apply_replay_bytes += (Size) msg->len;
```

Also applies to: 4176-4195, 4229-4230
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 136 - 137, The variable apply_replay_bytes must be widened to a 64-bit type to avoid integer wrap when adding large msg->len values; change the declaration of apply_replay_bytes from int to a 64-bit integer (e.g. int64_t/long long/pg_int64 consistent with project types), update all other occurrences that assume it is an int (including the other declarations/usages referenced around apply_replay_bytes and the spots that add msg->len or compare against the GUC threshold), ensure initializations and comparisons use the new type (and add casts where msg->len is smaller) so the spill check uses 64-bit arithmetic throughout.
🧹 Nitpick comments (1)
tests/regress/sql/primary_key.sql (1)
134-142: Scope this wait to `test_subscription`.

This loop still polls the unqualified `spock.sub_show_status()` form. Since the same script already uses `test_subscription` at Line 171, using the named overload here too keeps the scalar subquery deterministic if another subscription exists.

Patch

```diff
-    WHILE (SELECT status FROM spock.sub_show_status()) <> 'disabled' LOOP
+    WHILE (SELECT status FROM spock.sub_show_status('test_subscription')) <> 'disabled' LOOP
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/primary_key.sql` around lines 134 - 142, The polling loop uses the unqualified spock.sub_show_status() which can return non-deterministic results if multiple subscriptions exist; change the scalar subquery to call the named overload with the test_subscription identifier (i.e., use spock.sub_show_status(test_subscription)) so the WHILE condition becomes deterministic and consistent with the later usage of test_subscription.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/spock_apply.c`:
- Around line 2911-2914: The code currently only frees spill-backed
ApplyReplayEntry objects (tracked by entry_spilled) on the success path, so if
replication_handler() throws the spilled buffer remains leaked; update the
control flow around replication_handler() in the replay loop to always release
any spill-backed resources on error (use PG_TRY/PG_CATCH or a cleanup block) —
ensure that when replication_handler() raises you check entry_spilled and free
the spill-read contents and msg, and free or pfree the ApplyReplayEntry copy
that was allocated in TopMemoryContext before rethrowing or switching to the
replay path; locate the variables ApplyReplayEntry, entry_spilled,
replication_handler(), and msg to add the cleanup so both the normal and
exception paths free the spill-backed entries.
- Around line 136-137: The variable apply_replay_bytes must be widened to a
64-bit type to avoid integer wrap when adding large msg->len values; change the
declaration of apply_replay_bytes from int to a 64-bit integer (e.g.
int64_t/long long/pg_int64 consistent with project types), update all other
occurrences that assume it is an int (including the other declarations/usages
referenced around apply_replay_bytes and the spots that add msg->len or compare
against the GUC threshold), ensure initializations and comparisons use the new
type (and add casts where msg->len is smaller) so the spill check uses 64-bit
arithmetic throughout.
---
Nitpick comments:
In `@tests/regress/sql/primary_key.sql`:
- Around line 134-142: The polling loop uses the unqualified
spock.sub_show_status() which can return non-deterministic results if multiple
subscriptions exist; change the scalar subquery to call the named overload with
the test_subscription identifier (i.e., use
spock.sub_show_status(test_subscription)) so the WHILE condition becomes
deterministic and consistent with the later usage of test_subscription.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6dd04287-f0de-412b-ac79-5ff618505ec7
⛔ Files ignored due to path filters (2)
`tests/regress/expected/primary_key.out` is excluded by `!**/*.out`
`tests/regress/expected/spill_transaction.out` is excluded by `!**/*.out`
📒 Files selected for processing (7)
`Makefile`, `include/spock.h`, `src/spock.c`, `src/spock_apply.c`, `tests/regress/regress-postgresql.conf`, `tests/regress/sql/primary_key.sql`, `tests/regress/sql/spill_transaction.sql`
✅ Files skipped from review due to trivial changes (2)
- tests/regress/regress-postgresql.conf
- include/spock.h
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/spockbench.yml (1)
50-54: `ulimit -c unlimited` is a no-op in this step.

That limit only affects the current shell and its children, and this step does not start any long-lived workload after setting it. It won't carry into the later `docker compose up` / `docker run` steps, so I'd either drop it or move it into the same step as the process that should inherit it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/spockbench.yml around lines 50 - 54, The "Enable core dumps on host" step sets "ulimit -c unlimited" which only affects that shell and not later steps (so it's a no-op for the containers); either remove the ulimit line or move it into the same step that launches the processes that need core dumps (e.g., the step that runs "docker compose up" / "docker run"), or alternatively use Docker's ulimit support (pass --ulimit core=-1 to docker run or the ulimits section in docker-compose) so the started containers actually inherit an unlimited core size; update the workflow accordingly referencing the step named "Enable core dumps on host" and the "ulimit -c unlimited" command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/spockbench.yml:
- Around line 169-177: docker exec on stopped containers (using "$REG_CT_NAME"
and "$TAP_CT_NAME") fails silently and prevents extraction of core backtraces;
instead, run a new temporary container that mounts the host core directories and
workspace and runs the gdb extraction. Replace the docker exec blocks that
reference /tmp/regress-cores and /tmp/tap-cores with docker run --rm mounting
/tmp/regress-cores:/cores (or /tmp/tap-cores for the TAP step) and
"${GITHUB_WORKSPACE}":/workspace, use the same image used elsewhere for Spock
(the SPOCK image variable used in this workflow), and run the existing bash -c
block there to produce backtraces.txt in the workspace; keep the gdb commands
and output redirection (to
${GITHUB_WORKSPACE}/tests/regress/regression_output/backtraces.txt and the TAP
equivalent) but remove the || true that hides failures.
---
Nitpick comments:
In @.github/workflows/spockbench.yml:
- Around line 50-54: The "Enable core dumps on host" step sets "ulimit -c
unlimited" which only affects that shell and not later steps (so it's a no-op
for the containers); either remove the ulimit line or move it into the same step
that launches the processes that need core dumps (e.g., the step that runs
"docker compose up" / "docker run"), or alternatively use Docker's ulimit
support (pass --ulimit core=-1 to docker run or the ulimits section in
docker-compose) so the started containers actually inherit an unlimited core
size; update the workflow accordingly referencing the step named "Enable core
dumps on host" and the "ulimit -c unlimited" command.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 65001453-9c04-44ac-a07e-80134244f8bf
📒 Files selected for processing (2)
`.github/workflows/spockbench.yml`, `tests/regress/sql/spill_transaction.sql`
9df3149 to 67eab35
🧹 Nitpick comments (2)
src/spock_apply.c (2)
4257-4262: Missing error check for BufFileSeek.
`BufFileSeek()` returns 0 on success and non-zero on failure. If the seek fails (e.g., file corruption), subsequent reads will return incorrect data. Consider checking the return value:

🛡️ Suggested fix

```diff
     if (apply_replay_spilling)
     {
+        int seek_result;
+
         Assert(apply_replay_spill_file != NULL);
-        BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET);
+        seek_result = BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET);
+        if (seek_result != 0)
+            elog(ERROR, "SPOCK %s: failed to seek replay spill file to start",
+                 MySubscription->name);
         apply_replay_spill_read = 0;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 4257 - 4262, The BufFileSeek call inside the apply_replay_spilling block (when apply_replay_spilling is true and apply_replay_spill_file is asserted) currently ignores the return value; update the code around BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET) to capture its return, check for non-zero (failure), and handle the error (e.g., log the failure via ereport/elog or processLogger equivalent and abort/return an error path) before proceeding to set apply_replay_spill_read = 0; ensure you reference apply_replay_spill_file, BufFileSeek, and apply_replay_spill_read when implementing the check so failed seeks cannot lead to invalid reads.
3044-3047: Entry freeing on success path looks correct, but error path may leak.

The `need_free` logic correctly frees spill-read entries after `replication_handler()` succeeds. However, if `replication_handler()` throws an exception, the current entry won't be freed before the error propagates to the outer `PG_CATCH`. For spilled entries in `TopMemoryContext`, this leaks memory until worker restart.

Consider wrapping the call in a local `PG_TRY` / `PG_CATCH` to free the entry on error:

🛡️ Suggested fix

```diff
-    replication_handler(msg);
+    PG_TRY();
+    {
+        replication_handler(msg);
+    }
+    PG_CATCH();
+    {
+        if (need_free)
+            apply_replay_entry_free(entry);
+        PG_RE_THROW();
+    }
+    PG_END_TRY();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 3044 - 3047, The call to replication_handler(entry) can throw and leak spilled entries when need_free is true because apply_replay_entry_free(entry) is only called on the success path; wrap the replication_handler(...) invocation in a local PG_TRY / PG_CATCH block so that in the PG_CATCH path you call apply_replay_entry_free(entry) when need_free is true (and then rethrow with PG_RE_THROW), ensuring spilled TopMemoryContext entries are freed even on exceptions; reference replication_handler, need_free, apply_replay_entry_free, entry, PG_TRY/PG_CATCH and PG_RE_THROW when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/spock_apply.c`:
- Around line 4257-4262: The BufFileSeek call inside the apply_replay_spilling
block (when apply_replay_spilling is true and apply_replay_spill_file is
asserted) currently ignores the return value; update the code around
BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET) to capture its return,
check for non-zero (failure), and handle the error (e.g., log the failure via
ereport/elog or processLogger equivalent and abort/return an error path) before
proceeding to set apply_replay_spill_read = 0; ensure you reference
apply_replay_spill_file, BufFileSeek, and apply_replay_spill_read when
implementing the check so failed seeks cannot lead to invalid reads.
- Around line 3044-3047: The call to replication_handler(entry) can throw and
leak spilled entries when need_free is true because
apply_replay_entry_free(entry) is only called on the success path; wrap the
replication_handler(...) invocation in a local PG_TRY / PG_CATCH block so that
in the PG_CATCH path you call apply_replay_entry_free(entry) when need_free is
true (and then rethrow with PG_RE_THROW), ensuring spilled TopMemoryContext
entries are freed even on exceptions; reference replication_handler, need_free,
apply_replay_entry_free, entry, PG_TRY/PG_CATCH and PG_RE_THROW when making the
change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc8bbaff-e757-4355-a9c5-88f4b0600d93
⛔ Files ignored due to path filters (1)
`tests/regress/expected/spill_transaction.out` is excluded by `!**/*.out`
📒 Files selected for processing (2)
`src/spock_apply.c`, `tests/regress/sql/spill_transaction.sql`
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/regress/sql/spill_transaction.sql
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Around line 4256-4263: The seek of the spill file using
BufFileSeek(apply_replay_spill_file, 0, 0, SEEK_SET) is not checking
BufFileSeek's return value (non-zero indicates error) so a failed seek can cause
incorrect reads and silent corruption; update the block that touches
apply_replay_spill_file to check BufFileSeek's return, and on non-zero handle
the error (e.g., ereport(ERROR, ...) or appropriate cleanup/exit) before
proceeding and only set apply_replay_spill_read = 0 when the seek succeeded to
avoid reading from an unknown position.
In `@tests/regress/sql/spill_transaction.sql`:
- Around line 155-171: The sync_event() call is executed on the subscriber
connection but must be called on the provider to capture the provider WAL LSN
after TRUNCATE; update the script so that after TRUNCATE test_spill and before
switching to the subscriber connection you call SELECT spock.sync_event() AS
sync_lsn (on the provider), set the :sync_lsn variable, then switch to the
subscriber and call CALL spock.wait_for_sync_event(NULL, 'test_provider',
:'sync_lsn', 30); also remove the redundant "\c :subscriber_dsn" that
re-connects to the subscriber when already on it; ensure the sequence around
TRUNCATE, SELECT spock.sync_event(), connection switch, and CALL
spock.wait_for_sync_event() follows that order.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc0f1d6c-1ecf-4594-baff-a5c6d25a8567
⛔ Files ignored due to path filters (1)
`tests/regress/expected/spill_transaction.out` is excluded by `!**/*.out`
📒 Files selected for processing (2)
`src/spock_apply.c`, `tests/regress/sql/spill_transaction.sql`
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/spock_apply.c (2)
4168-4169: ⚠️ Potential issue | 🟡 Minor

Integer overflow still possible in apply_replay_bytes check.

The comparison `apply_replay_bytes + msg->len > spock_replay_queue_size * 1024L * 1024L` performs the addition on `int` types before comparing to `long`. If `apply_replay_bytes` is near `INT_MAX` and `msg->len` is non-trivial, the addition could overflow to negative, making the check always false and never triggering spilling.

While individual messages are bounded by `MaxAllocSize`, repeated accumulation without spilling could theoretically reach this limit on systems with large `spock_replay_queue_size` (max is `MAX_KILOBYTES`/1024 MB per the GUC, which is ~2TB).

🛡️ Proposed fix

```diff
-    if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
-        apply_replay_bytes + msg->len > spock_replay_queue_size * 1024L * 1024L)
+    if (!apply_replay_spilling && spock_replay_queue_size > 0 &&
+        (Size)apply_replay_bytes + (Size)msg->len > (Size)spock_replay_queue_size * 1024L * 1024L)
```

Or change `apply_replay_bytes` to `Size` as suggested in the past review.

🤖 Prompt for AI Agents
3044-3047: ⚠️ Potential issue | 🟡 Minor

Entry leak on exception from `replication_handler()`.

If `replication_handler(msg)` throws an exception, `apply_replay_entry_free(entry)` is never called for entries where `need_free` is true. These are `TopMemoryContext`-allocated entries that survive the outer `PG_CATCH` cleanup. Over multiple retried exceptions in a large transaction, this could accumulate significant memory.

Consider wrapping the call in a PG_TRY block:

🛡️ Proposed fix
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 3044 - 3047, replication_handler(msg) can throw and skip freeing TopMemoryContext-allocated entries, leaking memory; wrap the call so entry is always freed when need_free is true by either enclosing replication_handler(msg) in a PG_TRY/PG_CATCH and calling apply_replay_entry_free(entry) in the CATCH, or capture the entry before the call and ensure apply_replay_entry_free(entry) is invoked in the outer PG_CATCH; reference replication_handler, apply_replay_entry_free, need_free and entry so the fix locates the right spot and guarantees cleanup even on exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Around line 4176-4179: The creation of the temp spill file via
BufFileCreateTemp() can throw an ERROR and leave the current replay entry
consumed but not enqueued or spilled, so if that happens the outer PG_CATCH will
call apply_replay_queue_start_replay() and replay from an incomplete queue;
update the code around BufFileCreateTemp()/apply_replay_spill_file and the
apply_replay_spilling flag to document this limitation with a brief comment
explaining the failure mode (BufFileCreateTemp may ERROR on disk/perm issues),
the resulting state (entry consumed but not persisted), and the expected
behavior (worker restart/outer PG_CATCH will handle it), referencing
BufFileCreateTemp, apply_replay_spill_file, apply_replay_spilling, PG_CATCH, and
apply_replay_queue_start_replay in the comment.
---
Duplicate comments:
In `@src/spock_apply.c`:
- Around line 4168-4169: The check using apply_replay_bytes + msg->len can
overflow because the addition happens in an int-sized type; change the logic to
perform the comparison in a sufficiently wide unsigned/Size type (or change
apply_replay_bytes to Size) so the addition cannot wrap. Concretely, update the
condition that references apply_replay_spilling, apply_replay_bytes, msg->len
and spock_replay_queue_size to cast operands to Size (or use a Size-typed
apply_replay_bytes) and compute target_bytes = (Size) spock_replay_queue_size *
1024 * 1024, then compare (apply_replay_bytes + (Size) msg->len) > target_bytes
to avoid integer overflow and ensure correct spilling behavior.
- Around line 3044-3047: replication_handler(msg) can throw and skip freeing
TopMemoryContext-allocated entries, leaking memory; wrap the call so entry is
always freed when need_free is true by either enclosing replication_handler(msg)
in a PG_TRY/PG_CATCH and calling apply_replay_entry_free(entry) in the CATCH, or
capture the entry before the call and ensure apply_replay_entry_free(entry) is
invoked in the outer PG_CATCH; reference replication_handler,
apply_replay_entry_free, need_free and entry so the fix locates the right spot
and guarantees cleanup even on exceptions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3189cb4b-047d-46a7-8778-90535c8da725
⛔ Files ignored due to path filters (1)
`tests/regress/expected/spill_transaction.out` is excluded by `!**/*.out`
📒 Files selected for processing (2)
`src/spock_apply.c`, `tests/regress/sql/spill_transaction.sql`
✅ Files skipped from review due to trivial changes (1)
- tests/regress/sql/spill_transaction.sql
When the replay queue exceeds spock.exception_replay_queue_size (in MB), subsequent entries are written to a temporary BufFile on disk instead of accumulating in memory. This prevents OOM on the subscriber when replaying large transactions with exception handling enabled.

Key design decisions:
- In-memory queue entries (ApplyReplayContext) hold libpq-allocated data buffers. apply_replay_queue_reset walks the list to PQfreemem them before calling MemoryContextReset.
- Spilled entries are relocated to TopMemoryContext so they survive the MemoryContextReset in handle_commit. The caller frees them explicitly after replication_handler returns.
- When the replay queue is exhausted mid-transaction (exception before COMMIT was received), the loop seamlessly transitions to stream reading via continue instead of break, matching the pre-spill behavior.
- Added from_pq field to ApplyReplayEntry to distinguish libpq-allocated buffers (PQfreemem) from palloc'd spill-read buffers (pfree).

Includes regression tests covering all three exception_behaviour modes (transdiscard, discard, sub_disable) both without conflict and with a primary key conflict on the last record.
- Fix memory leak: spill-read entries during replay were allocated in TopMemoryContext but never freed because entry_spilled was only set on the first-pass path. Add !entry->from_pq to the free condition so replay-path spill-read entries are freed after processing.
- Set GUC minimum for spock.exception_replay_queue_size from -1 to 0, removing an undocumented value that silently behaved the same as 0.
- Use explicit subscription name in spill_transaction test when polling sub_show_status to avoid ambiguity with multiple subscriptions.
Refactor apply_replay_queue_append_entry() to return a bool indicating whether the caller must free the entry, instead of requiring the caller to compare pointers and track an entry_spilled flag. The function now updates entry and msg pointers through out-parameters.

This fixes a use-after-free: on a COMMIT message, replication_handler() calls handle_commit → apply_replay_queue_reset → MemoryContextReset, which destroys in-memory entries. The old code then accessed entry->from_pq on the freed memory. The new design captures the free decision before replication_handler runs.

Also update spill_transaction regression test: move TRUNCATE before subscriber configuration in SUB_DISABLE section to avoid replicating TRUNCATE while exception_behaviour is active, drop command_counter from exception_log queries for stable output, and add sanity checks.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/regress/sql/spill_transaction.sql (1)
13-19: Start this script from a synced replication position.

This file begins by truncating local diagnostic tables and flipping subscriber-wide GUCs, but it never first drains any tail apply work from the previous regression. A quick `sync_event()` / `wait_for_sync_event()` barrier here would make the first scenario deterministic and prevent late inserts from repopulating `spock.exception_log` after the truncate.

Suggested barrier

```diff
+\c :provider_dsn
+SELECT spock.sync_event() as sync_lsn
+\gset
+\c :subscriber_dsn
+CALL spock.wait_for_sync_event(NULL, 'test_provider', :'sync_lsn', 30);
 TRUNCATE spock.resolutions;
 TRUNCATE spock.exception_log;
 ALTER SYSTEM SET spock.save_resolutions = on;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/spill_transaction.sql` around lines 13 - 19, Before truncating local diagnostic tables, add a replication drain barrier by calling sync_event() and then wait_for_sync_event() to ensure the subscriber has applied all prior tail work; place this barrier immediately before the TRUNCATE spock.resolutions; TRUNCATE spock.exception_log; and GUC changes so late apply inserts cannot repopulate spock.exception_log after the truncate. Use the existing sync_event()/wait_for_sync_event() helpers to create a deterministic start-of-test state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/regress/sql/spill_transaction.sql`:
- Around line 24-26: The test is triggering a collision on the unique index
payload_idx instead of the primary-key; remove the secondary unique index
(payload_idx) from the CREATE TABLE block, pre-insert a local row with id =
50000 and a non-conflicting payload value (e.g. payload=1) into table test_spill
before running the provider rows so the conflict occurs on the PK, and then
adjust the DISCARD-mode comment and the aggregate/result expectations to reflect
that the last provider row is the one skipped and the preinserted local PK row
is preserved; reference the table name test_spill, the index payload_idx, and
the id value 50000 when making these edits.
---
Nitpick comments:
In `@tests/regress/sql/spill_transaction.sql`:
- Around line 13-19: Before truncating local diagnostic tables, add a
replication drain barrier by calling sync_event() and then wait_for_sync_event()
to ensure the subscriber has applied all prior tail work; place this barrier
immediately before the TRUNCATE spock.resolutions; TRUNCATE spock.exception_log;
and GUC changes so late apply inserts cannot repopulate spock.exception_log
after the truncate. Use the existing sync_event()/wait_for_sync_event() helpers
to create a deterministic start-of-test state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 08bbf23a-27a8-4c5c-a00e-0ac4b84c60bd
⛔ Files ignored due to path filters (2)
`tests/regress/expected/primary_key.out` is excluded by `!**/*.out`
`tests/regress/expected/spill_transaction.out` is excluded by `!**/*.out`
📒 Files selected for processing (7)
`Makefile`, `include/spock.h`, `src/spock.c`, `src/spock_apply.c`, `tests/regress/regress-postgresql.conf`, `tests/regress/sql/primary_key.sql`, `tests/regress/sql/spill_transaction.sql`
✅ Files skipped from review due to trivial changes (3)
- include/spock.h
- tests/regress/regress-postgresql.conf
- tests/regress/sql/primary_key.sql
🚧 Files skipped from review as they are similar to previous changes (3)
- Makefile
- src/spock.c
- src/spock_apply.c
Replacing my previous message with an updated patch. EDIT: updated description, it was not accurate.
When the apply worker's replay queue exceeds `spock.exception_replay_queue_size` (in MB), subsequent entries are written to a temporary BufFile on disk instead of accumulating in memory. This prevents OOM on the subscriber when replaying large transactions with exception handling enabled.

Design
- In-memory queue entries live in `ApplyReplayContext` and hold libpq-allocated data buffers. `apply_replay_queue_reset` (called from `handle_commit`) walks the list to `PQfreemem` them before calling `MemoryContextReset`.
- Spilled entries are relocated from `ApplyReplayContext` to `TopMemoryContext` so they survive the `MemoryContextReset` in `handle_commit`. The caller frees them explicitly after `replication_handler` returns.
- Added a `from_pq` field to `ApplyReplayEntry` to distinguish libpq-allocated buffers (`PQfreemem`) from palloc'd spill-read buffers (`pfree`).
- When the replay queue is exhausted mid-transaction (exception before COMMIT was received), the loop uses `continue` instead of `break`, seamlessly transitioning to stream reading — matching the pre-spill behavior and ensuring `handle_commit` runs with a clean transaction state.

Changes
- `apply_replay_queue_append_entry`: appends to in-memory list or spills to disk; returns a possibly-relocated entry pointer so the caller can detect spilling.
- `apply_replay_queue_next_entry`: drains in-memory list first, then reads from spill file; sets `apply_replay_mode = false` when exhausted.
- `apply_replay_spill_write_entry` / `apply_replay_spill_read_entry`: binary format `[int32 len][data]` via BufFile.
- `apply_replay_queue_start_replay`: seeks spill file back to start for re-read after exception.
- `apply_replay_queue_reset`: unified cleanup — walks list for `PQfreemem`, resets all state, closes spill file, resets context.
- Regression tests cover all three `exception_behaviour` modes (transdiscard, discard, sub_disable), both without and with a primary key conflict on the last record of a 50k-row COPY.