Add TAP tests for pg_resize_shared_buffers() with concurrent checkpoints #10
Draft
palak-chaturvedi wants to merge 5 commits into
Adds two TAP tests that exercise pg_resize_shared_buffers() against the
checkpointer, using the ProcSignalBarrier injection points already
present in buf_resize.c and three new injection points added to
CreateCheckPoint():
006_resize_concurrent_checkpoint_crash.pl
- Walks the resize state machine through every consecutive pair of
injection points in buf_resize.c (4 shrink pairs + 4 expand pairs).
- At each pair, issues a synchronous CHECKPOINT, crashes the server
with 'immediate' stop, restarts, and verifies that crash recovery
restores the intended shared_buffers value with no checksum
validation errors and no PANIC.
007_resize_async_checkpoint_crash.pl
- Runs pg_resize_shared_buffers() concurrently with a CHECKPOINT
that is parked at successive stages of CreateCheckPoint(). Eight
scripted interleavings (S1-S4 shrink, E1-E4 expand) drive both
the resize and the checkpointer forward through their injection
points, then crash the server and verify crash recovery.
Supporting changes:
* src/backend/access/transam/xlog.c: adds three INJECTION_POINT_LOAD
calls outside the critical section for the new checkpoint stages
(checkpoint-before-redo, checkpoint-before-redo-wal,
checkpoint-after-redo-wal), plus the matching INJECTION_POINT_CACHED
calls inside CreateCheckPoint(). The 'wait' action needs shmem
allocation, which is why the LOADs are outside the critical section.
* src/test/modules/injection_points: adds
injection_points_has_waiter(text) RETURNS boolean, a non-destructive
lookup useful as a poll target in TAP tests. Unlike a wait_for_event
style poll against pg_stat_activity, this function only takes the
injection_points shmem spinlock, so it cannot deadlock against a
backend holding ProcArrayLock.
* src/test/buffermgr/meson.build: registers the two new tests.
* src/test/buffermgr/t/ResizeBuffer/Utils.pm: shared helpers used by
both 006 and 007 (wait_injection_point via
injection_points_has_waiter, detach-before-wake, %point_backend
mapping, wakeup_all_known_points for END-block cleanup).
All 6 buffermgr TAP tests pass together (599 subtests) with
enable_injection_points=yes and EXTRA_INSTALL='contrib/pg_buffercache
src/test/modules/injection_points'.
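The load-then-run split described for xlog.c can be modelled in a few lines of standalone C. This is only a sketch of the idea, not the PostgreSQL code: `crit_depth`, `model_point_load()` and `model_point_cached()` are hypothetical stand-ins for the critical-section counter and for the INJECTION_POINT_LOAD / INJECTION_POINT_CACHED macros, and plain malloc() stands in for whatever allocation the real callback needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the backend's critical-section nesting depth. */
static int crit_depth = 0;

typedef void (*point_callback) (const char *name);

typedef struct CachedPoint
{
    char       *name;
    point_callback cb;
    struct CachedPoint *next;
} CachedPoint;

static CachedPoint *cache_head = NULL;
static int  fired = 0;

/* Example callback: just counts how often a cached point ran. */
static void
count_hit(const char *name)
{
    (void) name;
    fired++;
}

/*
 * Model of the LOAD step: it allocates, so it refuses to run while a
 * critical section is open.
 */
static bool
model_point_load(const char *name, point_callback cb)
{
    CachedPoint *p;
    size_t      len;

    if (crit_depth > 0)
        return false;           /* allocation is not allowed here */

    p = malloc(sizeof(*p));
    if (p == NULL)
        return false;
    len = strlen(name) + 1;
    p->name = malloc(len);
    if (p->name == NULL)
    {
        free(p);
        return false;
    }
    memcpy(p->name, name, len);
    p->cb = cb;
    p->next = cache_head;
    cache_head = p;
    return true;
}

/*
 * Model of the CACHED step: lookup and call only, no allocation, so it
 * is safe to use while crit_depth > 0.
 */
static bool
model_point_cached(const char *name)
{
    for (CachedPoint *p = cache_head; p != NULL; p = p->next)
    {
        if (strcmp(p->name, name) == 0)
        {
            p->cb(name);
            return true;
        }
    }
    return false;
}
```

In this model, a point loaded before `crit_depth` is raised can still fire inside the critical section, while a point that was never loaded is silently skipped there.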
An earlier iteration of the pg_resize_shared_buffers() TAP tests added a custom C helper, injection_points_has_waiter(), so tests could poll for a backend parked at a named injection point. This has been replaced with upstream's $node->wait_for_event() (which polls pg_stat_activity.wait_event), the same helper used by src/test/buffermgr/t/004_client_join_buffer_resize.pl and other upstream tests. Drop the C helper and its SQL binding.
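The polling approach can be illustrated with a small standalone C model. It is a sketch under stated assumptions, not the Perl helper itself: `poll_wait_event()`, `FakeBackend` and `fake_read_event()` are hypothetical names, the fake reader stands in for a query against pg_stat_activity.wait_event, and the model assumes the reported wait event equals the injection point name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Callback that returns the backend's current wait event, or NULL if none. */
typedef const char *(*read_wait_event_fn) (void *ctx);

/*
 * Poll until the backend reports the expected wait event or the attempt
 * budget runs out.  A real poller would sleep between attempts.
 */
static bool
poll_wait_event(read_wait_event_fn read_event, void *ctx,
                const char *expected, int max_attempts)
{
    for (int i = 0; i < max_attempts; i++)
    {
        const char *ev = read_event(ctx);

        if (ev != NULL && strcmp(ev, expected) == 0)
            return true;
    }
    return false;
}

/* Fake backend that parks at its injection point after a few polls. */
typedef struct FakeBackend
{
    int         calls;          /* number of polls seen so far */
    int         parks_after;    /* poll number at which it is parked */
    const char *event;          /* wait event reported once parked */
} FakeBackend;

static const char *
fake_read_event(void *ctx)
{
    FakeBackend *b = ctx;

    b->calls++;
    return (b->calls >= b->parks_after) ? b->event : NULL;
}
```

The bounded attempt count matters for tests: a backend that never reaches the point makes the poll fail instead of hanging the test run.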
Add t/002_resize_smoke.pl, a minimal sanity test that exercises pg_resize_shared_buffers() with no injection points, no concurrent workload, and no crash injection. Failures here indicate the resize feature is fundamentally broken; failures in the checkpoint-race tests (006, 007) should not be trusted until this baseline is green. Covers shrink round-trip (4MB -> 1MB -> 4MB), extreme range (256kB -> 32MB), and persistence across a clean restart.
The 006 and 007 crash-recovery tests run bt_index_check() on pgbench_accounts_pkey after recovery to verify that mid-resize crashes have not corrupted heap or index pages. Without amcheck in EXTRA_INSTALL, 'make check' fails to load the extension. meson already installs amcheck as part of the standard build.
Follow-up review pass on 006 and 007 and their shared Utils module. No behavioral change apart from stronger post-recovery checks:
* 006: promote the pgbench sums invariant from a note to an is() assertion; drop the dead 'page verification failed' unlike() check (this tree does not enable data_checksums, so the regex can never match); drop count(*), which is subsumed by bt_index_check().
* 007: replace the copy-pasted per-scenario blocks with a data table driven by named injection-point constants; add bt_index_check() and the sums invariant so post-recovery integrity coverage matches 006; rewrite the 'ordering is approximate' comment now that the invariants are understood.
* Utils.pm: move %point_backend above attach_injection_point() so both attach and wait validate point names against it (typos die at attach time, not later at wait); drop the pgbench_extended and cointoss dead branches; trim historical/tutorial comments down to actionable ones.
Test count is stable at 33+33=66 after the cleanup, running in ~20s combined.
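The two structural ideas in this cleanup, a scenario table keyed by named constants and validation of point names at attach time, can be sketched in standalone C. Everything here is illustrative: the actual tests are Perl, the three checkpoint point names come from the PR description, but the pairing of scenarios with checkpoint stages, the "checkpointer" backend label, and the names `Scenario`, `backend_for_point()`, `attach_point()` and `count_valid_scenarios()` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Checkpoint stage names as listed in the PR description. */
#define CKPT_BEFORE_REDO     "checkpoint-before-redo"
#define CKPT_BEFORE_REDO_WAL "checkpoint-before-redo-wal"
#define CKPT_AFTER_REDO_WAL  "checkpoint-after-redo-wal"

/* Counterpart of %point_backend: which backend parks at which point. */
typedef struct PointBackend
{
    const char *point;
    const char *backend;        /* assumed label, for illustration only */
} PointBackend;

static const PointBackend point_backend[] = {
    {CKPT_BEFORE_REDO, "checkpointer"},
    {CKPT_BEFORE_REDO_WAL, "checkpointer"},
    {CKPT_AFTER_REDO_WAL, "checkpointer"},
};

typedef enum ResizeDirection
{
    RESIZE_SHRINK,
    RESIZE_EXPAND
} ResizeDirection;

/* One row per scripted interleaving; pairings below are hypothetical. */
typedef struct Scenario
{
    const char *id;
    ResizeDirection dir;
    const char *ckpt_point;
} Scenario;

static const Scenario scenarios[] = {
    {"S1", RESIZE_SHRINK, CKPT_BEFORE_REDO},
    {"S2", RESIZE_SHRINK, CKPT_BEFORE_REDO_WAL},
    {"E1", RESIZE_EXPAND, CKPT_BEFORE_REDO},
    {"E2", RESIZE_EXPAND, CKPT_AFTER_REDO_WAL},
};

/* Return the backend for a known point, or NULL for an unknown name. */
static const char *
backend_for_point(const char *point)
{
    for (size_t i = 0; i < sizeof(point_backend) / sizeof(point_backend[0]); i++)
    {
        if (strcmp(point_backend[i].point, point) == 0)
            return point_backend[i].backend;
    }
    return NULL;
}

/* Validate at attach time so a typo fails early: 0 on success, -1 otherwise. */
static int
attach_point(const char *point)
{
    return (backend_for_point(point) != NULL) ? 0 : -1;
}

/* Count the table rows whose checkpoint point passes validation. */
static int
count_valid_scenarios(void)
{
    int         ok = 0;

    for (size_t i = 0; i < sizeof(scenarios) / sizeof(scenarios[0]); i++)
    {
        if (attach_point(scenarios[i].ckpt_point) == 0)
            ok++;
    }
    return ok;
}
```

Because every row refers to a point through a named constant and attach rejects unknown names, a misspelled point surfaces as an immediate failure instead of a later wait that never completes.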