fix(reindex): eliminate TOCTOU race in ReindexThread BulkProcessor rebuild#35323
fix(reindex): eliminate TOCTOU race in ReindexThread BulkProcessor rebuild#35323fabrizzio-dotCMS merged 4 commits intomainfrom
Conversation
…build Two concurrent hazards existed in runReindexLoop(): 1. putAll/afterBulk race: putAll(batch N+1) could interleave with afterBulk(batch N) reading workingRecords on the BulkProcessor callback thread, causing deleteReindexEntry/markAsFailed to operate on the wrong entries. 2. Null-check/rebuild not atomic: rebuildBulkIndexer flag set between check and create could leave callbacks pointing to a stale listener. Fix: create a fresh BulkProcessorListener and IndexBulkProcessor per batch via try-with-resources. close() blocks until afterBulk completes before the next batch starts — each batch owns its immutable workingRecords snapshot, so no shared mutable state exists between consecutive batches. - rebuildBulkIndexer flag and method removed (per-batch makes explicit rebuild unnecessary) - closeBulkProcessor() helper removed (no longer needed) - finalizeReIndex() simplified: no longer manages a processor param - BulkProcessorListener.afterBulk: logs warning on >50% failure rate instead of signalling a rebuild Refs #35313 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Claude finished @fabrizzio-dotCMS's task in 3m 6s —— View job PR Review: fix(reindex): eliminate TOCTOU race in ReindexThread BulkProcessor rebuild
SummaryThe core fix is sound: per-batch Issues1.
For the ES path ( The per-batch isolation still protects against the original TOCTOU race regardless (each batch has its own The comment should be corrected: the safety comes from per-batch map isolation, not from 2. Test doesn't verify
Add 3. Static
The existing 4.
} // close() blocks until afterBulk completes before the next batch starts
contentletsIndexed += batchListener.getContentletsIndexed();
5.
final Map<String, ReindexEntry> workingRecords;This is a What's clean
|
…indexThread Adds testPerBatchListenerIsolation to ReindexThreadUnitTest. Runs runReindexLoop() via reflection against mocked dependencies and uses a CountDownLatch to let exactly two batches complete before stopping the thread. Verifies: - createBulkProcessor is called with a fresh listener per batch (not the same instance reused), proving the TOCTOU hazard is eliminated. - Each listener's workingRecords snapshot contains only the entries from its own batch, not contaminated by the adjacent batch. Refs #35313 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Expand wildcard import com.dotmarketing.business.* to explicit imports - Add @IndexLibraryIndependent annotation (class uses only neutral APIs) - Make state field final - Remove unused TimeUnit import (no longer needed after per-batch fix) - Reorder imports per Google Java style - Fix brace formatting in finally block Refs #35313 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Fixes the TOCTOU (Time-Of-Check To Time-Of-Use) race condition in
ReindexThread.runReindexLoop()identified in issue #35313.Root cause
Two concurrent hazards on lines 243–248 of
ReindexThread.java:1. Null-check / rebuild not atomic (
rebuildBulkIndexer)An external thread calling
rebuildBulkIndexer()between the check and the creation can leave callbacks pointing to a stale listener, silently losing reindex records.2.
putAllvsafterBulkcallback raceIn back-to-back loop iterations,
putAll(batch N+1)can interleave withafterBulk(batch N)readingworkingRecords, causingdeleteReindexEntry/markAsFailedto resolve entries from the wrong batch.Fix
BulkProcessorListenerandIndexBulkProcessorper batch via try-with-resourcesclose()blocks untilafterBulkcompletes before the next batch starts — each batch owns its immutableworkingRecordssnapshotrebuildBulkIndexerflag,rebuildBulkIndexer()method, andcloseBulkProcessor()helper removed (all made unnecessary by per-batch approach)finalizeReIndex()simplified: no longer manages a processor parameterBulkProcessorListener.afterBulk: logs warning on >50% failure rate instead of signalling a rebuildScope
Only 2 files changed (
ReindexThread.java+BulkProcessorListener.java), net -33 lines.Test plan
ReindexThreadintegration tests passOSCreateContentIndexIntegrationTest/OpenSearchUpgradeSuitepassrebuildBulkIndexer()while reindex is in flight — noConcurrentModificationException, no silent record lossfinalizeReIndex→ switchover → notificationCloses #35313