
fix: reduce RSS growth from dedupLoad sync.Map retention and flush convoy (cherry-pick to 4.0-dev) #24388

Open
jiangxinmeng1 wants to merge 4 commits into matrixorigin:4.0-dev from jiangxinmeng1:fix/dedupload-flush-convoy-4.0-dev

Conversation


@jiangxinmeng1 (Contributor) commented May 14, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) does this PR fix or relate to?

Fixes #24348

What this PR does / why we need it:

Cherry-pick of #24387 to 4.0-dev.

  • Replace sync.Map with sync.Mutex + plain map in dedupLoad so entries are fully reclaimed after deletion (sync.Map's internal hash table never shrinks); see the sketch after this list
  • Each waiter now copies the loaded []byte result instead of sharing a reference, allowing the loadCall struct to be GC'd promptly without waiting for the slowest consumer
  • Change flushSemaphore from a hardcoded capacity of 4 to a dynamic GOMAXPROCS/2 (clamped to [4, 16]) to improve flush throughput on multi-core machines
  • Add a 200ms timeout fallback to flushSemaphore acquisition so queued workers don't hold S3 write buffers indefinitely, avoiding convoy-induced RSS growth; see the semaphore sketch after the problem list below
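A minimal sketch of the mutex+map pattern from the first two bullets, assuming a string-keyed load that returns []byte; `dedupLoader`, `loadCall`, and the `loader` callback are illustrative stand-ins, not the repo's actual types:

```go
package dedup

import "sync"

// loadCall tracks one in-flight load; waiters block on done.
type loadCall struct {
	done chan struct{}
	val  []byte
	err  error
}

type dedupLoader struct {
	mu    sync.Mutex
	calls map[string]*loadCall // plain map: deleted entries are fully reclaimable
}

// Load collapses concurrent loads of the same key into one loader call.
// Each waiter receives its own copy of the result, so the shared loadCall
// (and its []byte) can be GC'd without waiting for the slowest consumer.
func (d *dedupLoader) Load(key string, loader func() ([]byte, error)) ([]byte, error) {
	d.mu.Lock()
	if c, ok := d.calls[key]; ok {
		d.mu.Unlock()
		<-c.done // wait for the in-flight load to finish
		if c.err != nil {
			return nil, c.err
		}
		out := make([]byte, len(c.val)) // copy instead of sharing a reference
		copy(out, c.val)
		return out, nil
	}
	c := &loadCall{done: make(chan struct{})}
	if d.calls == nil {
		d.calls = make(map[string]*loadCall)
	}
	d.calls[key] = c
	d.mu.Unlock()

	c.val, c.err = loader()

	d.mu.Lock()
	delete(d.calls, key) // nothing pins the loadCall except still-copying waiters
	d.mu.Unlock()
	close(c.done) // publishes val/err to waiters

	return c.val, c.err
}
```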

What was the problem

PR #24309 introduced dedupLoad (cache stampede prevention) and flushSemaphore (thundering-herd protection). While these correctly prevent OOM in the sysbench insert-ignore scenario, under TPCC 10w/100t (100 concurrent write terminals) they cause baseline RSS to grow from 31-36 GiB to 40-46 GiB:

  1. sync.Map never shrinks: Even after Delete, the internal hash table retains its peak size, causing monotonic memory growth under sustained load
  2. Shared []byte reference: All waiters hold a reference to loadCall.val, preventing GC until the slowest goroutine finishes
  3. Flush convoy: With only 4 flush slots and 100 workers, 96 workers queue while holding their S3 write buffers (see the sketch below)
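A minimal sketch of the new semaphore sizing and the 200ms timeout fallback, modeling flushSemaphore as a buffered channel; the function names are illustrative, and it is an assumption here that a timed-out worker simply proceeds without a slot:

```go
package flush

import (
	"runtime"
	"time"
)

// newFlushSemaphore sizes the semaphore at GOMAXPROCS/2, clamped to [4, 16],
// instead of the previous hardcoded capacity of 4.
func newFlushSemaphore() chan struct{} {
	n := runtime.GOMAXPROCS(0) / 2
	if n < 4 {
		n = 4
	}
	if n > 16 {
		n = 16
	}
	return make(chan struct{}, n)
}

// acquire takes a flush slot, giving up after 200ms so a queued worker
// does not sit on its S3 write buffer indefinitely.
func acquire(sem chan struct{}) bool {
	t := time.NewTimer(200 * time.Millisecond)
	defer t.Stop()
	select {
	case sem <- struct{}{}:
		return true
	case <-t.C:
		return false // fallback: caller proceeds without a slot
	}
}

func release(sem chan struct{}) {
	<-sem
}
```

A caller would pair these as `if acquire(sem) { defer release(sem) }` and flush either way: a brief burst of extra concurrent flushes is cheaper than 96 workers pinning their write buffers in the queue.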

Test plan

  • TPCC 10w/100t: verify baseline RSS stays at 31-36 GiB
  • Sysbench 1000w insert-ignore: verify no OOM regression

Ref #24348

🤖 Generated with Claude Code

fix: reduce RSS growth from dedupLoad sync.Map retention and flush convoy

The dedupLoad mechanism (introduced in matrixorigin#24309) used sync.Map, which never
shrinks its internal hash table after Delete. Under TPCC 100-terminal
concurrency this caused monotonic memory growth. Additionally, all waiters
shared a single []byte reference, preventing GC until the slowest consumer
finished.

Replace sync.Map with mutex+map so entries are fully reclaimed on delete.
Each waiter now copies the result so the shared loadCall can be GC'd promptly.

The flushSemaphore (also from matrixorigin#24309) was hardcoded to 4 concurrent flushes.
With 100 workers hitting memory pressure, 96 would queue holding their S3
write buffers, causing RSS to spike ~8-10 GiB. Change to dynamic capacity
(GOMAXPROCS/2, clamped [4,16]) and add 200ms timeout fallback so queued
workers don't hold buffers indefinitely.

Approved by: @include-all

Ref matrixorigin#24348

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add explicit memory-cache capacity backpressure before DiskCache.Read converts disk bytes into CachedData. Keep DiskCache allocation on the default allocator so the gate is applied at the disk-cache layer instead of relying on S3FS/LocalFS allocator wrappers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
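The backpressure described here could be sketched as a weighted semaphore denominated in bytes, using golang.org/x/sync/semaphore; `byteGate` and its wiring into DiskCache.Read are hypothetical, not the actual gate in the repo:

```go
package cache

import (
	"context"

	"golang.org/x/sync/semaphore"
)

// byteGate meters memory-cache capacity in bytes: a reader must reserve
// room before converting disk bytes into CachedData, so concurrent reads
// cannot collectively overshoot the budget.
type byteGate struct {
	sem *semaphore.Weighted
}

func newByteGate(capacityBytes int64) *byteGate {
	return &byteGate{sem: semaphore.NewWeighted(capacityBytes)}
}

// reserve blocks until n bytes are available (or ctx is done).
func (g *byteGate) reserve(ctx context.Context, n int64) error {
	return g.sem.Acquire(ctx, n)
}

// free returns capacity once the CachedData is released or evicted.
func (g *byteGate) free(n int64) {
	g.sem.Release(n)
}
```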
jiangxinmeng1 and others added 2 commits May 15, 2026 09:47
…tions (matrixorigin#24352)

Two issues causing unbounded memory growth under high-concurrency DiskCache read paths (1000 TPCC FOR UPDATE terminals simultaneously hitting `AllocateCacheDataWithHint`):

- **`EnsureNBytes` silently skips eviction under contention**: The FIFO cache's `Evict` uses `TryLock` — when 1000 goroutines concurrently call `EnsureNBytes`, only one actually evicts while the rest skip and proceed to allocate, causing the memory cache to overshoot its configured capacity by up to `N × blockSize`. Fix: when the cache is already at/over capacity, use blocking `EvictWithWait` to guarantee space is freed before returning (see the sketch after this list).

- **IO buffer and CachedData held simultaneously**: `ReadFromOSFile` held both the compressed IO buffer (`i.Data`, allocated via `ioAllocator`) and the decompressed `CachedData` (allocated via `memoryCacheAllocator`) until `IOVector.Release()`. Since this function is only called from `DiskCache.Read` (which sets `fromCache = diskCache`, causing `diskCache.Update` to skip the entry), the IO buffer can be released immediately after `setCachedData` succeeds. This reduces peak per-entry memory by ~30-50%.
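A toy model of the at/over-capacity policy from the first bullet, with a condition variable standing in for `EvictWithWait`; this `fifoCache` only tracks a byte budget, not real cache entries:

```go
package cache

import "sync"

type fifoCache struct {
	mu       sync.Mutex
	cond     *sync.Cond
	used     int64
	capacity int64
}

func newFifoCache(capacity int64) *fifoCache {
	c := &fifoCache{capacity: capacity}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// ensureNBytes reserves n bytes of budget. The key change: when already
// at/over capacity it blocks until space is truly freed, instead of a
// TryLock-style skip that lets N contending goroutines all allocate and
// overshoot by up to N × blockSize.
func (c *fifoCache) ensureNBytes(n int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for c.used+n > c.capacity {
		c.cond.Wait() // blocking wait, analogous to EvictWithWait
	}
	c.used += n
}

// release frees n bytes (the eviction path) and wakes blocked reservers.
func (c *fifoCache) release(n int64) {
	c.mu.Lock()
	c.used -= n
	c.mu.Unlock()
	c.cond.Broadcast()
}
```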

The OOM path: `LockOp.callNonBlocking` → `TableScan` → `DiskCache.Read` → `setCachedData` → `constructorFactory` → `AllocateCacheDataWithHint` → `MetricsAllocator.Allocate`

Under 1000 concurrent FOR UPDATE transactions:
1. Each transaction's TableScan triggers cache data allocation (~683KB/block)
2. `EnsureNBytes` was ineffective under contention (TryLock skip)
3. Memory cache could exceed capacity by hundreds of MB
4. Combined with lock-wait holding batch data, total RSS exceeded CN limits

Approved by: @XuPeng-SH, @fengttt

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…th (matrixorigin#24339)

Backports the TransferPage write reliability fix to the `main` branch.

The root cause of the DN OOM burst under IOChaos is that `WriteTransferPage` silently sets page paths even when the filesystem write fails. When the in-memory hashmap is later evicted by TTL, `loadTable()` attempts to read from a non-existent file path, triggering repeated flush re-scheduling and memory allocation storms.

This fix:
1. Adds bounded retry (3 attempts with exponential backoff 100ms/200ms/400ms) to `WriteTransferPage` (see the sketch after this list)
2. Returns an error on failure instead of silently succeeding
3. Only sets page paths after a successful write — on failure pages remain in-memory (graceful degradation)
4. Callers (`flushTableTail`, `mergeObjects`) log a warning and keep pages in memory when persistence fails
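A sketch of the bounded-retry policy in points 1-3, with `write` standing in for the actual page write and the 100ms/200ms/400ms schedule taken from the commit text:

```go
package transfer

import (
	"fmt"
	"time"
)

// writeWithRetry attempts the write up to three times with exponential
// backoff, and returns the error instead of silently succeeding; the
// caller only records the page path if this returns nil.
func writeWithRetry(write func() error) error {
	var err error
	for _, d := range []time.Duration{
		100 * time.Millisecond,
		200 * time.Millisecond,
		400 * time.Millisecond,
	} {
		if err = write(); err == nil {
			return nil
		}
		time.Sleep(d) // exponential backoff after each failed attempt
	}
	return fmt.Errorf("write transfer page: retries exhausted: %w", err)
}
```

On error the caller keeps the pages in memory and logs a warning, rather than pointing the hashmap at a file that was never written.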

Approved by: @XuPeng-SH, @aptend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
