perf(raft): batched WAL fsync (group commit) #117
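RaftWAL.AppendLog previously called file.Sync() once per entry. When a follower handled an AppendEntries batch of N entries it called AppendLog in a loop, producing N fsyncs; rebuild/SavePersist likewise fsynced per entry. This change splits out writeEntry (write only, no sync); AppendLog = writeEntry + Sync, behavior unchanged. New AppendLogs(entries): writes every entry, then fsyncs once. The follower batch loop in rpc.go and TruncateLogs/RebuildLogFile/SavePersist now use AppendLogs. The durability contract is unchanged: the whole batch still reaches disk before persistStateLocked / before success is returned. An empty batch is a no-op. The leader single-entry path (raft.go AppendEntry) is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>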
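🐯 BanGD database kernel review
Overall risk: 🟢 Low
Change summary: This PR implements group commit on the Raft WAL write path: instead of writing entries one at a time with one fsync per entry, a batch is written and then fsynced once.
This change does not alter the persistence contract (a batched write still completes its fsync before the RPC returns success), nor does it change the on-disk format or the WAL record format.
Architecture issues (2 total)
General issues (1 total)
Tokens consumed by this review: 85121 total (input 58032, output 6737, cache hits 20352, cache writes 0) | Dimensions [memory, lock, storage, performance, resource] | Additional surrounding files read [Raft/raft.go] | Adversarial re-check with 3 votes per item, 2 suspected false positives filtered out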
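Background (roadmap #111 ①)
Investigating the write-heavy bottleneck showed:
- `RaftWAL.AppendLog` calls `file.Sync()` once per entry.
- `rpc.go` `AppendEntries`: a batch of N entries calls `AppendLog` in a loop → N fsyncs (the main target).
- `TruncateLogs` / `RebuildLogFile` / `SavePersist`: also fsync per entry (cold paths).

Changes (Option A: batched fsync API, surgical)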
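- `raft_wal.go`: split out `writeEntry` (write only, no sync); `AppendLog` = `writeEntry` + `Sync` (behavior unchanged); add `AppendLogs(entries)`, which writes everything and then fsyncs once; an empty batch is a no-op.
- `rpc.go`: follower batch loop → `AppendLogs(newEntries)`.
- `raft_wal.go`: the per-entry fsync loops in `TruncateLogs` / `RebuildLogFile` / `SavePersist` → `AppendLogs`.
- Leader single-entry path (`raft.go` `AppendEntry`): unchanged (one fsync per client write is inherent).

Durability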
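The whole batch still completes its fsync before `persistStateLocked` / before success is returned, so the durability contract is unchanged; the N fsyncs of a batch are simply amortized into one.

Verification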
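- `go build ./...` / `go vet ./Raft/...` are clean.
- `TestAppendEntriesRPC` panics the same way on a clean main (compared via stash), so it is not a regression from this PR. The change is logically equivalent and isolated.

Incidental finding (to be filed as a separate issue, not handled in this PR)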
- `rpc.go:161` `AppendEntries`: when `entry.Index=0 && LastIncludedIndex=0`, `relativeIndex=-1` → `r.raft.log[-1]` is out of bounds and panics (exposed under hermetic runs).

🤖 Generated with Claude Code
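To illustrate the incidental finding, here is a hedged sketch of one possible bounds guard. It is not taken from the repository: the helper `relativeIndex`, its signature, the error value, and the offset formula `index - lastIncludedIndex - 1` are all assumptions, chosen only because that formula reproduces the `-1` reported for `entry.Index=0 && LastIncludedIndex=0`. Whether rejecting such an entry is the right protocol-level response is left to the separate issue.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrIndexOutOfRange is a hypothetical error for offsets outside the in-memory log.
var ErrIndexOutOfRange = errors.New("log index out of range")

// relativeIndex maps an absolute log index to a slice offset.
// ASSUMPTION: offset = index - lastIncludedIndex - 1; the real code may differ.
func relativeIndex(index, lastIncludedIndex, logLen int) (int, error) {
	rel := index - lastIncludedIndex - 1
	if rel < 0 || rel >= logLen {
		return 0, fmt.Errorf("%w: index=%d lastIncluded=%d len=%d",
			ErrIndexOutOfRange, index, lastIncludedIndex, logLen)
	}
	return rel, nil
}

func main() {
	entries := []string{"e1", "e2"}

	// The case from the finding: index 0 with lastIncludedIndex 0 yields offset -1.
	if _, err := relativeIndex(0, 0, len(entries)); err != nil {
		fmt.Println("rejected:", err)
	}

	// A valid lookup indexes the slice only after the check has passed.
	if rel, err := relativeIndex(2, 0, len(entries)); err == nil {
		fmt.Println(entries[rel])
	}
}
```

Returning an error instead of indexing directly turns the panic into a condition the RPC handler can handle explicitly.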