Add gov proposal based migration trigger#3650
Conversation
PR SummaryHigh Risk Overview Auto write mode by default: Integration/tests: FlatKV migrate script drives rate via a param-change proposal; gov module test asserts param updates and migration start on auto clusters. Reviewed by Cursor Bugbot for commit 293fe78. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3650 +/- ##
==========================================
- Coverage 59.24% 58.79% -0.45%
==========================================
Files 2272 2223 -49
Lines 188175 183105 -5070
==========================================
- Hits 111479 107655 -3824
+ Misses 66657 65840 -817
+ Partials 10039 9610 -429
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| // composite SC backend drives the in-flight memiavl->flatkv migration that | ||
| // BeginBlock paces via the migration gov param. Fail fast if the legacy | ||
| // root multistore is somehow in use. | ||
| rs, ok := app.CommitMultiStore().(*rootmulti.Store) |
There was a problem hiding this comment.
If we need to cast like this, would it make sense to change the type returned by app.CommitMultiStore()?
| genesisImportConfig genesistypes.GenesisImportConfig | ||
|
|
||
| stateStore seidb.StateStore | ||
| rs *rootmulti.Store |
There was a problem hiding this comment.
nit, would it make sense to use a more descriptive name like rootStore? Out of context, it's not obvious what rs stands for.
There was a problem hiding this comment.
good suggestion, will change
| if migrationBatchSize < 0 { | ||
| return nil, fmt.Errorf("migration batch size must not be negative, got %d", migrationBatchSize) |
There was a problem hiding this comment.
Could we make migration batch size an unsigned integer?
There was a problem hiding this comment.
Yeah, that's a good point
There was a problem hiding this comment.
Tried that, looks like it would require a big refactory, will actually do a fallback here to 0 if it's negative
There was a problem hiding this comment.
A well-structured, thoroughly-tested change that moves the SC migration rate from a node-local config to a consensus-deterministic governance param (migration.NumKeysToMigratePerBlock), correctly read once per block in BeginBlock and pushed down to the composite SC store. No blocking correctness/security issues found; a few non-blocking notes, chiefly that the new subspace value is not genesis-exportable.
Findings: 0 blocking | 6 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Migration param is not genesis-exportable (confirms Codex P2).
x/paramsExportGenesis(sei-cosmos/x/params/keeper/genesis.go) only exportsFeesParams/CosmosGasParams, and the newmigrationsubspace has no owning module's InitGenesis/ExportGenesis. After aseid export/import while a migration is in flight,NumKeysToMigratePerBlockis dropped and BeginBlock re-seeds the0(paused) default, halting the drain until governance re-submits a proposal. Data isn't lost (the boundary cursor lives in flatkv) and it's recoverable, but worth either adding genesis plumbing for this subspace or documenting the operational caveat. - Operational behavior change worth calling out in release notes: because the param defaults to
0(paused) and is the sole source of the rate, any chain already running in a migrate write-mode will pause its drain at the upgrade height until a param-change proposal raises the rate. Intentional per the consensus-safety design, but operators must know to submit the proposal. - REVIEW_GUIDELINES.md and cursor-review.md were empty/absent, so no repo-specific guidelines or Cursor second opinion were available; only the Codex pass (one P2, addressed above) contributed.
- Integration test nit (integration_test/gov_module/gov_proposal_test.yaml): the verifier
NEW_PARAM == 12345compares an unquoted numeric against a value the pipeline produces as a string (jq -r .value | tr -d "\""), unlike the sibling assertionNEW_ABCI_PARAM == "true". If the eval engine is type-strict this could mis-compare; quoting"12345"would match the established pattern. - 2 suggestion(s)/nit(s) flagged inline on specific lines.
| // composite SC backend drives the in-flight memiavl->flatkv migration that | ||
| // BeginBlock paces via the migration gov param. Fail fast if the legacy | ||
| // root multistore is somehow in use. | ||
| rs, ok := app.CommitMultiStore().(*rootmulti.Store) |
There was a problem hiding this comment.
[suggestion] This unconditional type assertion now panics in New() if the commit multistore is not *rootmulti.Store. New() is invoked by every seid subcommand that builds the app (export, query tooling, etc.), so please confirm no supported configuration (e.g. a legacy IAVL/storev2-disabled path) can reach here — otherwise this turns a previously-degraded path into a hard crash. If the legacy store is truly unsupported this is fine; a one-line note on why it's unreachable would help.
| } | ||
| numKeys := migration.DefaultNumKeysToMigratePerBlock | ||
| if subspace, ok := app.ParamsKeeper.GetSubspace(migration.SubspaceName); ok { | ||
| // The migration subspace has no owning module to seed it in InitGenesis, |
There was a problem hiding this comment.
[suggestion] The lazy seed correctly handles a fresh/never-set param deterministically, but because this subspace has no module ExportGenesis (x/params only exports Fees/CosmosGas params), the value is lost across a seid export/import: a mid-migration rate resets to the 0 default here and the drain pauses until governance re-raises it. Consider adding genesis import/export plumbing for this subspace, or documenting the caveat.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Want higher recall? High effort reviews run extra passes and find more bugs. A team admin can switch effort levels in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 91e1766. Configure here.
There was a problem hiding this comment.
The PR cleanly moves the SC migration rate from a node-local config to a consensus-safe governance param, with thorough plumbing and tests. However, an accidental rename of the receipt-store backend constant breaks an existing test (and decouples the test harness from the real config key), which blocks merge.
Findings: 1 blocking | 4 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
- 1 blocking issue(s) flagged inline on specific lines.
Non-blocking
- Cursor's second-opinion pass produced no output (cursor-review.md is empty); REVIEW_GUIDELINES.md is also empty, so no repo-specific standards could be applied.
- validateNumKeysToMigratePerBlock accepts any uint64. Because the value is cast to int before NextBatch (migration_manager.go:299), a governance value above math.MaxInt64 becomes negative and halts commit deterministically. Consider bounding the param to a sane operational maximum.
- app/abci_test.go's comment that math.MaxUint64 is 'forwarded verbatim with no conversion or clamping' is misleading — the app layer stores it verbatim, but migration_manager converts it to int downstream.
- 1 suggestion(s)/nit(s) flagged inline on specific lines.
|
|
||
| const ( | ||
| receiptStoreBackendKey = "receipt-store.rs-backend" | ||
| receiptStoreBackendKey = "receipt-store.rootStore-backend" |
There was a problem hiding this comment.
[blocker] This key must stay receipt-store.rs-backend. The parser in sei-db/config/receipt_config.go reads flagRSBackend = "receipt-store.rs-backend" (and the rs-backend mapstructure tag), so renaming this constant to receipt-store.rootStore-backend decouples the app-layer key from what ReadReceiptConfig actually reads. Concretely, TestParseReceiptConfigs_RejectsInvalidBackend (app/seidb_test.go) now sets "rocksdb" under a key the parser ignores, the config falls back to the default pebbledb, and assert.Error fails — breaking CI. This looks like collateral damage from a global rs→rootStore rename (the comment at app/test_helpers.go:486 was hit too). Revert to receipt-store.rs-backend.
| if advanceMigration { | ||
| // Get the next batch of keys to migrate. | ||
| valuesToMigrate, newBoundary, err := m.iterator.NextBatch(m.migrationBatchSize) | ||
| valuesToMigrate, newBoundary, err := m.iterator.NextBatch(int(m.migrationBatchSize)) |
There was a problem hiding this comment.
[suggestion] int(m.migrationBatchSize) truncates a uint64 gov param. On 64-bit, a value above math.MaxInt64 becomes negative, NextBatch rejects size <= 0, and ApplyChangeSets returns an error during commit — a deterministic chain halt from a governance typo. Since validateNumKeysToMigratePerBlock accepts any uint64, consider bounding the param to a sane maximum (well under math.MaxInt) at validation time.
This reverts commit 91e1766.
There was a problem hiding this comment.
Converts the SC migration per-block rate from a consensus-unsafe node-local config into a deterministic governance param read in BeginBlock; the design is sound, AppHash-safe (all nodes read the same value, lazy-default-set runs identically), and very thoroughly tested at unit, store, and integration levels. Only minor, non-blocking observations.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- New() now hard-panics if the commit multistore is not *rootmulti.Store (app/app.go:560). This is intentional fail-fast, but it is a behavior change: any App-construction path that could use a non-storev2 store (offline/CLI commands, tooling) would now crash rather than degrade. Worth confirming every code path that builds App goes through the storev2 rootmulti store.
- The two later casts in app.go (~line 770 and ~2685) still re-derive the store via app.CommitMultiStore().(*rootmulti.Store) instead of reusing the new app.rootStore field; minor redundancy now that the type is guaranteed at construction.
- Cursor's second-opinion review file (cursor-review.md) was empty — that pass produced no output. Codex reported no material issues.
- Consider a brief note/log when the lazy default-set writes the param the first time (app/abci.go), purely for operator visibility on the upgrade block; not required for correctness.
- 1 suggestion(s)/nit(s) flagged inline on specific lines.
Superseded: latest AI review found no blocking issues.
There was a problem hiding this comment.
The plumbing of the per-block migration rate into a governance param is clean, well-documented, and well-tested, but the PR also flips the default WriteMode to migrate_evm and relies on the claim that "paused migrate_evm produces the same app hash as memiavl_only" — which the routing code contradicts: brand-new EVM keys are written to flatkv yet excluded from the AppHash while paused, creating a fork/integrity risk. REVIEW_GUIDELINES.md and cursor-review.md were both empty; only Codex provided a second opinion, and its two P1 findings are corroborated.
Findings: 5 blocking | 4 non-blocking | 3 posted inline
Blockers
- Default WriteMode + migration trigger are not consistently network-wide. The default flips to migrate_evm (sc_config.go) but sc-write-mode remains node-local (app/seidb.go:109 only overrides when the toml key is non-empty). A node that picks up the new default runs migrate_evm while a peer with an explicit
sc-write-mode = memiavl_onlystays memiavl_only; once the gov param is raised, only the migrate_evm nodes drain. Because the paused-migrate_evm AppHash is NOT equivalent to memiavl_only (see inline finding), this mixed fleet can fork even before the migration is triggered. The PR's framing of the gov param as the 'sole' migration trigger overstates this — the write mode is still a consensus-relevant local switch. (Corroborates Codex P1 #2.) - Missing test coverage for the central safety claim: there is no test asserting that a paused migrate_evm store and a memiavl_only store produce identical AppHashes when brand-new EVM keys are written each block. Existing migration tests all set a positive batch size and drive the migration; the steady-state/paused path with new-key writes (the exact rollout state introduced by the default change) is unverified.
- 3 blocking issue(s) flagged inline on specific lines.
Non-blocking
- REVIEW_GUIDELINES.md is empty/missing, so no repo-specific standards were applied. cursor-review.md is empty — the Cursor pass produced no output. Only the Codex pass produced findings (both corroborated here).
- app/app.go: the fail-fast panic message says
expected *storev2_rootmulti.Storebut the import alias was changed torootmultiin this PR — cosmetic mismatch only. - integration_test/gov_module/gov_proposal_test.yaml: the verifier
NEW_PARAM == 12345compares a string (fromjq -r .value | tr -d '"') against an integer literal, whereas the sibling test uses a quoted string (NEW_ABCI_PARAM == "true"). Confirm the eval framework coerces types here, otherwise the assertion may not behave as intended. There is also a strayq gov proposal ... statusquery inserted between the two node votes — harmless but reads oddly. - app/abci.go applyMigrationBatchSize lazily persists the default param via subspace.Set on the first block it sees it unset. This is deterministic across nodes, but it means BeginBlock writes to the params store at the first post-upgrade height; worth a brief note that this is intended and idempotent.
| // all_migrated_but_bank, migrate_bank, flatkv_only, test_only_dual_write, auto. | ||
| // defaults to memiavl_only. | ||
| // defaults to migrate_evm. While the NumKeysToMigratePerBlock gov param is 0 | ||
| // (the default), migrate_evm is paused and produces the same app hash as |
There was a problem hiding this comment.
[blocker] This claim is not borne out by the routing code. In migrate_evm, MigrationManager.shouldForwardWriteToNewDB routes any caller write whose key is absent from memiavl directly to flatkv (migration_manager.go:431-435), and EVM creates brand-new keys (storage slots, accounts, receipts) every block. While paused (batch size 0) the boundary never advances, so shouldAppendLatticeHash returns false (composite/store.go:870-882) and flatkv is excluded from CommitInfo/AppHash. Result: paused migrate_evm commits new EVM state into flatkv that is outside the AppHash, whereas memiavl_only puts that same key into memiavl which IS in the AppHash — so the two AppHashes differ. This breaks the equivalence the rollout depends on (fork/halt risk in a mixed fleet) and leaves committed state uncovered by the merkle root. Please either route brand-new keys to memiavl while paused, or do not enable migrate_evm by default until the param is raised.
| return StateCommitConfig{ | ||
| Enable: true, | ||
| WriteMode: types.MemiavlOnly, | ||
| WriteMode: types.MigrateEVM, |
There was a problem hiding this comment.
[blocker] Changing the default WriteMode from MemiavlOnly to MigrateEVM is a consensus-relevant default flip. Combined with the node-local sc-write-mode override (app/seidb.go:109) and the AppHash non-equivalence of paused migrate_evm (see comment above), nodes that adopt this default will diverge from peers still on memiavl_only. Please confirm the upgrade story: existing nodes with an explicit memiavl_only in app.toml keep that value, but any node without it now silently enters migrate_evm. This needs to be coordinated as a network-wide switch rather than a per-node default.
| // still route below, but no keys are pulled forward and no boundary | ||
| // metadata is rewritten, so the migration holds at its current cursor | ||
| // until the batch size is raised again. | ||
| advanceMigration := firstBatchInBlock && m.migrationBatchSize > 0 |
There was a problem hiding this comment.
[blocker] advanceMigration gates only the boundary advance and metadata writes; caller-write routing below (shouldForwardWriteToNewDB) still runs unconditionally, sending brand-new keys to flatkv even when paused. Since the boundary metadata is never written in the paused state, shouldAppendLatticeHash never latches, so those flatkv writes never enter the AppHash. This is the mechanism behind the AppHash divergence flagged on sc_config.go and matches Codex P1 #1. Consider also pausing the new-key forwarding (route everything to the old DB) while migrationBatchSize == 0, so paused migrate_evm is truly AppHash-identical to memiavl_only.
There was a problem hiding this comment.
The PR cleanly replaces the node-local sc-keys-to-migrate-per-block config with a consensus-deterministic governance param and is well-tested, but the migration kick-off still depends on each node's local sc-write-mode: a node that keeps a fixed memiavl_only config (the value existing app.toml files already carry after upgrade) cannot honor the gov trigger, its failed kick-off is only logged, and it will diverge from Auto-mode nodes once the param is raised. That residual per-node divergence axis plus the silent error handling are blocking for a consensus path.
Findings: 4 blocking | 3 non-blocking | 2 posted inline
Blockers
- Upgrade/operational fork risk:
DefaultStateCommitConfig.WriteModechanges frommemiavl_onlytoauto, but already-running validators keepsc-write-mode = "memiavl_only"baked into their app.toml (rendered from the old default). The default change does not rewrite their config, so after upgrade many nodes will be fixedmemiavl_only. When the gov param is later raised, Auto nodes migrate (changing the memiavl/flatkv contents that feed the AppHash) while fixedmemiavl_onlynodes silently fail the kick-off and never migrate — i.e. the chain forks. The PR's stated goal is to eliminate exactly this per-node divergence, but it introduces a new one (Auto vs fixed). At minimum this needs to be enforced (refuse to start / fail loud if a non-Auto store is in use while migration is/was requested) or made an explicit, hard upgrade prerequisite, not just relied upon operationally. - REVIEW_GUIDELINES.md is empty and codex-review.md is empty/absent, so no repo-specific review standards and no OpenAI Codex second-opinion pass were available for this synthesis (noting per instructions; not itself a code defect).
- 2 blocking issue(s) flagged inline on specific lines.
Non-blocking
- app.go
New()now unconditionallypanics ifCommitMultiStore()is not*rootmulti.Store. This is intentional fail-fast, but confirm no supported code path (export/CLI tooling, tests, legacy-store startup) constructs the App with a different commit multistore, since this turns a previously-guardedif okinto a hard panic. - Cursor P2 (ordering):
applyMigrationBatchSizeruns afterforkInitializerandHardForkManager.ExecuteForTargetHeight. The kick-off'sSetWriteModerejects when any commitment storeHasPendingChanges(). This appears safe because BeginBlock writes are buffered in the cache-multistore and only flushed into the commitment stores at Commit (soHasPendingChanges()should be false here), but it's worth confirming no hard-fork handler writes directly into a commitment store's pending buffer in the same block a positive param first applies. MigrationManager.SetMigrationBatchSizewritesm.migrationBatchSizewithout atomics; this is safe only because the manager is always wrapped bythreadSafeRouter(write lock serializes it againstApplyChangeSets). Fine as-is, but the invariant is load-bearing and worth a brief assertion/test that migration routers are never used unwrapped.
| return fmt.Errorf("failed to set SC store migration batch size: %w", err) | ||
| } | ||
| if batchSize > 0 { | ||
| if mode, ok := rs.GetWriteMode(); ok && mode == sctypes.MemiavlOnly { |
There was a problem hiding this comment.
[blocker] The kick-off fires whenever the effective mode is MemiavlOnly, but a composite store under a fixed memiavl_only config also reports GetWriteMode() == MemiavlOnly (see composite store.go:428-429 / resolveCurrentWriteMode). So for such a node this calls SetWriteMode(MigrateEVM), which returns the "write mode is fixed by configuration" error. Combined with the log-only handling in BeginBlock, that node never migrates while Auto nodes do — an AppHash divergence. The doc comment above ("every other SC store ... never reports a memiavl_only effective mode, so the kick-off is skipped") is inaccurate for fixed memiavl_only. Either special-case/skip non-Auto stores here, or treat the "fixed" failure as fatal so the divergence can't go unnoticed.
There was a problem hiding this comment.
For fixed MemiavlOnly node, we intentionally want to diverge the app hash and skip the migration. So the switch should only work for auto mode
| numKeys = uint64(math.MaxInt64) | ||
| } | ||
| if err := app.rootStore.SetMigrationBatchSize(int(numKeys)); err != nil { | ||
| logger.Error("failed to set SC migration batch size", "err", err) |
There was a problem hiding this comment.
[blocker] A failure to apply the consensus-relevant migration rate is only logged and execution continues. The level-triggered retry justifies swallowing transient failures (e.g. pending changes), but a permanent failure — chiefly a node pinned to a fixed sc-write-mode retained across upgrade — will silently fork from nodes that do migrate, one log line per block. Recommend distinguishing the non-retryable case (e.g. the "write mode is fixed" error) and halting/panicking, or validating at startup that the store is Auto when migration may be driven by governance.
| - type: eval | ||
| expr: NEW_ABCI_PARAM == "true" | ||
|
|
||
| - name: Test migration param change proposal should update NumKeysToMigratePerBlock |
There was a problem hiding this comment.
it looks the migration gov tests triggers a real evm migration but only asserts the param value, not the migration. want to make sure it's on purpose
There was a problem hiding this comment.
There's no good way to check if real migration started or not, I decided to add a check to look for logs signals
There was a problem hiding this comment.
A clean, well-tested refactor that moves the SC migration rate from a node-local config to a consensus-deterministic gov param and makes auto the default write mode. No correctness blockers found; the main thing to flag is the operational rollout implication of the default-mode change. Codex and Cursor produced no findings.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Rollout coordination: the default
sc-write-modeflips frommemiavl_onlytoauto. Existing validators that already have an explicitsc-write-mode = "memiavl_only"in app.toml are fixed (not auto), so they will skip the kick-off and intentionally diverge/halt at the height the gov proposal raises NumKeysToMigratePerBlock. The release/runbook should make it explicit that every participating validator must setautobefore the param-change proposal passes; otherwise lagging nodes fork. This is by design and documented in code, but it is a critical operational precondition rather than an automatic outcome of the upgrade. - Second-opinion passes: OpenAI Codex reported "No material findings" and the Cursor review file was empty (no output). REVIEW_GUIDELINES.md was also empty, so no repo-specific standards were applied.
- applyMigrationBatchSize lazily persists the default (0) into the params store on the first block it sees the key unset. This is deterministic across nodes and effectively one-time (subsequent blocks see
Has==true), so it's fine — just noting the write-in-BeginBlock side effect for awareness. - Verified the integration-test verifier
NEW_PARAM == 12345(bare number vs string env var) works: integration_test/runner/runner.go compares via ParseFloat, and staking_proposal_test.yaml already uses the same bare-number form — so this is not a bug. - 1 suggestion(s)/nit(s) flagged inline on specific lines.
| return StateCommitConfig{ | ||
| Enable: true, | ||
| WriteMode: types.MemiavlOnly, | ||
| WriteMode: types.Auto, |
There was a problem hiding this comment.
[suggestion] Flipping the default write mode from memiavl_only to auto is the consensus-relevant part of this change. Note that this only governs nodes whose app.toml has an empty/absent sc-write-mode; nodes that already wrote sc-write-mode = "memiavl_only" (the previous template default) remain fixed memiavl_only and will be excluded from the kick-off in rootmulti.SetMigrationBatchSize — they'll intentionally diverge once the gov param is raised. Worth calling out in the upgrade runbook that all participating validators must switch to auto before the param-change proposal lands. Not a code defect — the behavior is documented and tested — just a rollout precondition that's easy to miss.
There was a problem hiding this comment.
Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.
Once credits are available, reopen this pull request to trigger a review.
There was a problem hiding this comment.
A well-engineered, heavily-tested refactor that moves the SC migration rate from a node-local config to a consensus-deterministic gov param. Two consensus/rollout concerns should be resolved before merge: existing validators are pinned to fixed memiavl_only (so the gov "trigger" is a no-op for them), and BeginBlock lazily writes a new param key into committed state, which is state-machine-breaking despite the "non-app-hash-breaking" label.
Findings: 4 blocking | 5 non-blocking | 1 posted inline
Blockers
- (Confirmed; agrees with Codex) The gov param is a no-op for the existing fleet. The app.toml template (
sei-db/config/toml.go) always renderssc-write-mode = "{{ .StateCommit.WriteMode }}", andparseSCConfigs(app/seidb.go) preserves any explicit value. Nodes provisioned under the oldmemiavl_onlydefault therefore have an explicitsc-write-mode = "memiavl_only"and are pinned (fixed), notauto.Store.SetMigrationBatchSizeonly kicks offmigrate_evmwhenConfiguredWriteMode() == Autoand explicitly skips fixedmemiavl_only. So raisingmigration.NumKeysToMigratePerBlockwill NOT start the migration on those validators — contradicting the PR's core claim that the gov param is the sole trigger. The migration would still require a separate, coordinated app.toml edit (setsc-write-mode = auto) on every node first; if only some operators do so, the auto nodes migrate (AppHash includes migration writes) while pinned nodes do not, diverging the chain. Please document/automate the operator step to flip existing nodes toauto, and gate the activation so partial adoption can't fork the chain. - BeginBlock writes new committed state, which is state-machine-breaking and at odds with the PR's
non-app-hash-breakinglabel.applyMigrationBatchSizecallssubspace.Set(...)the first time the param is unset, adding a key to the x/params store and thus changing the AppHash at the first block the new binary runs. On a rolling (uncoordinated) upgrade, an upgraded node would compute a different AppHash than peers still on the old binary and halt. Confirm this ships behind a coordinated upgrade height (or seed the param via an upgrade handler / InitGenesis instead of lazily in BeginBlock), and reconcile thenon-app-hash-breakinglabel. - 1 blocking issue(s) flagged inline on specific lines.
- 1 blocking issue(s) listed below under unanchored comments.
Non-blocking
- Cursor produced no review output (
cursor-review.mdis empty);REVIEW_GUIDELINES.mdis also empty, so this review applies general standards only. integration_test/gov_module/gov_proposal_test.yaml: the new verifierexpr: NEW_PARAM == 12345compares against an unquoted numeric literal, whereas the sibling verifier in the same file uses a quoted string (NEW_ABCI_PARAM == "true").NEW_PARAMis captured fromjq -r .value | tr -d '"'(a string). If the eval engine does not coerce string-vs-number, this assertion could silently never hold (false pass risk) or flake. RecommendNEW_PARAM == "12345"for consistency.Store.SetMigrationBatchSize: when the SC store does not exposeConfiguredWriteMode(bool=false),configuredis the empty string, which!= Auto, so the code logs the 'pinned to fixed memiavl_only' message even though the mode is simply unknown. Harmless in practice (the composite store always implements it) but the log could mislead during debugging.SetWriteModeno longer rejects pending uncommitted changes and no longer rebuildsrs.ckvStores— a meaningful change to a consensus-critical path. The rationale (dynamic router proxies + avoiding orphaning the live deliver cms) and the new regression testTestRootMultiAutoKickoff_LiveCacheStoreNotOrphanedlook sound, but this path warrants careful reviewer attention.- Tests could not be executed in the review sandbox (no network for the Go 1.25.6 toolchain), matching Codex's note; CI should be relied on for the race/coverage run.
Comments that couldn't be anchored to the diff
sei-db/state_db/sc/composite/store.go:1264-- [blocker] This skip path is the crux of the rollout gap: a node configured with a fixed (non-auto) write mode — including the explicitmemiavl_onlythat the app.toml template writes for every existing fleet member — is excluded from the migration kick-off. Combined withparseSCConfigspreserving explicitsc-write-mode, raising the gov param will not start migration on the current fleet without a separate coordinated config flip toauto. Worth surfacing this required operator action prominently and guarding against partial adoption (which would fork auto vs pinned nodes).
| if app.HardForkManager.TargetHeightReached(ctx) { | ||
| app.HardForkManager.ExecuteForTargetHeight(ctx) | ||
| } | ||
| app.applyMigrationBatchSize(ctx) |
There was a problem hiding this comment.
[blocker] applyMigrationBatchSize writes to committed state in BeginBlock: when the param is unset it calls subspace.Set(...), adding a key to the x/params store and changing the AppHash at the first block the new binary runs. That is state-machine-breaking and would diverge nodes on a rolling/uncoordinated upgrade, which seems at odds with the PR's non-app-hash-breaking label. Consider seeding the param via an upgrade handler / InitGenesis instead of lazily here, or confirm this ships at a coordinated upgrade height.
* main: feat(seid): ConfigManager selection seam (PLT-775 PR1) (#3671) fix(evmrpc): limit listener max open connections, configurable via max_open_connections (PLT-704) (#3637) LittDB: Keymap threading improvements (#3645) integrate hashlogger (#3647) fix(metrics): Prometheus metrics output (#3640) [codex] Harden multiversion iterator validation (#3656) feat(consensus): mock_chain_validation replay build + memIAVL state-sync restore fixes (#3663) chore: replace OLD red SeiLogo banner in README with new 2026 Sei lockup (#3670) Require absolute path for evmone lib (#3668) fix(evmrpc): apply getLogs maxLog cap during merge instead of after (PLT-687) (#3666) feat(evmrpc): pre-decode request size admission control (PLT-295) (#3648) Make autobahn block production check wait for progress (#3667) fix(sei-tendermint): prevent readRoutine goroutine leak on /websocket when writeChan is full (PLT-707) (#3664) Per-block littidx flush + single shard (gated on #3645) (#3660) fix(evmrpc): bound debug_traceStateAccess memory and add trace admission control (PLT-360) (#3653) [codex] bump go-ethereum to v1.15.7-sei-17 (#3657) Upodate checkout GHA step across all workflows (#3659) Add GoReleaser release pipeline for static seid binaries (#3425) Parallelize littidx eth_getLogs across blocks (#3652)
There was a problem hiding this comment.
The PR cleanly moves the SC migration rate from a node-local config to a consensus-read governance param, with strong test coverage and careful determinism reasoning. The one blocking concern (shared with Codex) is that the new sc-write-mode-enable-auto defaults to true and, when absent, unconditionally forces WriteMode to auto — silently overriding any explicit sc-write-mode (notably flatkv_only) and breaking nodes whose operators don't add the opt-out before upgrade.
Findings: 2 blocking | 5 non-blocking | 2 posted inline
Blockers
- Silent override of explicit write modes on upgrade (agrees with Codex High). When
sc-write-mode-enable-autois absent, both parse paths forceWriteMode = autoregardless of the explicitsc-write-mode. The design intent (memiavl_only fleet auto-participates) is sound, but this also clobbers explicitflatkv_only/test_only_dual_write/ pinned configs. toml.go's own comment warns thatautoon a flatkv_only node 'either fails every commit with a version-mismatch error or silently serves reads from an empty memiavl' — so flatkv_only nodes that merely upgrade the binary (existing docker scripts only rewritesc-write-mode, not the new key) will break unless operators proactively addsc-write-mode-enable-auto = false. Recommend either (a) only forceautowhen the explicit mode is empty ormemiavl_only, honoring any other explicit mode as a deliberate pin, or (b) gate this behind an explicit release-note/migration step and a startup safety check. As-is this is a silent, app-hash-affecting footgun. - 1 blocking issue(s) flagged inline on specific lines.
Non-blocking
sei-db/config/sc_config.gostill declaresKeysToMigratePerBlock int(line 72) and sets it to 1024 inDefaultStateCommitConfig(line 90), but nothing reads it anymore (buildRouter now uses the atomicmigrationBatchSize). The PR description claims the field was dropped; it's now dead config. Remove the field (and its mapstructure tag) to match the stated intent and avoid confusing operators who still havesc-keys-to-migrate-per-blockin app.toml (now silently ignored).- Behavior diverges between the two parse paths:
app/seidb.goskips parsing/validatingsc-write-modeentirely when auto is on (default), so a typo'dsc-write-modeis silently ignored, whereassei-cosmos/server/config/config.gostill parses (and can error on) the explicit mode before overriding to auto. Consider making the two consistent. - REVIEW_GUIDELINES.md (base branch) is empty — no repo-specific standards were available to apply. cursor-review.md is also empty (Cursor produced no output); only the Codex pass contributed a finding.
- Doc-only caveat already noted by the author: a
seid exporttaken mid-migration omits NumKeysToMigratePerBlock (no owning module's ExportGenesis serializes it), so a chain bootstrapped from that genesis re-seeds 0 (paused). Acceptable and documented, but ensure operators forking/recovering via export are aware they must re-issue the proposal. - 1 suggestion(s)/nit(s) flagged inline on specific lines.
| if v := appOpts.Get(FlagSCWriteModeEnableAuto); v != nil { | ||
| scConfig.WriteModeEnableAuto = cast.ToBool(v) | ||
| } | ||
| if scConfig.WriteModeEnableAuto { |
There was a problem hiding this comment.
[blocker] Blocker: when sc-write-mode-enable-auto is absent (the common case for already-provisioned nodes), WriteModeEnableAuto stays true and this unconditionally forces WriteMode = auto, discarding any explicit sc-write-mode. For memiavl_only that's the intended upgrade path, but for an explicit flatkv_only node this is dangerous — toml.go's own comment states auto on a flatkv_only node 'either fails every commit with a version-mismatch error or silently serves reads from an empty memiavl.' Existing startup scripts only rewrite sc-write-mode, so such operators get no signal to add the opt-out and the node breaks on upgrade. Safer: only force auto when the explicit mode is empty or memiavl_only; treat any other explicit mode as a deliberate pin and honor it. The same logic in sei-cosmos/server/config/config.go (~line 357) needs the same treatment.
| if app.rootStore == nil { | ||
| return | ||
| } | ||
| numKeys := migration.DefaultNumKeysToMigratePerBlock |
There was a problem hiding this comment.
[nit] Minor: this lazily writes the default param into the params store during BeginBlock the first time it's unset. It's deterministic across nodes and correctly gated by subspace.Has, so it's fine — but note that the very first ParameterChangeProposal can only be accepted after at least one block has run to seed the key. On a brand-new chain that's block 1+, which is fine; just worth a one-line comment so this ordering dependency isn't surprising later.
There was a problem hiding this comment.
This PR converts the consensus-relevant SC migration rate from a node-local config into a governance param read every BeginBlock, removing the divergence risk of per-node rates. The change is well-factored and unusually well-tested (unit coverage for the auto kick-off, restart/resume, clamping, and cross-block timing); no correctness blockers found, but there are a couple of non-blocking coverage/limitation notes and minor test nits.
Findings: 0 blocking | 7 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Genesis export/import limitation (raised by Codex, and explicitly documented in app/migration/params.go): the
migrationsubspace has no owning AppModule, so its value is not serialized byseid export. A chain bootstrapped from such a genesis re-seeds the default 0 (paused) on the first BeginBlock, silently dropping a previously-approved rate until a new ParameterChangeProposal is issued. This is recoverable and not consensus-fatal, and the team documented it — but consider wiring genesis export/import (or a dedicated module) so operators forking/recovering via export don't lose the rate unknowingly. - Integration coverage gap (raised by Codex): the headline path — a gov param raising NumKeysToMigratePerBlock auto-triggering memiavl_only -> migrate_evm — is only covered by Go unit tests. verify_flatkv_evm_migrate.sh still seds sc-write-mode = "migrate_evm" locally per node (with sc-write-mode-enable-auto pinned false by GIGA_MIGRATE_FROM_MEMIAVL), so the E2E run exercises a fixed migrate_evm node plus the gov-controlled rate, not the auto kick-off transition. Consider an integration scenario that leaves auto enabled and lets the gov param alone drive the transition.
- Cursor produced no review output (cursor-review.md is empty); REVIEW_GUIDELINES.md is also empty, so no repo-specific standards were applied.
- No prompt-injection or other untrusted-instruction content was found in the PR title/description or diff.
- Informational: app.New() now panics if the commit multistore is not *rootmulti.Store. This is safe in practice — SetupSeiDB always installs rootmulti.NewStore and already panics when SC is disabled — so the new assertion is purely defensive, not a behavior regression.
- 2 suggestion(s)/nit(s) flagged inline on specific lines.
| expr: PROPOSAL_STATUS == "PROPOSAL_STATUS_PASSED" | ||
| # The migration batch size param must reflect the new value | ||
| - type: eval | ||
| expr: NEW_PARAM == 12345 |
There was a problem hiding this comment.
[suggestion] NEW_PARAM == 12345 compares a string (the value is piped through tr -d "\"", yielding the string "12345") against a numeric literal. The sibling verifier on line 72 quotes its expected value (PROPOSAL_STATUS == "PROPOSAL_STATUS_PASSED"), and the pre-existing ABCI-param test uses == "true". Depending on the eval engine's type coercion this comparison may never match (string-vs-number), making the verifier pass vacuously or fail. Suggest NEW_PARAM == "12345" for consistency and to ensure the assertion actually exercises the new value.
| - cmd: seidbin=seid; chainid=sei; source integration_test/utils/_tx_helpers.sh && submit_tx_and_wait node_admin gov vote $PROPOSAL_ID yes --fees 2000usei | ||
| node: sei-node-0 | ||
| # sei-node-1 vote yes | ||
| - cmd: seid q gov proposal $PROPOSAL_ID --output json | jq -r .status |
There was a problem hiding this comment.
[nit] This stray seid q gov proposal status query sits under the # sei-node-1 vote yes comment but does nothing (no env: capture, no node:), so the comment now mislabels two steps. Either drop this line or move it below the node-1 vote and capture its output, so the test reads cleanly.
Superseded: latest AI review found no blocking issues.
There was a problem hiding this comment.
Converting the per-block SC migration rate from a node-local config into a consensus-read governance param (migration.NumKeysToMigratePerBlock) is the right fix for the fork risk, and the change is unusually well-documented and heavily tested across unit, integration, and end-to-end layers. One edge-case correctness consideration (raised by Codex) around the newly-legal migrate_evm + batch-size-0 state is worth confirming before merge, but the intended auto-driven path is safe.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Codex flagged (High): making batch size 0 a legal 'paused' state newly allows a store to sit in effective
migrate_evmwithmigrationBatchSize == 0. In that stateadvanceMigrationis false so no boundary metadata is written, yet caller writes for brand-new EVM keys still route to FlatKV viashouldForwardWriteToNewDB;composite.shouldAppendLatticeHashonly includes the FlatKV lattice in the AppHash oncemigrationStartedis true, so those writes would be excluded from the AppHash. Worth confirming this is unreachable in practice: under the intended auto flow it is safe (the kick-off inrootmulti.SetMigrationBatchSizeonly entersmigrate_evmwhen batch>0, and that same block advances the boundary and latches the lattice; lowering the gov param to 0 afterward is safe because the lattice is already latched). The reachable case is a node pinned to fixedsc-write-mode = migrate_evmwhile the gov rate is still at its 0 default — previously impossible because the removed config validation forced batch>0 (default 1024). Since the PR documents pinned nodes as deliberately divergent this is likely acceptable, but a brief note or a guard (e.g. reject/kick-off on fixed migrate_evm with rate 0) would remove the footgun. - Cursor's second-opinion review file (cursor-review.md) was empty — that pass produced no findings to merge.
app/abci.go:applyMigrationBatchSizelazily persists the default (0) into the params subspace on the first BeginBlock that finds the key unset (!subspace.Has→subspace.Set). This is a state write in BeginBlock; it is deterministic across nodes so it is consensus-safe, but note it makes the migration subspace's first appearance an app-hash-affecting event at the activation height — consistent with theapp-hash-breakinglabel, just worth being explicit about in the upgrade notes.- Documentation caveat (already self-noted in
app/migration/params.go): because the migration subspace has no owning module,seid exportomitsNumKeysToMigratePerBlock, so a chain bootstrapped from a mid-migration export silently re-seeds the paused default. Good that it's documented; consider surfacing it in operator-facing upgrade docs too. - 1 suggestion(s)/nit(s) flagged inline on specific lines.
There was a problem hiding this comment.
Solidly-engineered change that moves the SC memiavl→flatkv migration rate from a node-local config to a governance param and makes auto the default write mode, with thorough tests and unusually detailed rationale comments. No correctness blockers found; the main risk is the "auto by default" upgrade path silently overriding explicitly-pinned non-memiavl write modes, which is documented but has no runtime guard.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Cursor second-opinion review (cursor-review.md) was empty — that pass produced no output.
- Genesis-export gap (documented in app/migration/params.go): the
migrationsubspace has no owning AppModule, soseid exportomitsNumKeysToMigratePerBlock; a chain bootstrapped from such a genesis silently re-seeds the default (0/paused) and operators must re-issue the proposal. Consider seeding it via the params ExportGenesis path or documenting it in operator runbooks so a mid-migration export doesn't silently reset the drain. - WriteMode resolution is now computed in two places — sei-cosmos/server/config/config.go:GetConfig and app/seidb.go:parseSCConfigs — both calling ApplyWriteModeAuto. They agree today; consider a shared helper or a test asserting parity so the two parse paths can't drift and produce a config divergence.
- Codex finding #2 (app/abci.go applyMigrationBatchSize logs-and-continues on SetMigrationBatchSize failure, which can include the consensus-critical memiavl_only→migrate_evm kick-off): the author's in-code rationale is sound — a partial failure diverges AppHash and Tendermint halts the affected minority, while an all-fail keeps AppHash identical and the level-triggered kick-off re-fires. Acceptable as-is; flagging that Codex raised it and it was consciously resolved.
- 1 suggestion(s)/nit(s) flagged inline on specific lines.

Summary
The state-commitment (SC) store's in-flight
memiavl → flatkvmigration was previously paced by a node-local config (sc-keys-to-migrate-per-block). Because migration writes feed the AppHash, a per-node rate is consensus-relevant and a divergence risk: two validators draining at different rates fork the chain.This PR makes the per-block migration rate a module-agnostic governance parameter (
migration.NumKeysToMigratePerBlock) that every validator reads from chain state eachBeginBlockand applies identically. The gov param also serves as the migration trigger: it defaults to0(paused), and raising it via a param-change proposal starts the drain fleet-wide at a deterministic height.The old node-local config and its fallback are removed entirely — the gov param is now the sole source of the rate.
Key changes
New generic governance parameter
app/migration/params.go— new module-agnosticmigrationparams subspace definingNumKeysToMigratePerBlock(default0), itsKeyTable, and validation. Deliberately not EVM-specific so future module migrations can reuse it.app/app.go— registers themigrationsubspace; stores the*rootmulti.Storeon the app and fails fast if the (unsupported) legacy commit multistore is in use.app/abci.go—BeginBlockreadsNumKeysToMigratePerBlockfrom chain state and pushes it into the SC store before the block's first write (applyMigrationBatchSize).Plumbing: push the rate down to the migration router
sei-db/state_db/sc/types/types.go—Committerinterface gainsSetMigrationBatchSize(int) error.sei-cosmos/storev2/rootmulti/store.go— forwardsSetMigrationBatchSizeto the SC store; addsGetMigrationBatchSizefor observability/tests.sei-db/state_db/sc/composite/store.go— holds the gov-set batch size (atomic.Int64);buildRouterandSetMigrationBatchSizeuse it directly. Removed theeffectiveMigrationBatchSizeconfig fallback —0means paused, full stop.sei-db/state_db/sc/migration/*—Routerinterface gainsSetMigrationBatchSize; all router types implement it. Only the migration manager acts on it; every other router (module/passthrough/dual-write/thread-safe) treats it as a no-op.router_builder.gonow allows a0batch size (paused) instead of rejecting it.Removed the node-local config + fallback
sei-db/config/sc_config.go,sei-db/config/toml.go,docker/localnode/config/app.toml,app/seidb.go— dropped thesc-keys-to-migrate-per-blockfield, default, validation, TOML entry, and flag parsing.Tests
app/migration/params_test.go— unit tests for the new param (default, key-table registration, validation).app/abci_test.go— subspace registration,applyMigrationBatchSize(defaults/set/clamp), and cross-block "param set in block N takes effect in N+1" coverage.SetMigrationBatchSize(...)after construction instead of the removed config field. The random-migration framework re-applies the rate on every store (re)open, faithfully mirroring how production re-pushes the gov param after each restart.Integration test
integration_test/gov_module/— newParameterChangeProposaltest that raisesNumKeysToMigratePerBlockand asserts it takes effect.integration_test/contracts/verify_flatkv_evm_migrate.sh— rewritten to drive the migration via a governance param-change proposal (submit → deposit → quorum vote → poll forPASSED→ verify on-chain) instead of injecting the removed config.Docs
AGENTS.md— Code style now mandates running bothgofmtandgoimportson every touched.gofile.