
[BREAKING & DNM] Consensus > PoS: remove epoch-1 init hack#18699

Draft
glyh wants to merge 3 commits into lyh/nit-improvement-pos from lyh/store-epoch-number-on-disk


glyh (Member) commented Mar 31, 2026

Background

A previous fix addressed incorrect staking ledger selection on restart in epoch 1 by loading the next epoch ledger first and inferring the staking fallback from its provenance (genesis vs disk). This worked but was fragile and marked as a hack.

Changes

  • Store epoch number on disk (epoch_ledger_uuids): adds an epoch field so initialization can reason directly about the persisted state's epoch rather than inferring it indirectly.

  • Replace the hack with explicit case analysis on (epoch, disk backing presence):

    • Epoch 0 & no disk ledgers: use genesis as local state
    • Epoch 1 & only next on disk: use genesis next for local staking, load local next
    • Epoch ≥ 2 & both on disk: load both
    • Anything else: log error, clean up, reset to genesis
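The four cases above can be sketched as a single pattern match. This is only an illustration of the case analysis, not the code in this PR: the `persisted` record, its boolean fields, and the `action` constructors are hypothetical names chosen for the example.

```ocaml
(* Hypothetical summary of what was found on disk at startup.
   The real metadata lives in [epoch_ledger_uuids]; these field
   names are assumptions made for this sketch. *)
type persisted =
  { epoch : int
  ; staking_on_disk : bool
  ; next_on_disk : bool
  }

(* One constructor per bullet in the list above (names are illustrative). *)
type action =
  | Use_genesis
  | Genesis_next_as_staking_and_load_next
  | Load_both
  | Reset_to_genesis of string (* reason, to be logged before cleanup *)

let plan_init { epoch; staking_on_disk; next_on_disk } =
  match (epoch, staking_on_disk, next_on_disk) with
  | 0, false, false ->
      Use_genesis
  | 1, false, true ->
      Genesis_next_as_staking_and_load_next
  | e, true, true when e >= 2 ->
      Load_both
  | e, s, n ->
      (* Any other combination is treated as inconsistent state. *)
      Reset_to_genesis
        (Printf.sprintf
           "inconsistent epoch ledger state: epoch=%d staking_on_disk=%b \
            next_on_disk=%b"
           e s n )
```

Keeping the decision in a pure function like this would let the catch-all branch be tested without touching the disk; the actual logging and cleanup would happen in the caller.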

github-project-automation bot moved this to To triage in Mesa Triage on Mar 31, 2026
glyh force-pushed the lyh/store-epoch-number-on-disk branch 2 times, most recently from 7bb3396 to 0443399, on March 31, 2026 09:35
glyh added 2 commits April 1, 2026 11:16
…resentations

Replace the flat `epoch_ledger_uuids` record with a variant `Epoch_ledgers` type
(`Epoch_zero | Epoch_one | Epoch_two_or_more`) that makes it impossible to express
invalid states (e.g. a staking UUID with no next UUID in early epochs). On startup,
if the persisted metadata is missing or belongs to a different genesis, reset cleanly
to genesis rather than crashing or using stale data.
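A rough sketch of how such a variant might look follows. Only the three constructor names come from the commit message; the payload shapes, the `metadata` record, its `genesis_hash` field, and the `resolve` function are assumptions made for illustration, and the fallback to `Epoch_zero` stands in for "reset to genesis".

```ocaml
(* Illustrative only: payloads and field names are assumed, not taken
   from the PR. *)
type uuid = string

type epoch_ledgers =
  | Epoch_zero (* nothing on disk yet *)
  | Epoch_one of { next : uuid } (* only the next ledger is persisted *)
  | Epoch_two_or_more of { staking : uuid; next : uuid }

(* Hypothetical on-disk metadata, tagged with the genesis it belongs to. *)
type metadata = { genesis_hash : string; ledgers : epoch_ledgers }

(* Missing metadata, or metadata written under a different genesis,
   falls back to the genesis-only state instead of raising. *)
let resolve ~current_genesis_hash (persisted : metadata option) :
    epoch_ledgers =
  match persisted with
  | Some { genesis_hash; ledgers }
    when String.equal genesis_hash current_genesis_hash ->
      ledgers
  | Some _ | None ->
      Epoch_zero
```

With this shape, a staking UUID without a next UUID simply has no constructor, so the type checker rejects it rather than leaving the check to startup code.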

Co-located changes:
- Embed `config` inside `Ledger_snapshot.Ledger_root` so callers no longer need to
  reconstruct it separately when removing a snapshot
- Derive yojson for epoch ledger metadata via ppx instead of hand-rolled converters
- Rename snapshot identifiers: `Staking_epoch_snapshot`/`Next_epoch_snapshot` -> `Staking`/`Next`
- Pass full ledger + config in `required_snapshot_sync` so `sync_local_state` no longer
  needs to derive the ledger from the snapshot kind at sync time
glyh force-pushed the lyh/store-epoch-number-on-disk branch from f795705 to 388d682 on April 1, 2026 03:16

glyh (Member, Author) commented Apr 7, 2026

This has grown much more complex than I originally intended.
