feat: add attestation aggregate coverage metrics #386
Port leanSpec PR #735: register three IntGaugeVec metrics describing
attestation aggregate coverage, with default zero-valued series so
dashboards render from a fresh node startup.
- lean_attestation_aggregate_coverage_validators (section, subnet)
- lean_attestation_aggregate_coverage_subnets (section)
- lean_attestation_aggregate_coverage_diff_validators (direction)
ATTESTATION_AGGREGATE_COVERAGE_SECTIONS and
ATTESTATION_AGGREGATE_COVERAGE_DIFF_DIRECTIONS are exported as the
single source of truth for label sets. init() forces the new statics
and seeds combined-subnet, section, and direction series to 0 (18
default series total). Per-subnet (subnet="subnet_N") series appear
lazily when instrumentation writes them.
Registration only. The producer side (per-slot coverage computation,
de-duplication across payloads, and the chain-status log line) ports
zeam #876 and lands in a follow-up PR.
The diff_validators help text diverges from upstream's terse phrasing
to spell out the symmetric-difference semantics (block_only: in block
but not in local timely pool; timely_only: the reverse). Metric name,
labels, and values are unchanged.
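As a rough illustration of the "18 default series" arithmetic above, here is a std-only sketch that enumerates the seeded label combinations in Prometheus text form. The constant names and metric names come from this PR. The `default_series` helper and the ordering of the section values are assumptions made for the example; the real code seeds the series through `IntGaugeVec` in `init()` rather than building strings.

```rust
/// Section label values as listed in the PR description (ordering assumed).
pub const ATTESTATION_AGGREGATE_COVERAGE_SECTIONS: &[&str] = &[
    "timely",
    "late",
    "block",
    "combined",
    "agg_start_new",
    "proposal_payloads",
    "proposal_gossip",
    "proposal_combined",
];

/// Direction label values as listed in the PR description.
pub const ATTESTATION_AGGREGATE_COVERAGE_DIFF_DIRECTIONS: &[&str] = &["block_only", "timely_only"];

/// Hypothetical helper: renders every default zero-valued series as a text line.
pub fn default_series() -> Vec<String> {
    let mut series = Vec::new();
    // 8 series: one section total (subnet="combined") per section.
    for section in ATTESTATION_AGGREGATE_COVERAGE_SECTIONS {
        series.push(format!(
            "lean_attestation_aggregate_coverage_validators{{section=\"{section}\",subnet=\"combined\"}} 0"
        ));
    }
    // 8 series: covered-subnet count per section.
    for section in ATTESTATION_AGGREGATE_COVERAGE_SECTIONS {
        series.push(format!(
            "lean_attestation_aggregate_coverage_subnets{{section=\"{section}\"}} 0"
        ));
    }
    // 2 series: one per diff direction.
    for direction in ATTESTATION_AGGREGATE_COVERAGE_DIFF_DIRECTIONS {
        series.push(format!(
            "lean_attestation_aggregate_coverage_diff_validators{{direction=\"{direction}\"}} 0"
        ));
    }
    series
}

fn main() {
    for line in default_series() {
        println!("{line}");
    }
}
```

Deriving all three loops from the two constants keeps the label sets in one place, which is the role the commit message assigns to them.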
🤖 Kimi Code Review
This PR adds attestation aggregate coverage metrics following the existing patterns in the codebase. The implementation is correct and idiomatic.
Assessment:
Minor observation:
No security vulnerabilities or consensus risks identified; this is pure instrumentation with no state transition logic.
Verdict: LGTM. Clean, well-documented addition that follows established patterns.
Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt
🤖 Codex Code Review
No findings. This PR is limited to new Prometheus label constants and gauge registration in metrics.rs and startup seeding in metrics.rs. I don't see any impact on fork choice, attestation validation, justification/finalization, state transition, XMSS handling, or SSZ behavior. The added code is straightforward, uses the existing
One residual note: this patch only registers/seeds the new metrics; there are no writer APIs or call sites in this PR, so the metrics will stay at their seeded values until follow-up instrumentation lands. I also couldn't complete
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
🤖 Claude Code Review
Here is the full code review: PR #386
Greptile Summary
Registers three new
Confidence Score: 5/5
This PR is safe to merge: it is purely additive metric registration with no changes to any logic, state, or existing code paths. The change adds three new IntGaugeVec statics and seeds 18 default zero series in init(). No existing code is modified, no consumers of the new metrics exist yet, and the implementation closely mirrors the patterns already established in the file. No files require special attention. The single changed file is straightforward metric registration.
| Filename | Overview |
|---|---|
| crates/blockchain/src/metrics.rs | Adds three IntGaugeVec statics and two pub label-set constants; seeds 18 default zero series in init(). Follows all existing patterns; no logic, no consumers yet. |
Reviews (1): Last reviewed commit: "feat: add attestation aggregate coverage..."
MegaRedHand
left a comment
We still need to emit the metrics
Port the producer side of zeam #876 on top of the metrics registered in
the previous commit. After this commit, all 18 coverage series receive
real per-slot updates from chain activity.
Five emission sites:
- accept_new_attestations (store.rs): captures `new_payloads`
participant bits BEFORE promote and stashes them as a
CoverageSnapshot on the Store. Read at the next slot boundary to
populate the `timely` section ("prev_new" in zeam).
- on_block_core (store.rs): mirrors the imported block's per-AttData
aggregation bits into Store::last_block_coverage. Observability-only;
fork choice is unchanged.
- on_tick interval 0 (lib.rs): emits the post-block-merge report for
`slot - 1`. Computes `timely`/`late`/`block`/`combined` from the
stashed snapshots and the current `new_payloads`, then emits the
diff_validators direction counts as the symmetric difference between
`block` and `timely`.
- start_aggregation_session (lib.rs): emits `agg_start_new` from the
current `new_payloads` right before fork-choice aggregation runs at
interval 2.
- propose_block (lib.rs): emits `proposal_payloads`,
`proposal_gossip`, and `proposal_combined` after the block is built.
Each validator set in the block is classified by whether the
AttestationData has a matching known-payload proof.
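To make the `diff_validators` semantics concrete, here is a minimal sketch of the symmetric-difference count between `block` and `timely`. It models each pool as a set of validator ids for readability; the commit itself works on aggregation bits, and this `diff_counts` signature is an assumption for illustration, not the helper in coverage.rs.

```rust
use std::collections::BTreeSet;

/// Returns `(block_only, timely_only)`:
/// - `block_only`: validators present in the block but not in the local timely pool
/// - `timely_only`: validators present in the local timely pool but not in the block
fn diff_counts(block: &BTreeSet<u64>, timely: &BTreeSet<u64>) -> (usize, usize) {
    let block_only = block.difference(timely).count();
    let timely_only = timely.difference(block).count();
    (block_only, timely_only)
}

fn main() {
    // Hypothetical validator ids for a single slot.
    let block = BTreeSet::from([0, 1, 2, 5]);
    let timely = BTreeSet::from([1, 2, 3]);

    let (block_only, timely_only) = diff_counts(&block, &timely);
    // Validators 0 and 5 are block-only; validator 3 is timely-only.
    println!("block_only={block_only} timely_only={timely_only}");
}
```

The two counts together make up the symmetric difference; validators present in both pools contribute to neither direction.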
New module crates/blockchain/src/coverage.rs holds the Coverage type
(seen + has_subnet bitsets, derived subnet via vid % committee_count to
match the gossip subnet assignment) plus the 3 emission helpers and 6
unit tests covering add_bits, merge_from, diff_counts, empty/zero/out-
of-range edge cases.
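A simplified sketch of how such a Coverage accumulator could look, assuming `Vec<bool>` bitsets and hypothetical method signatures. Only the names `Coverage`, `seen`, `has_subnet`, `add_bits`, `merge_from`, and the `vid % committee_count` rule come from the commit message; the count accessors, the constructor, and the handling of out-of-range bits are illustrative guesses.

```rust
/// Hypothetical coverage accumulator for one section of one slot.
#[derive(Debug, Clone, PartialEq)]
pub struct Coverage {
    /// Indexed by validator id: true once the validator was seen.
    seen: Vec<bool>,
    /// Indexed by subnet id: true once any validator of that subnet was seen.
    has_subnet: Vec<bool>,
}

impl Coverage {
    pub fn new(validator_count: usize, committee_count: usize) -> Self {
        Self {
            seen: vec![false; validator_count],
            has_subnet: vec![false; committee_count],
        }
    }

    /// Records every set participation bit. Bits past the validator count
    /// are ignored (an assumption about the out-of-range edge case).
    pub fn add_bits(&mut self, bits: &[bool]) {
        let committee_count = self.has_subnet.len();
        for (vid, &set) in bits.iter().enumerate() {
            if set && vid < self.seen.len() {
                self.seen[vid] = true;
                if committee_count > 0 {
                    // Subnet derived as vid % committee_count.
                    self.has_subnet[vid % committee_count] = true;
                }
            }
        }
    }

    /// Unions another coverage into this one.
    pub fn merge_from(&mut self, other: &Coverage) {
        for (dst, &src) in self.seen.iter_mut().zip(&other.seen) {
            *dst |= src;
        }
        for (dst, &src) in self.has_subnet.iter_mut().zip(&other.has_subnet) {
            *dst |= src;
        }
    }

    pub fn covered_validators(&self) -> usize {
        self.seen.iter().filter(|&&b| b).count()
    }

    pub fn covered_subnets(&self) -> usize {
        self.has_subnet.iter().filter(|&&b| b).count()
    }
}

fn main() {
    let mut timely = Coverage::new(8, 4);
    // Validators 0 and 5 attested in time.
    timely.add_bits(&[true, false, false, false, false, true, false, false]);
    let mut late = Coverage::new(8, 4);
    // Validator 2 arrived late.
    late.add_bits(&[false, false, true]);
    timely.merge_from(&late);
    println!(
        "validators={} subnets={}",
        timely.covered_validators(),
        timely.covered_subnets()
    );
}
```

Tracking `has_subnet` alongside `seen` lets the subnet count be read directly instead of being recomputed from validator ids at emission time.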
Storage gets a CoverageSnapshot type and two Arc<Mutex<Option<…>>>
fields on Store. No proofs are duplicated — only AggregationBits are
captured, keeping the per-slot allocation in the tens of bytes per
entry. The pre-merge capture happens inside accept_new_attestations
just before promote_new_aggregated_payloads, so consumer-side timing
concerns stay in the existing tick path.
BlockChain::spawn now takes attestation_committee_count as a
parameter; bin/ethlambda/src/main.rs already resolves the value
(CLI > validator-config.yaml > 1) and passes it through. The number
of attestation committees was previously only known to P2P (for
subnet subscriptions); the coverage emitters need it to derive
subnet ids.
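The precedence "CLI > validator-config.yaml > 1" can be sketched as a simple fallback chain. The function name and the `Option<u64>` inputs are assumptions for this example and do not reflect the actual code in main.rs.

```rust
/// Hypothetical resolver: prefers the CLI value, then the config-file value,
/// and falls back to a single committee when neither is set.
fn resolve_attestation_committee_count(cli: Option<u64>, config_file: Option<u64>) -> u64 {
    cli.or(config_file).unwrap_or(1)
}

fn main() {
    // CLI flag wins over the config file.
    println!("{}", resolve_attestation_committee_count(Some(4), Some(2)));
    // Config file is used when no flag is given.
    println!("{}", resolve_attestation_committee_count(None, Some(2)));
    // Default of 1 when nothing is configured.
    println!("{}", resolve_attestation_committee_count(None, None));
}
```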
Instrumentation added in 855f56d. Five emission sites now write to the 18 series registered in e9a04f7:
Store gets a small
Ready for another look.
🗒️ Description / Motivation
Ports leanSpec PR #735 to ethlambda: registers three Prometheus metrics that describe attestation aggregate coverage, with default zero-valued series so dashboards render from a fresh node startup.
leanSpec PR #735 itself mirrors blockblaz/zeam#898. Per upstream, this PR is registration only — the producer side (per-slot coverage computation, de-duplication across payloads, and the chain-status log line) is the equivalent of blockblaz/zeam#876 and lands in a follow-up.
What Changed
- `crates/blockchain/src/metrics.rs` (+76 / -0):
  - `pub const &[&str]` label-set constants: the single source of truth for sections and directions, mirroring `ATTESTATION_AGGREGATE_COVERAGE_SECTIONS` and `ATTESTATION_AGGREGATE_COVERAGE_DIFF_DIRECTIONS` in leanSpec.
  - `IntGaugeVec` statics:
    - `lean_attestation_aggregate_coverage_validators`, labels: `section`, `subnet`. `subnet="combined"` is the section total; `subnet="subnet_N"` is per-subnet coverage.
    - `lean_attestation_aggregate_coverage_subnets`, label: `section`. Count of covered subnets per section.
    - `lean_attestation_aggregate_coverage_diff_validators`, label: `direction`. Counts of validators in the symmetric difference between block-included aggregates and locally-aggregated pre-merge (timely) aggregates for the same slot.
  - `init()` forces the new statics and seeds 18 default zero-valued series: 8 sections × `subnet="combined"`, 8 sections, and 2 directions. Per-subnet (`subnet="subnet_N"`) series appear lazily when instrumentation writes them.
Sections
`timely`, `late`, `block`, `combined`, `agg_start_new`, `proposal_payloads`, `proposal_gossip`, `proposal_combined`.
Directions
`block_only`, `timely_only`.
Notes
- `IntGaugeVec` (not `GaugeVec`): all coverage values are integer counts, and every other labeled gauge in `metrics.rs` uses `IntGaugeVec` (`LEAN_NODE_INFO`, `LEAN_TABLE_BYTES`, `LEAN_NODE_SYNC_STATUS`).
- `diff_validators` help text intentionally diverges from upstream's terse phrasing ("Validator coverage delta between block payloads and timely pre-merge payloads") to spell out the symmetric-difference semantics: `block_only` = in block but not in local timely pool; `timely_only` = the reverse. Metric name, labels, and values are unchanged, so dashboards built against any client's schema are unaffected.
Operator interpretation of `diff_validators`
The aggregation pipeline produces two pools for the same slot: `timely` (locally aggregated pre-merge) and `block` (what the proposer included). The diff metric counts validators in the symmetric difference:
- `block_only` persistently high → this node was slow to receive/aggregate via gossip; proposer had a better view.
- `timely_only` persistently high → proposer omitted attestations the network had time to gossip.
Correctness / Behavior Guarantees
- `init()` exactly like the existing gauges, so they appear at `/metrics` from node startup.
- `8 × 65 = 520` series, well within Prometheus comfort.
No new tests in this PR. The upstream Python test (`test_attestation_aggregate_coverage_metrics_registered`) is tautological for code that calls `.set(0)`, and the blockchain crate doesn't currently host a metric-registry test harness; introducing one for this is out of proportion. Verified via the existing suite + manual `/metrics` smoke check.
Commands run:
- `cargo fmt --all -- --check`: clean
- `make lint` (clippy `-D warnings`): clean
- `cargo test -p ethlambda-blockchain --release --lib --bins`: 20 passed
- `cargo test -p ethlambda-blockchain --release --test signature_spectests`: 7 passed
- `cargo test --workspace --release --exclude ethlambda-blockchain`: all passing
- `cargo test -p ethlambda-blockchain --release --test forkchoice_spectests`: 62 passed, 8 failed, all `AttestationTooFarInFuture` from pre-existing fixture flakes on `main` (no logic changes in this PR could affect attestation timing).
Related Issues / PRs
✅ Verification Checklist
- `make fmt`: clean
- `make lint` (clippy with `-D warnings`): clean
- `cargo test --workspace --release`: passing modulo 8 pre-existing forkchoice fixture flakes documented above (unrelated to this PR)