Deterministic Replay Engine for Telemetry Stream Audit Trail with Cryptographic Chaining #56

Description

@elizabetheonoja-art

Problem Statement / Feature Objective

Regulatory compliance requires the ability to replay the entire telemetry stream for any meter over any historical window and to produce a byte-identical copy of the original output (tariff evaluations, token mints, settlement records). The current pipeline is non-deterministic because of asynchronous scheduling, clock skew that varies between runs, and concurrent processing of events. The deterministic replay engine must record the ingestion stream as an append-only audit log with cryptographic hash chaining (similar to a blockchain) and provide a verifiable replay mode that feeds the audit log through the same pipeline stages and yields byte-identical output. Any divergence between original and replay output must be detectable via a Merkle-style root hash comparison.

Technical Invariants & Bounds

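  • Audit log entry: (stream_offset: u64, event_hash: [u8;32], prev_entry_hash: [u8;32], raw_event_bytes: Vec<u8>). Each entry's hash is event_hash = SHA256(prev_entry_hash || stream_offset || raw_event_bytes), with stream_offset serialized in a fixed byte order so the hash is reproducible across platforms. Replay mode (below) reads a recorded wall_clock_ms, so each entry must also carry that timestamp, either as an additional field or inside raw_event_bytes.
  • Cryptographic chain: the audit log forms a hash chain; the root (the latest entry's hash) is committed to a Soroban contract every 1000 entries, providing public verifiability.
  • Replay mode: all non-deterministic sources are replaced with deterministic counterparts: time is frozen with tokio::time::pause(), random number generators use a fixed seed, the clock is driven by the audit log's recorded wall_clock_ms, and concurrency is forced to single-threaded execution (or a deterministic scheduler with loom-style ordering).
  • Replay speed target: at least 2× real-time for a 24-hour window on a single core, i.e. the window replays in at most 12 hours.
  • Replay verification: after replay, compute SHA256(all_output_event_hashes_concatenated) and compare against the original run's output root hash stored in the audit log footer. Any mismatch reports the exact offset of the first diverging output. Locating that offset requires the per-output hashes from the original run to be retained, because a single concatenated digest only shows that a divergence exists, not where it starts.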
  • Storage: audit log stored in a dedicated RocksDB column family, exposed for query via gRPC streaming. Retention: 90 days, after which entries are pruned oldest-first.
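
The entry layout and chaining rule above can be sketched in a few lines. This is a minimal, std-only illustration, not the project's implementation: the hash is passed in as a function pointer, and `toy_hash` is a non-cryptographic stand-in used only so the example has no external dependencies. A real logger would plug in SHA-256 there. The helper names (`append`, `first_broken_entry`), the all-zero genesis hash, and the big-endian offset encoding are assumptions made for the sketch.

```rust
/// 32-byte digest, matching the `[u8; 32]` fields of the audit log entry.
type Digest = [u8; 32];

/// Hash function signature; production code would supply SHA-256 here.
type HashFn = fn(&[u8]) -> Digest;

/// Stand-in hash for illustration ONLY: four FNV-1a lanes packed into 32 bytes.
/// It is NOT cryptographic and must not protect a real audit log.
fn toy_hash(data: &[u8]) -> Digest {
    let mut out = [0u8; 32];
    for lane in 0..4u64 {
        let mut h: u64 = 0xcbf29ce484222325 ^ lane;
        for &b in data {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
        let start = lane as usize * 8;
        out[start..start + 8].copy_from_slice(&h.to_be_bytes());
    }
    out
}

#[derive(Clone, Debug, PartialEq)]
struct AuditEntry {
    stream_offset: u64,
    event_hash: Digest, // stores the chained entry hash
    prev_entry_hash: Digest,
    raw_event_bytes: Vec<u8>,
}

/// hash(prev_entry_hash || offset || raw_event_bytes); big-endian offset is an assumption.
fn entry_hash(hash: HashFn, prev: &Digest, offset: u64, raw: &[u8]) -> Digest {
    let mut buf = Vec::with_capacity(32 + 8 + raw.len());
    buf.extend_from_slice(prev);
    buf.extend_from_slice(&offset.to_be_bytes());
    buf.extend_from_slice(raw);
    hash(&buf)
}

/// Appends one event to an in-memory log and returns its offset.
/// The first entry chains onto an all-zero hash (assumed genesis value).
fn append(hash: HashFn, log: &mut Vec<AuditEntry>, raw: &[u8]) -> u64 {
    let prev = log.last().map(|e| e.event_hash).unwrap_or([0u8; 32]);
    let offset = log.len() as u64;
    let event_hash = entry_hash(hash, &prev, offset, raw);
    log.push(AuditEntry {
        stream_offset: offset,
        event_hash,
        prev_entry_hash: prev,
        raw_event_bytes: raw.to_vec(),
    });
    offset
}

/// Walks the chain and returns the offset of the first entry whose link or
/// hash does not verify, or `None` if the whole chain is intact.
fn first_broken_entry(hash: HashFn, log: &[AuditEntry]) -> Option<u64> {
    let mut prev = [0u8; 32];
    for e in log {
        let expected = entry_hash(hash, &prev, e.stream_offset, &e.raw_event_bytes);
        if e.prev_entry_hash != prev || e.event_hash != expected {
            return Some(e.stream_offset);
        }
        prev = e.event_hash;
    }
    None
}
```

Because each entry's hash covers its predecessor's hash, flipping a byte in any stored event makes verification fail at exactly that entry, which is the property the tampering test in the blueprint relies on.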

Codebase Navigation Guide

  • src/audit/logger.rs — new module: AuditLogger that records each ingested event with hash chaining.
  • src/audit/replay.rs — new module: ReplayEngine that reads the audit log and deterministically re-processes events.
  • src/audit/verifier.rs — new module: AuditVerifier comparing original and replay output root hashes.
  • src/ingestion/collector.rs — after successful processing, call audit_logger.record(event).
  • src/storage/rocksdb/audit_log.rs — new module: RocksDB column family for audit log, with prefix seeks by meter ID or time range.
  • src/blockchain/soroban/contracts/audit_root.wat — new Soroban contract storing the latest audit root hash.
  • src/rpc/proto/audit.proto — protobuf for AuditReplayRequest(start_offset, end_offset, meter_id_filter), AuditReplayResponse.
  • tests/audit/deterministic_replay_test.rs — full integration test: run pipeline once, record audit log, replay, compare root hashes.
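
One possible key layout for the audit log column family, shown here only as a sketch: prefix seeks by meter ID work when keys sort by meter first and by offset second, which requires big-endian integer encoding so that byte order matches numeric order. The `u64` meter ID, the 16-byte key, and the function names are assumptions, and a `BTreeMap` stands in for RocksDB's sorted keyspace so the example stays dependency-free.

```rust
use std::collections::BTreeMap;

/// Key = meter_id (big-endian) || stream_offset (big-endian).
/// Big-endian keeps lexicographic byte order equal to numeric order.
fn audit_key(meter_id: u64, stream_offset: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&meter_id.to_be_bytes());
    key[8..].copy_from_slice(&stream_offset.to_be_bytes());
    key
}

/// Returns the offsets stored for `meter_id` within `start..=end`, in order.
/// The BTreeMap range scan mimics a prefix seek followed by forward iteration.
fn scan_meter(
    store: &BTreeMap<[u8; 16], Vec<u8>>,
    meter_id: u64,
    start: u64,
    end: u64,
) -> Vec<u64> {
    store
        .range(audit_key(meter_id, start)..=audit_key(meter_id, end))
        .map(|(k, _)| {
            let mut offset = [0u8; 8];
            offset.copy_from_slice(&k[8..]);
            u64::from_be_bytes(offset)
        })
        .collect()
}
```

With little-endian encoding, offset 256 would sort before offset 255 and range scans would return entries out of order. A time-range index could follow the same pattern with a timestamp in place of the offset. Since the hash chain itself runs over a single global offset sequence, a meter-prefixed layout like this would act as a secondary index rather than replace the offset-ordered log.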

Implementation Blueprint

  1. In logger.rs, define struct AuditLogger { db: Arc<RocksDB>, cf: ColumnFamily, prev_hash: Mutex<[u8;32]>, metrics: Arc<AuditMetrics> }. Implement fn record(&self, event: &MeterEvent) -> Result<u64> that: serializes the event with event_bytes = bincode::serialize(event), locks prev_hash, assigns the next stream_offset, computes entry_hash = SHA256(prev_hash || stream_offset || event_bytes), writes (stream_offset, event_hash = entry_hash, prev_hash, event_bytes) to the DB, updates prev_hash to entry_hash, and returns stream_offset. The lock must be held from the read of prev_hash through the update; otherwise two concurrent callers could chain onto the same predecessor. Increment a counter for total entries.
  2. Every 1000 entries, log the current stream_offset and prev_hash as a checkpoint. Optionally submit the checkpoint hash to the Soroban audit_root contract for public anchoring. The contract stores (checkpoint_id, root_hash, timestamp).
  3. In replay.rs, define struct ReplayEngine { db: Arc<RocksDB>, pipeline: Arc<Pipeline>, config: ReplayConfig }. Implement async fn replay_range(&self, start_offset: u64, end_offset: u64) -> Result<AuditResult> that: (a) freezes all non-deterministic sources (for example, pausing Tokio's clock with tokio::time::pause()); (b) installs a deterministic clock that reads timestamps from the audit log entries; (c) reads entries from RocksDB sequentially; (d) feeds each raw event through a single-threaded version of the pipeline; (e) collects output hashes; (f) computes the root hash of all outputs.
  4. For deterministic scheduling, use a dedicated Tokio runtime built with the current-thread scheduler (Builder::new_current_thread() in Tokio 1.x, which replaced basic_scheduler() from Tokio 0.2) and a paused clock (tokio::time::pause(), or Builder::start_paused(true); both require the test-util feature). Events are processed one at a time in offset order, ensuring no parallelism-induced reordering.
  5. In verifier.rs, compare the replayed output root hash against the original root hash stored in the audit log's last checkpoint. If they match, emit AuditResult::Verified. If not, perform a binary search over the offset range to find the first divergence point, returning AuditResult::Divergence(first_bad_offset, original_hash, replay_hash). The binary search is valid only if the compared hashes are cumulative (each covers all outputs up to its offset), so that every offset at or after the first divergence also mismatches.
  6. Expose replay functionality via a gRPC server-streaming endpoint, rpc ReplayAuditLog(AuditReplayRequest) returns (stream ReplayEvent). The client can filter by meter ID and time range. The server streams each replayed event as it is produced, allowing real-time audit verification.
  7. Write an integration test that: (a) runs the full pipeline with 10,000 events; (b) records the audit log; (c) replays the log; (d) verifies that the output root hash matches. Also test that introducing a single byte mutation in the audit log (simulating tampering) causes the replay to report a divergence at exactly the corrupted offset.
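
Steps 3 and 5 can be illustrated with a small, std-only sketch. It makes several assumptions that are not in the spec: `Recorded` is a hypothetical in-memory form of a log entry that carries `wall_clock_ms`, `stage` is a toy tariff stage with an invented peak-hour rule, and `chain` is a non-cryptographic FNV-1a stand-in for SHA-256. It also assumes the original run stored a cumulative (running) output hash per offset, which is what makes binary search for the first divergence valid.

```rust
/// Hypothetical in-memory view of one audit log entry.
#[derive(Clone, Debug)]
struct Recorded {
    wall_clock_ms: u64,
    raw: Vec<u8>,
}

/// Toy pipeline stage: charge = reading * rate, where the rate depends on the
/// hour of day. The peak rule (18:00 onward) is invented for this sketch.
fn stage(raw: &[u8], now_ms: u64) -> Vec<u8> {
    let reading: u64 = raw.iter().map(|&b| b as u64).sum();
    let peak = (now_ms / 3_600_000) % 24 >= 18;
    let charge = reading * if peak { 3 } else { 2 };
    charge.to_be_bytes().to_vec()
}

/// Stand-in for SHA256(prev || output): FNV-1a seeded with the previous hash.
/// NOT cryptographic; illustration only.
fn chain(prev: u64, output: &[u8]) -> u64 {
    let mut h = 0xcbf29ce484222325u64 ^ prev;
    for &b in output {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Replays entries strictly in log order on one thread. The clock value comes
/// from the log entry, never from the system clock, so reruns are identical.
/// Returns the running output hash after each entry.
fn replay(log: &[Recorded]) -> Vec<u64> {
    let mut running = Vec::with_capacity(log.len());
    let mut prev = 0u64;
    for entry in log {
        let output = stage(&entry.raw, entry.wall_clock_ms);
        prev = chain(prev, &output);
        running.push(prev);
    }
    running
}

/// Binary search for the first index where the running hashes differ.
/// Relies on running hashes matching before the divergence and differing
/// from it onward.
fn first_divergence(original: &[u64], replayed: &[u64]) -> Option<usize> {
    let n = original.len().min(replayed.len());
    let (mut lo, mut hi) = (0usize, n);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if original[mid] == replayed[mid] {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if lo < n {
        Some(lo)
    } else if original.len() != replayed.len() {
        Some(n) // one run produced more outputs than the other
    } else {
        None
    }
}
```

Replaying an unmodified log reproduces the same running hashes, while flipping one byte in a single entry changes the running hash from that entry onward, so the search returns that entry's index. The integration test in step 7 exercises the same two cases against the real pipeline.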

Metadata

Labels

  • Complexity: Hardcore (issues requiring deep systems-level engineering rigor)
  • GrantFox OSS (issue tracked in GrantFox OSS)
  • Layer: Core-Engine (core engine layer)
  • Maybe Rewarded (issue may be eligible for a GrantFox reward)
  • Official Campaign (campaign: Official Campaign)
  • Type: Core-Architecture (core architecture concerns, invariants, and structural design)
