Commit a6c95b6

mivertowski and claude committed
Add automatic recovery actions for kernel failure handling (Phase 5.2)
Implements automatic recovery infrastructure for handling kernel failures:

- RecoveryPolicy enum: Restart, Migrate, Checkpoint, Notify, Escalate, Circuit
- FailureType enum: Timeout, Crash, DeviceError, ResourceExhausted, QueueOverflow, StateCorruption
- RecoveryConfig with builder pattern and configurable policies per failure type
- RecoveryManager for coordinating recovery actions with retry support
- RecoveryAction and RecoveryResult types for tracking recovery attempts
- RecoveryStatsSnapshot for monitoring recovery statistics

Features:

- Policy-based recovery with configurable max retries and cooldown periods
- Automatic escalation after max retries exceeded
- Support for callbacks on recovery completion
- Integration with existing health monitoring infrastructure

Adds 15 tests for recovery functionality.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
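For readers without the diff at hand, the policy lookup and escalation behaviour described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the type names and enum variants come from the commit message, but every method (`new`, `with_policy`, `policy_for`, `decide`, `record_success`), the `u64` kernel id, and the `Notify` default are assumptions. Cooldown periods, callbacks, and statistics are left out to keep the sketch short.

```rust
use std::collections::HashMap;

// Variant names are taken from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FailureType {
    Timeout,
    Crash,
    DeviceError,
    ResourceExhausted,
    QueueOverflow,
    StateCorruption,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryPolicy {
    Restart,
    Migrate,
    Checkpoint,
    Notify,
    Escalate,
    Circuit,
}

// Hypothetical config shape: a retry cap plus one policy per failure type.
#[derive(Debug, Clone)]
pub struct RecoveryConfig {
    max_retries: u32,
    default_policy: RecoveryPolicy,
    policies: HashMap<FailureType, RecoveryPolicy>,
}

impl RecoveryConfig {
    // Assumed default: notify when no policy is configured for a failure type.
    pub fn new(max_retries: u32) -> Self {
        Self {
            max_retries,
            default_policy: RecoveryPolicy::Notify,
            policies: HashMap::new(),
        }
    }

    // Builder-style setter: consumes and returns the config so calls chain.
    pub fn with_policy(mut self, failure: FailureType, policy: RecoveryPolicy) -> Self {
        self.policies.insert(failure, policy);
        self
    }

    pub fn policy_for(&self, failure: FailureType) -> RecoveryPolicy {
        self.policies
            .get(&failure)
            .copied()
            .unwrap_or(self.default_policy)
    }
}

// Hypothetical manager: counts attempts per (kernel id, failure type).
pub struct RecoveryManager {
    config: RecoveryConfig,
    attempts: HashMap<(u64, FailureType), u32>,
}

impl RecoveryManager {
    pub fn new(config: RecoveryConfig) -> Self {
        Self {
            config,
            attempts: HashMap::new(),
        }
    }

    // Returns the configured policy until max_retries attempts have been
    // made for this kernel and failure type, then returns Escalate.
    pub fn decide(&mut self, kernel_id: u64, failure: FailureType) -> RecoveryPolicy {
        let count = self.attempts.entry((kernel_id, failure)).or_insert(0);
        if *count >= self.config.max_retries {
            return RecoveryPolicy::Escalate;
        }
        *count += 1;
        self.config.policy_for(failure)
    }

    // Clears the attempt counter after a successful recovery.
    pub fn record_success(&mut self, kernel_id: u64, failure: FailureType) {
        self.attempts.remove(&(kernel_id, failure));
    }
}
```

With `RecoveryConfig::new(2).with_policy(FailureType::Crash, RecoveryPolicy::Restart)`, two consecutive crashes of the same kernel would each yield `Restart` and a third would yield `Escalate`; calling `record_success` resets the count. Keying the counter on both kernel id and failure type is one possible choice; the actual implementation may track retries differently.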
1 parent 52ec779 commit a6c95b6

2 files changed

Lines changed: 769 additions & 2 deletions
