Commit a6c95b6
Add automatic recovery actions for kernel failure handling (Phase 5.2)
Implements automatic recovery infrastructure for handling kernel failures:
- RecoveryPolicy enum: Restart, Migrate, Checkpoint, Notify, Escalate, Circuit
- FailureType enum: Timeout, Crash, DeviceError, ResourceExhausted, QueueOverflow, StateCorruption
- RecoveryConfig with builder pattern and configurable policies per failure type
- RecoveryManager for coordinating recovery actions with retry support
- RecoveryAction and RecoveryResult types for tracking recovery attempts
- RecoveryStatsSnapshot for monitoring recovery statistics
Features:
- Policy-based recovery with configurable max retries and cooldown periods
- Automatic escalation after max retries exceeded
- Support for callbacks on recovery completion
- Integration with existing health monitoring infrastructure
Adds 15 tests for recovery functionality.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>1 parent 52ec779 commit a6c95b6
2 files changed
Lines changed: 769 additions & 2 deletions
0 commit comments