|
| 1 | +# FUSION v1 Survivability Extensions Documentation |
| 2 | + |
| 3 | +This directory contains the complete specification for implementing survivability and offline RL capabilities in the FUSION Elastic Optical Network simulator. |
| 4 | + |
| 5 | +## Document Organization |
| 6 | + |
| 7 | +The specification is organized into **7 logical phases** to facilitate incremental development and verification: |
| 8 | + |
| 9 | +### Phase 1: Foundation & Setup |
| 10 | +Understanding project context, scope boundaries, and development workflow. |
| 11 | + |
| 12 | +- [00-overview.md](phase1-foundation/00-overview.md) - Project Context & Integration Points |
| 13 | +- [01-scope-boundaries.md](phase1-foundation/01-scope-boundaries.md) - SHALL NOT, Nice-to-Have, Out of Scope |
| 14 | +- [02-module-summary.md](phase1-foundation/02-module-summary.md) - Module-by-Module Summary |
| 15 | +- [03-version-control.md](phase1-foundation/03-version-control.md) - Git Workflow & Branching Strategy |
| 16 | + |
| 17 | +### Phase 2: Core Infrastructure |
| 18 | +Building the foundational components for failure handling and path management. |
| 19 | + |
| 20 | +- [10-failure-module.md](phase2-infrastructure/10-failure-module.md) - Failure/Disaster Module (F1, F3, F4) |
| 21 | +- [11-k-path-cache.md](phase2-infrastructure/11-k-path-cache.md) - K-Path Candidate Generation & Caching |
| 22 | +- [12-configuration.md](phase2-infrastructure/12-configuration.md) - Configuration System Integration |
| 23 | +- [13-determinism-seeds.md](phase2-infrastructure/13-determinism-seeds.md) - Determinism & Seed Management |
| 24 | + |
| 25 | +### Phase 3: Protection & Recovery |
| 26 | +Implementing protection mechanisms and recovery time modeling. |
| 27 | + |
| 28 | +- [20-protection.md](phase3-protection/20-protection.md) - 1+1 Disjoint Protection + Restoration |
| 29 | +- [21-recovery-timing.md](phase3-protection/21-recovery-timing.md) - Recovery Time Modeling (Emulated SDN) |
| 30 | + |
| 31 | +### Phase 4: RL Integration |
| 32 | +Adding reinforcement learning policy support and dataset generation. |
| 33 | + |
| 34 | +- [30-rl-policies.md](phase4-rl-integration/30-rl-policies.md) - RL Policy Integration (Offline Inference) |
| 35 | +- [31-dataset-logging.md](phase4-rl-integration/31-dataset-logging.md) - Offline Dataset Logging |
| 36 | + |
| 37 | +### Phase 5: Metrics & Reporting |
| 38 | +Implementing comprehensive metrics collection and reporting. |
| 39 | + |
| 40 | +- [40-metrics-reporting.md](phase5-metrics/40-metrics-reporting.md) - Metrics & Reporting System |
| 41 | + |
| 42 | +### Phase 6: Quality Assurance |
| 43 | +Ensuring code quality, test coverage, and performance standards. |
| 44 | + |
| 45 | +- [50-testing.md](phase6-quality/50-testing.md) - Testing Requirements & Standards |
| 46 | +- [51-documentation.md](phase6-quality/51-documentation.md) - Documentation Requirements |
| 47 | +- [52-performance.md](phase6-quality/52-performance.md) - Performance Budgets & Constraints |
| 48 | + |
| 49 | +### Phase 7: Project Management |
| 50 | +Project planning, risk management, and traceability. |
| 51 | + |
| 52 | +- [60-work-breakdown.md](phase7-management/60-work-breakdown.md) - Minimal Work Breakdown (13-17 days) |
| 53 | +- [61-risks-mitigations.md](phase7-management/61-risks-mitigations.md) - Risks & Mitigations |
| 54 | +- [62-traceability.md](phase7-management/62-traceability.md) - Traceability to Paper Claims |
| 55 | +- [63-usage-workflow.md](phase7-management/63-usage-workflow.md) - Example Usage Workflow |
| 56 | +- [64-checklist.md](phase7-management/64-checklist.md) - Final Implementation Checklist |
| 57 | + |
| 58 | +## High-Level Goals |
| 59 | + |
| 60 | +Enable stress-testing KSP-FF, 1+1 protection, and an **offline RL policy (BC → IQL)** with **action masking + heuristic fallback** under **F1 (link), F3 (SRLG), F4 (geo radius=2)** failures, measuring: |
| 61 | + |
| 62 | +- **Blocking Probability (BP)** overall and within failure windows |
| 63 | +- **Recovery Time** (mean, P95) for protection and restoration |
| 64 | +- **Fragmentation** proxy metrics |
| 65 | +- **Seed Variance** across repeated runs, to support statistical-significance claims |
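
As a rough illustration of the metrics listed above, the sketch below computes blocking probability (overall and inside a failure window), mean and P95 recovery time, and the spread of BP across seeds. All function names, the `(arrival_time, was_blocked)` event layout, and the linear-interpolation percentile are assumptions made for this example; they are not taken from the FUSION codebase or from `40-metrics-reporting.md`.

```python
import statistics
from typing import Dict, Sequence, Tuple

# Hypothetical event layout for this sketch: (arrival_time, was_blocked).
Event = Tuple[float, bool]


def blocking_probability(blocked: int, offered: int) -> float:
    """Fraction of offered requests that were blocked."""
    if offered <= 0:
        raise ValueError("offered must be positive")
    return blocked / offered


def bp_in_window(events: Sequence[Event], start: float, end: float) -> float:
    """BP over requests whose arrival time falls in [start, end)."""
    in_window = [blocked for t, blocked in events if start <= t < end]
    return blocking_probability(sum(in_window), len(in_window))


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize_recovery(times_ms: Sequence[float]) -> Dict[str, float]:
    """Mean and P95 of a list of recovery times."""
    return {"mean": statistics.fmean(times_ms), "p95": percentile(times_ms, 95)}


def seed_spread(bp_per_seed: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of BP across seeds."""
    return statistics.fmean(bp_per_seed), statistics.stdev(bp_per_seed)
```

Reporting the per-seed standard deviation alongside the mean is one simple way to show how much of a BP difference between policies could be explained by seed noise alone.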
| 66 | + |
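
The "action masking + heuristic fallback" idea named in the goals can be sketched as follows. This is a minimal illustration under stated assumptions, not the method specified in `30-rl-policies.md`: the policy is assumed to emit one score per K-path candidate, a boolean mask marks which candidates are feasible, and the fallback is a simple first-feasible rule standing in for KSP-FF. The names `select_path`, `first_feasible`, and `BLOCK` are hypothetical.

```python
import math
from typing import Optional, Sequence

BLOCK = -1  # hypothetical sentinel meaning "no candidate can be used"


def first_feasible(feasible: Sequence[bool]) -> int:
    """Heuristic stand-in: first feasible candidate in K-shortest-path order."""
    for index, ok in enumerate(feasible):
        if ok:
            return index
    return BLOCK


def select_path(scores: Optional[Sequence[float]], feasible: Sequence[bool]) -> int:
    """Return the index of the chosen candidate path, or BLOCK."""
    if not any(feasible):
        return BLOCK  # nothing to choose from, with or without the policy
    unusable = (
        scores is None
        or len(scores) != len(feasible)
        or any(math.isnan(s) for s in scores)
    )
    if unusable:
        return first_feasible(feasible)  # heuristic fallback
    # Action masking: infeasible candidates can never win the argmax.
    masked = [s if ok else -math.inf for s, ok in zip(scores, feasible)]
    return max(range(len(masked)), key=masked.__getitem__)
```

For example, `select_path([0.9, 0.5, 0.7], [False, True, True])` skips the highest-scoring but infeasible candidate 0 and returns 2, while `select_path(None, [False, True, True])` falls back to the heuristic and returns 1.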
| 67 | +## Development Workflow |
| 68 | + |
| 69 | +1. **Start with Phase 1** to understand context and scope |
| 70 | +2. **Follow Phases 2-5** for implementation (order matters due to dependencies) |
| 71 | +3. **Use Phase 6** throughout development for quality checks |
| 72 | +4. **Refer to Phase 7** for project management and tracking |
| 73 | + |
| 74 | +## Estimated Timeline |
| 75 | + |
| 76 | +**Total: 13-17 days** (see [60-work-breakdown.md](phase7-management/60-work-breakdown.md)) |
| 77 | + |
| 78 | +## Prerequisites |
| 79 | + |
| 80 | +- FUSION v6.0.0+ installed and configured |
| 81 | +- Python 3.9+, PyTorch, NetworkX, Stable-Baselines3 |
| 82 | +- Familiarity with FUSION's architecture (see [00-overview.md](phase1-foundation/00-overview.md)) |
| 83 | + |
| 84 | +## Quick Start |
| 85 | + |
| 86 | +```bash |
| 87 | +# 1. Review foundation documents |
| 88 | +cd docs/survivability-v1/phase1-foundation |
| 89 | +cat 00-overview.md 01-scope-boundaries.md |
| 90 | + |
| 91 | +# 2. Return to the repository root and begin with the failures module |
| 92 | +cd ../../.. |
| 93 | +# Follow docs/survivability-v1/phase2-infrastructure/10-failure-module.md; code lives under fusion/modules/failures/ |
| 94 | + |
| 95 | +# 3. Run tests as you implement (from the repository root) |
| 96 | +pytest fusion/modules/failures/tests/ -v --cov |
| 97 | + |
| 98 | +# 4. Refer to quality assurance docs |
| 99 | +cd docs/survivability-v1/phase6-quality |
| 100 | +``` |
| 101 | + |
| 102 | +## Key Architecture Principles |
| 103 | + |
| 104 | +1. **Minimal Invasiveness**: Extend existing modules, don't replace them |
| 105 | +2. **Registry Pattern**: Use FUSION's registry system for multi-component modules |
| 106 | +3. **Type Safety**: Full type hints on all functions and parameters |
| 107 | +4. **Test Coverage**: 80-90% target for all new modules |
| 108 | +5. **Determinism**: All experiments fully reproducible with seed control |
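
To make principles 2, 3, and 5 concrete, here is a minimal sketch of a decorator-based registry combined with explicit seed control. It only illustrates the general pattern: FUSION's actual registry API is not shown in this document, so `register_failure`, `sample_failure`, and the `"F1"` generator below are hypothetical names chosen for the example.

```python
import random
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]
FailureGenerator = Callable[[List[Link], random.Random], List[Link]]

# Hypothetical registry mapping a failure-type name to its generator.
_FAILURE_REGISTRY: Dict[str, FailureGenerator] = {}


def register_failure(name: str) -> Callable[[FailureGenerator], FailureGenerator]:
    """Decorator that registers a failure generator under `name`."""

    def decorator(fn: FailureGenerator) -> FailureGenerator:
        if name in _FAILURE_REGISTRY:
            raise KeyError(f"failure type {name!r} is already registered")
        _FAILURE_REGISTRY[name] = fn
        return fn

    return decorator


@register_failure("F1")
def single_link_failure(links: List[Link], rng: random.Random) -> List[Link]:
    """Fail one link chosen uniformly at random."""
    return [rng.choice(links)]


def sample_failure(name: str, links: List[Link], seed: int) -> List[Link]:
    """Look up a generator by name and run it with its own seeded RNG."""
    rng = random.Random(seed)  # local RNG: no dependence on global state
    return _FAILURE_REGISTRY[name](links, rng)
```

Passing a dedicated `random.Random(seed)` into each generator, rather than calling the module-level `random` functions, keeps a run reproducible even when other components also draw random numbers.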
| 109 | + |
| 110 | +## Contact & Support |
| 111 | + |
| 112 | +For questions about this specification: |
| 113 | +- Review FUSION's [CODING_STANDARDS.md](../../CODING_STANDARDS.md) |
| 114 | +- Check [TESTING_STANDARDS.md](../../TESTING_STANDARDS.md) |
| 115 | +- Refer to [DEVELOPMENT_WORKFLOW.md](../../DEVELOPMENT_WORKFLOW.md) |
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +**Version**: v2 (Contextualized to FUSION Architecture) |
| 120 | +**Last Updated**: 2025-10-14 |
| 121 | +**Status**: Specification Complete, Implementation Pending |