| 1 | +# Codebase Architecture Redesign |
| 2 | + |
| 3 | +Date: 2026-03-09 |
| 4 | +Status: Approved |
| 5 | + |
| 6 | +## Goal |
| 7 | + |
| 8 | +Define an ideal end-state architecture for the Video RSS Aggregator that improves |
| 9 | +maintainability, reliability, delivery velocity, and testability. |
| 10 | + |
| 11 | +## Context |
| 12 | + |
| 13 | +The current codebase works, but several design pressures now slow it down: |
| 14 | + |
| 15 | +- `Pipeline` mixes composition, orchestration, feed ingestion, processing, |
| 16 | + persistence, runtime reporting, and RSS generation. |
| 17 | +- Data contracts leak across layers, especially between summarization, |
| 18 | + persistence, and RSS rendering. |
| 19 | +- CLI and API do not consistently share one application-level workflow model. |
| 20 | +- Setup/runtime data is shaped in multiple places across Python, HTML, and JS. |
| 21 | +- Tests rely heavily on monkeypatching concrete modules instead of stable seams. |
| 22 | + |
| 23 | +## Chosen Approach |
| 24 | + |
| 25 | +Adopt a ports-and-use-cases architecture with four layers: |
| 26 | + |
| 27 | +1. Adapters |
| 28 | +2. Application |
| 29 | +3. Domain |
| 30 | +4. Infrastructure |
| 31 | + |
| 32 | +This was selected over both a lighter workflow-slice refactor and a stabilized |
| 33 | +version of the current layering because the goal is an ideal long-term |
| 34 | +architecture, not merely a lower-risk cleanup. |
| 35 | + |
| 36 | +## Architectural Principles |
| 37 | + |
| 38 | +- Dependencies point inward only. |
| 39 | +- Business workflows live in application use cases, not transport adapters. |
| 40 | +- Domain types are stable and independent of storage, HTTP, CLI, and model APIs. |
| 41 | +- Infrastructure implements ports instead of owning business rules. |
| 42 | +- Composition happens once in a single composition root. |
| 43 | + |
| 44 | +## Target Architecture |
| 45 | + |
| 46 | +### Adapters |
| 47 | + |
| 48 | +Adapters translate external interactions into application requests and map |
| 49 | +application responses back out. |
| 50 | + |
| 51 | +- FastAPI routes |
| 52 | +- CLI commands |
| 53 | +- GUI/setup endpoints and page models |
| 54 | +- RSS HTTP delivery surface |
| 55 | + |
| 56 | +Adapters should not contain orchestration logic beyond request mapping, |
| 57 | +validation, and response formatting. |
| 58 | + |
| 59 | +### Application |
| 60 | + |
| 61 | +Application use cases coordinate business workflows through explicit ports. |
| 62 | + |
| 63 | +Core use cases: |
| 64 | + |
| 65 | +- `BootstrapRuntime` |
| 66 | +- `GetRuntimeStatus` |
| 67 | +- `IngestFeed` |
| 68 | +- `ProcessSource` |
| 69 | +- `RenderRssFeed` |
| 70 | + |
| 71 | +The current `Pipeline` class should be replaced by these focused use cases. |
| 72 | + |
| 73 | +### Domain |
| 74 | + |
| 75 | +Domain types define the stable language of the system. |
| 76 | + |
| 77 | +Illustrative types: |
| 78 | + |
| 79 | +- `SourceItem` |
| 80 | +- `VideoRecord` |
| 81 | +- `Transcript` |
| 82 | +- `PreparedMedia` |
| 83 | +- `SummaryDraft` |
| 84 | +- `SummaryResult` |
| 85 | +- `ProcessOutcome` |
| 86 | +- `DiagnosticReport` |
| 87 | + |
| 88 | +These types must not depend on SQLite rows, Ollama payloads, FastAPI models, |
| 89 | +Click commands, or subprocess output. |
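A minimal sketch of what storage-independent domain types could look like, using frozen dataclasses from the standard library. The type names come from the list above; every field name and default here is an assumption made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SourceItem:
    # Field names are hypothetical; only the type name comes from the design.
    source_url: str
    title: str
    published_at: Optional[datetime] = None


@dataclass(frozen=True)
class Transcript:
    text: str
    language: str = "en"  # assumed default, not specified by the design


@dataclass(frozen=True)
class PreparedMedia:
    source_url: str
    transcript: Optional[Transcript] = None
    frame_paths: tuple[str, ...] = ()  # plain strings, no filesystem handles


# Domain values are built from plain Python data only: no SQLite rows,
# HTTP models, or subprocess output appear anywhere in these types.
item = SourceItem(source_url="https://example.com/v/1", title="Demo")
media = PreparedMedia(source_url=item.source_url, transcript=Transcript("hello"))
```

Freezing the dataclasses is one way to keep these values stable as they cross layers; a mutable design would also satisfy the rule above as long as the imports stay free of infrastructure.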
| 90 | + |
| 91 | +### Infrastructure |
| 92 | + |
| 93 | +Infrastructure adapters implement application ports. |
| 94 | + |
| 95 | +- SQLite repositories |
| 96 | +- Ollama client adapter |
| 97 | +- Feed fetching adapter |
| 98 | +- Media preparation adapter around yt-dlp, ffmpeg, and filesystem artifacts |
| 99 | +- RSS rendering adapter |
| 100 | +- Runtime inspection adapter |
| 101 | + |
| 102 | +Infrastructure owns transport and tool integration details, but not workflow |
| 103 | +decisions. |
| 104 | + |
| 105 | +## Ports |
| 106 | + |
| 107 | +The application layer should depend on explicit interfaces such as: |
| 108 | + |
| 109 | +- `FeedSource` |
| 110 | +- `VideoRepository` |
| 111 | +- `SummaryRepository` |
| 112 | +- `MediaPreparationService` |
| 113 | +- `Summarizer` |
| 114 | +- `RuntimeInspector` |
| 115 | +- `PublicationRenderer` |
| 116 | +- `ArtifactStore` |
| 117 | + |
| 118 | +This creates narrow seams for tests and keeps adapters replaceable. |
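One way to express such ports in Python is `typing.Protocol`, which lets an adapter satisfy a port structurally without inheriting from it. The sketch below shows two of the ports listed above; the method names and signatures are assumptions, and plain strings stand in for domain types to keep the example short.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class Summarizer(Protocol):
    # Hypothetical signature: the design names the port, not its methods.
    def summarize(self, text: str) -> str: ...


@runtime_checkable
class SummaryRepository(Protocol):
    def save(self, source_url: str, summary: str) -> None: ...
    def get(self, source_url: str) -> Optional[str]: ...


class InMemorySummaryRepository:
    """Test double; a SQLite-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, source_url: str, summary: str) -> None:
        self._rows[source_url] = summary

    def get(self, source_url: str) -> Optional[str]:
        return self._rows.get(source_url)


class TruncatingSummarizer:
    """Stand-in for an Ollama-backed adapter; it just shortens the text."""

    def summarize(self, text: str) -> str:
        return text[:20]


repo = InMemorySummaryRepository()
repo.save("https://example.com/v/1", TruncatingSummarizer().summarize("a" * 50))
```

Note that `runtime_checkable` only verifies that the methods exist, not their signatures, so a static type checker remains the stronger guarantee.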
| 119 | + |
| 120 | +## Data Flow |
| 121 | + |
| 122 | +### Startup |
| 123 | + |
| 124 | +The composition root loads `Config`, builds infrastructure adapters, wires use |
| 125 | +cases, and exposes them to FastAPI and CLI entry points. Web startup and |
| 126 | +shutdown should be owned by FastAPI lifespan rather than by a prebuilt runtime |
| 127 | +object created externally. |
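The startup flow could be sketched roughly as follows. The lifespan function follows the shape FastAPI accepts via `FastAPI(lifespan=...)`, but FastAPI itself is not imported here: a `SimpleNamespace` stands in for the app so the example runs with the standard library only. `Config` is named in the design; its field, the `UseCases` container, and `build_use_cases` are hypothetical.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Callable


@dataclass(frozen=True)
class Config:
    database_path: str = ":memory:"  # assumed field, for illustration only


@dataclass(frozen=True)
class UseCases:
    get_runtime_status: Callable[[], dict]


def build_use_cases(config: Config, connection: sqlite3.Connection) -> UseCases:
    """Composition root: the only place that wires adapters to use cases."""

    def get_runtime_status() -> dict:
        row = connection.execute("SELECT sqlite_version()").fetchone()
        return {"database": config.database_path, "sqlite_version": row[0]}

    return UseCases(get_runtime_status=get_runtime_status)


@asynccontextmanager
async def lifespan(app):
    # Resources are opened on startup and closed on shutdown by the web
    # framework, rather than by a runtime object built elsewhere.
    config = Config()
    connection = sqlite3.connect(config.database_path)
    app.state.use_cases = build_use_cases(config, connection)
    try:
        yield
    finally:
        connection.close()


async def _demo() -> dict:
    app = SimpleNamespace(state=SimpleNamespace())  # stand-in for a FastAPI app
    async with lifespan(app):
        return app.state.use_cases.get_runtime_status()


status = asyncio.run(_demo())
```

A CLI entry point could call `build_use_cases` directly with its own connection, so both surfaces share one wiring function.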
| 128 | + |
| 129 | +### Ingest |
| 130 | + |
| 131 | +`IngestFeed` fetches and parses a feed, normalizes entries into domain types, |
| 132 | +stores feed/video metadata, and optionally delegates processing to |
| 133 | +`ProcessSource`. It should not own processing internals. |
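As a rough illustration of that boundary, the sketch below has `IngestFeed` normalize entries, store them, and hand each one to an injected callable when processing is requested. The port method names (`fetch`, `upsert`), the entry dictionary keys, and the normalization rules are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Protocol


@dataclass(frozen=True)
class SourceItem:
    source_url: str
    title: str


class FeedSource(Protocol):
    def fetch(self, feed_url: str) -> Iterable[dict]: ...  # hypothetical method


class VideoRepository(Protocol):
    def upsert(self, item: SourceItem) -> None: ...  # hypothetical method


class IngestFeed:
    def __init__(self, feed_source: FeedSource, videos: VideoRepository,
                 process_source: Optional[Callable[[SourceItem], object]] = None) -> None:
        self._feed_source = feed_source
        self._videos = videos
        self._process_source = process_source

    def execute(self, feed_url: str, process: bool = False) -> list[SourceItem]:
        items: list[SourceItem] = []
        for entry in self._feed_source.fetch(feed_url):
            link = (entry.get("link") or "").strip()
            if not link:
                continue  # assumed rule: entries without a link are skipped
            title = (entry.get("title") or "Untitled").strip()
            item = SourceItem(source_url=link, title=title)
            self._videos.upsert(item)
            items.append(item)
            if process and self._process_source is not None:
                # Delegation only: no media or summarization logic lives here.
                self._process_source(item)
        return items


class StaticFeed:
    def __init__(self, entries: list[dict]) -> None:
        self._entries = entries

    def fetch(self, feed_url: str) -> Iterable[dict]:
        return list(self._entries)


class InMemoryVideos:
    def __init__(self) -> None:
        self.items: dict[str, SourceItem] = {}

    def upsert(self, item: SourceItem) -> None:
        self.items[item.source_url] = item


processed: list[str] = []
videos = InMemoryVideos()
ingest = IngestFeed(
    StaticFeed([{"link": " https://example.com/v/1 ", "title": "First"},
                {"title": "No link"}]),
    videos,
    process_source=lambda item: processed.append(item.source_url),
)
ingested = ingest.execute("https://example.com/feed.xml", process=True)
```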
| 134 | + |
| 135 | +### Processing |
| 136 | + |
| 137 | +`ProcessSource` asks `MediaPreparationService` for `PreparedMedia`, passes the |
| 138 | +result to `Summarizer`, then persists a typed `ProcessOutcome`. |
| 139 | + |
| 140 | +This use case owns the decision about whether a result is successful (`Success`), |
| 141 | +degraded (`PartialSuccess`), or failed (`Failure`). |
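A minimal sketch of that workflow, assuming one possible degradation policy: a missing transcript with a usable title counts as degraded, and having neither counts as failed. The policy, the `OutcomeStatus` enum, the field names, and the port method names are all hypothetical; only the type and port names come from this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Protocol


class OutcomeStatus(Enum):
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"


@dataclass(frozen=True)
class PreparedMedia:
    source_url: str
    transcript: Optional[str]
    title: str


@dataclass(frozen=True)
class ProcessOutcome:
    source_url: str
    status: OutcomeStatus
    summary: Optional[str] = None
    reason: Optional[str] = None


class MediaPreparationService(Protocol):
    def prepare(self, source_url: str) -> PreparedMedia: ...


class Summarizer(Protocol):
    def summarize(self, text: str) -> str: ...


class SummaryRepository(Protocol):
    def save_outcome(self, outcome: ProcessOutcome) -> None: ...


class ProcessSource:
    def __init__(self, media: MediaPreparationService, summarizer: Summarizer,
                 summaries: SummaryRepository) -> None:
        self._media = media
        self._summarizer = summarizer
        self._summaries = summaries

    def execute(self, source_url: str) -> ProcessOutcome:
        prepared = self._media.prepare(source_url)
        # The classification lives here, not in the adapters (assumed policy).
        if prepared.transcript:
            outcome = ProcessOutcome(source_url, OutcomeStatus.SUCCESS,
                                     summary=self._summarizer.summarize(prepared.transcript))
        elif prepared.title:
            outcome = ProcessOutcome(source_url, OutcomeStatus.PARTIAL_SUCCESS,
                                     summary=self._summarizer.summarize(prepared.title),
                                     reason="no transcript; summarized title only")
        else:
            outcome = ProcessOutcome(source_url, OutcomeStatus.FAILURE,
                                     reason="nothing to summarize")
        self._summaries.save_outcome(outcome)
        return outcome


class FakeMedia:
    def __init__(self, transcript: Optional[str], title: str) -> None:
        self._transcript, self._title = transcript, title

    def prepare(self, source_url: str) -> PreparedMedia:
        return PreparedMedia(source_url, self._transcript, self._title)


class EchoSummarizer:
    def summarize(self, text: str) -> str:
        return f"summary of: {text}"


class InMemoryOutcomes:
    def __init__(self) -> None:
        self.saved: list[ProcessOutcome] = []

    def save_outcome(self, outcome: ProcessOutcome) -> None:
        self.saved.append(outcome)


store = InMemoryOutcomes()
degraded = ProcessSource(FakeMedia(None, "Launch recap"), EchoSummarizer(),
                         store).execute("https://example.com/v/1")
```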
| 142 | + |
| 143 | +### Publication |
| 144 | + |
| 145 | +`RenderRssFeed` reads published summaries through repositories and passes stable |
| 146 | +publication models to a renderer. RSS generation should not depend on storage |
| 147 | +row types. |
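The publication path might be sketched like this, with a `PublishedSummary` publication model (a hypothetical name) sitting between the repository and a renderer built on the standard library's `xml.etree.ElementTree`. The `list_published` and `render` method names and the channel fields are assumptions as well.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterable, Protocol, Sequence


@dataclass(frozen=True)
class PublishedSummary:
    # Publication model: what the renderer sees, independent of row shape.
    title: str
    link: str
    summary: str


class SummaryRepository(Protocol):
    def list_published(self) -> Iterable[PublishedSummary]: ...


class PublicationRenderer(Protocol):
    def render(self, items: Sequence[PublishedSummary]) -> str: ...


class RssRenderer:
    def __init__(self, channel_title: str, channel_link: str) -> None:
        self._title = channel_title
        self._link = channel_link

    def render(self, items: Sequence[PublishedSummary]) -> str:
        rss = ET.Element("rss", {"version": "2.0"})
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = self._title
        ET.SubElement(channel, "link").text = self._link
        ET.SubElement(channel, "description").text = "Video summaries"
        for item in items:
            node = ET.SubElement(channel, "item")
            ET.SubElement(node, "title").text = item.title
            ET.SubElement(node, "link").text = item.link
            ET.SubElement(node, "description").text = item.summary
        return ET.tostring(rss, encoding="unicode")


class RenderRssFeed:
    def __init__(self, summaries: SummaryRepository,
                 renderer: PublicationRenderer) -> None:
        self._summaries = summaries
        self._renderer = renderer

    def execute(self) -> str:
        return self._renderer.render(list(self._summaries.list_published()))


class InMemoryPublished:
    def list_published(self) -> Iterable[PublishedSummary]:
        return [PublishedSummary("Q&A recap", "https://example.com/v/1",
                                 "Covers setup & teardown.")]


feed_xml = RenderRssFeed(InMemoryPublished(),
                         RssRenderer("Demo feed", "https://example.com")).execute()
```

Because the use case depends only on `PublicationRenderer`, a different output format could be added as another renderer without touching the repository side.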
| 148 | + |
| 149 | +### Setup and Runtime |
| 150 | + |
| 151 | +`BootstrapRuntime` and `GetRuntimeStatus` should return one application-level |
| 152 | +view model shared by API and GUI, replacing duplicated config/setup shaping. |
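To make the idea of a single shared view model concrete, the sketch below defines one `RuntimeStatusView` and two thin mappers, one for a JSON API payload and one for a GUI template context. The class name, its fields, and both mapper functions are hypothetical; the point is only that readiness is computed in one place.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RuntimeStatusView:
    # Assumed fields; the design only requires that API and GUI share them.
    ollama_reachable: bool
    database_path: str
    missing_tools: tuple[str, ...] = ()

    @property
    def ready(self) -> bool:
        # Readiness is derived once, so no surface re-implements the rule.
        return self.ollama_reachable and not self.missing_tools


def to_api_payload(view: RuntimeStatusView) -> dict:
    payload = asdict(view)
    payload["ready"] = view.ready
    return payload


def to_gui_context(view: RuntimeStatusView) -> dict:
    # The GUI gets display strings, but derives them from the same view.
    return {
        "status_label": "Ready" if view.ready else "Setup required",
        "missing_tools": ", ".join(view.missing_tools) or "none",
    }


view = RuntimeStatusView(ollama_reachable=True, database_path="data/app.db",
                         missing_tools=("ffmpeg",))
api_json = json.dumps(to_api_payload(view), sort_keys=True)
gui_context = to_gui_context(view)
```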
| 153 | + |
| 154 | +## Error Handling Model |
| 155 | + |
| 156 | +Replace today's implicit semantics, in which fallback results are reported as successes, with explicit outcome types: |
| 157 | + |
| 158 | +- `Success` |
| 159 | +- `PartialSuccess` |
| 160 | +- `Failure` |
| 161 | + |
| 162 | +Rules: |
| 163 | + |
| 164 | +- Adapter-specific exceptions are translated at the use-case boundary. |
| 165 | +- `PartialSuccess` is used when the system produced a degraded but valid result. |
| 166 | +- `Failure` means the business goal was not achieved and must not be presented |
| 167 | + as a normal success. |
| 168 | +- Persistence records outcome status explicitly rather than relying on summary |
| 169 | + text to reveal degradation. |
| 170 | +- API and CLI surfaces report status directly. |
| 171 | + |
| 172 | +Diagnostics remain separate from processing outcomes. |
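The rules above could be sketched with three small outcome types and a translation step at the use-case boundary. `SummarizerUnavailable` is a hypothetical adapter exception, and the field names and `status_label` helper are assumptions; only the three outcome names come from this document.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Success:
    summary: str


@dataclass(frozen=True)
class PartialSuccess:
    summary: str
    degraded_because: str


@dataclass(frozen=True)
class Failure:
    reason: str


Outcome = Union[Success, PartialSuccess, Failure]


class SummarizerUnavailable(Exception):
    """Hypothetical adapter-specific exception (e.g. the model server is down)."""


def summarize_with_translation(summarize: Callable[[str], str], text: str) -> Outcome:
    # Adapter exceptions stop here; callers only ever see an Outcome.
    try:
        return Success(summary=summarize(text))
    except SummarizerUnavailable as exc:
        return Failure(reason=f"summarizer unavailable: {exc}")


def status_label(outcome: Outcome) -> str:
    """Explicit status for persistence and for API/CLI reporting."""
    if isinstance(outcome, Success):
        return "success"
    if isinstance(outcome, PartialSuccess):
        return "partial_success"
    return "failure"


def _offline(text: str) -> str:
    raise SummarizerUnavailable("connection refused")


failed = summarize_with_translation(_offline, "transcript")
ok = summarize_with_translation(str.upper, "transcript")
```

Storing `status_label(outcome)` alongside the summary is one way to avoid inferring degradation from summary text later.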
| 173 | + |
| 174 | +## Testing Strategy |
| 175 | + |
| 176 | +The main regression net should move to application-boundary contract tests for: |
| 177 | + |
| 178 | +- `BootstrapRuntime` |
| 179 | +- `GetRuntimeStatus` |
| 180 | +- `IngestFeed` |
| 181 | +- `ProcessSource` |
| 182 | +- `RenderRssFeed` |
| 183 | + |
| 184 | +Supporting tests should be split into: |
| 185 | + |
| 186 | +- repository integration tests |
| 187 | +- Ollama adapter tests |
| 188 | +- media adapter tests |
| 189 | +- FastAPI adapter tests |
| 190 | +- CLI adapter tests |
| 191 | +- policy tests for model selection, degradation classification, normalization, |
| 192 | + and retention/publication rules |
| 193 | + |
| 194 | +The desired end state is that most business behavior can be tested without |
| 195 | +FastAPI, Click, SQLite, subprocesses, or a live Ollama runtime. |
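As an illustration of an application-boundary contract test, the sketch below exercises a simplified `GetRuntimeStatus` through a hand-written fake instead of monkeypatching a concrete module. The `RuntimeInspector.missing_tools` method, the `RuntimeStatus` shape, and the sorting rule are assumptions; the standard library's `unittest` is used so nothing external is needed.

```python
import unittest
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class RuntimeStatus:
    ready: bool
    missing_tools: tuple[str, ...]


class RuntimeInspector(Protocol):
    def missing_tools(self) -> list[str]: ...  # hypothetical port method


class GetRuntimeStatus:
    def __init__(self, inspector: RuntimeInspector) -> None:
        self._inspector = inspector

    def execute(self) -> RuntimeStatus:
        missing = tuple(sorted(self._inspector.missing_tools()))
        return RuntimeStatus(ready=not missing, missing_tools=missing)


class FakeInspector:
    """Stable seam: no subprocess calls, no patching of module globals."""

    def __init__(self, missing: Iterable[str]) -> None:
        self._missing = list(missing)

    def missing_tools(self) -> list[str]:
        return list(self._missing)


class GetRuntimeStatusContract(unittest.TestCase):
    def test_ready_when_nothing_is_missing(self) -> None:
        status = GetRuntimeStatus(FakeInspector([])).execute()
        self.assertTrue(status.ready)
        self.assertEqual(status.missing_tools, ())

    def test_reports_missing_tools_sorted(self) -> None:
        status = GetRuntimeStatus(FakeInspector(["yt-dlp", "ffmpeg"])).execute()
        self.assertFalse(status.ready)
        self.assertEqual(status.missing_tools, ("ffmpeg", "yt-dlp"))


def run_contract_tests() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GetRuntimeStatusContract)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_contract_tests()` runs the suite in-process; under pytest or `python -m unittest` the test case would be discovered directly.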
| 196 | + |
| 197 | +## Non-Goals |
| 198 | + |
| 199 | +- Defining the full migration sequence in this document |
| 200 | +- Implementing the redesign directly from adapters inward |
| 201 | +- Preserving the current `Pipeline` shape as a compatibility constraint |
| 202 | + |
| 203 | +## Expected Benefits |
| 204 | + |
| 205 | +- clearer ownership of business workflows |
| 206 | +- lower coupling between persistence, summarization, and presentation |
| 207 | +- consistent behavior across API and CLI |
| 208 | +- easier unit and contract testing |
| 209 | +- safer future changes to model policy, media tooling, and publishing |
| 210 | + |
| 211 | +## Next Step |
| 212 | + |
| 213 | +Create a dedicated implementation plan that stages the migration from the |
| 214 | +current codebase into this target architecture. |