📝 Description
Currently, the LLM extraction pipeline in llm.py is entirely stateless. If a user is processing a long voice transcription for a complex 20-field incident report and the local Ollama container crashes, times out, or runs out of memory on field 19, the entire process must be restarted from scratch.
This feature introduces a Checkpointed Pipeline. By caching the progress of self._json locally after each field, the system can detect an interrupted session by its session_id upon restart, skip the fields that were already extracted successfully, and resume exactly where it left off.
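The resume behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the existing llm.py code: CheckpointStore, run_with_checkpoints, the .checkpoints directory, and the extract_field callback are all hypothetical names, and a JSON file per session_id is only one of the storage options (sqlite3 would work as well).

```python
import json
import os
import tempfile
from pathlib import Path


class CheckpointStore:
    """Hypothetical per-session checkpoint file (not part of the current llm.py)."""

    def __init__(self, session_id, directory=".checkpoints"):
        # Assumption: one JSON file per session_id in a local directory.
        self.path = Path(directory) / f"{session_id}.json"

    def load(self):
        """Return the cached field values, or an empty dict if none are usable."""
        if not self.path.exists():
            return {}
        try:
            return json.loads(self.path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # An unreadable checkpoint is treated as "no progress saved".
            return {}

    def save(self, state):
        """Write to a temp file, then rename, so a crash mid-write
        cannot leave a half-written checkpoint behind."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        os.replace(tmp_name, self.path)

    def clear(self):
        """Remove the checkpoint once the whole form has been extracted."""
        if self.path.exists():
            self.path.unlink()


def run_with_checkpoints(target_fields, extract_field, store):
    """Extract only the fields that have no non-null value in the checkpoint."""
    state = store.load()
    pending = [name for name in target_fields if state.get(name) is None]
    for name in pending:
        # extract_field stands in for the per-field LLM call.
        state[name] = extract_field(name)
        # Persist after every successful field so a crash loses at most one.
        store.save(state)
    return state
```

If extract_field raises partway through, the fields saved so far stay on disk, and calling run_with_checkpoints again with the same session_id only requests the remaining ones.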
💡 Rationale
For emergency response tools, forced restarts are a dealbreaker. Running massive transcript context windows through local LLMs (like Mistral/Llama) is computationally expensive and slow. A transient error (like a momentary API connection drop) shouldn't cost the user 5 minutes of re-processing time. Implementing statefulness makes the core extraction engine production-ready, resilient, and significantly more respectful of local compute resources.
🛠️ Proposed Solution
- […] session_id or document hash) that updates after every successful add_response_to_json() call.
- […] main_loop() in llm.py to check for an existing state file at initialization.
- […] self._target_fields to exclude keys that already have valid, non-null values in the cached state before beginning the iteration.
- […] src/llm.py and src/main.py
- […] requirements.txt (none needed if using standard library json or sqlite3)
✅ Acceptance Criteria
- […] Ctrl+C) during extraction successfully resumes at the interrupted field upon restart.
- […] docs/ explaining how session resumption works.
📌 Additional Context
This directly addresses the underlying fragility of sequential API calls in the backend. It will also serve as a foundational layer if FireForm eventually supports pausing and resuming form-filling via a frontend UI.