An agent that ingests text documents, extracts structured data, and outputs typed JSON.
This repository demonstrates processing unstructured data (like client briefs or meeting notes) into structured formats required by downstream operational systems.
- Context Chunking: Safely splits large texts at paragraph boundaries to respect context windows.
- Structured Extraction: Forces the LLM to output strict JSON matching a predefined schema.
- Merge & Deduplicate: Combines partial extractions from multiple chunks into a single coherent output.
- Validation Gate: Validates the final JSON against required fields and data types before outputting.
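
The non-LLM stages listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `agent.py`: the function names (`chunk_text`, `merge_extractions`, `validate`), the `SCHEMA` fields, and the `max_chars` limit are all assumptions made for the example, and the LLM extraction call itself is left out.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"client": str, "deadline": str, "action_items": list}


def chunk_text(text, max_chars=2000):
    """Split text at blank-line paragraph boundaries into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def merge_extractions(partials):
    """Merge per-chunk dicts: the first non-empty scalar wins, lists are unioned in order."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            if isinstance(value, list):
                bucket = merged.setdefault(key, [])
                if not isinstance(bucket, list):
                    continue  # an earlier chunk stored a scalar here; keep it
                for item in value:
                    if item not in bucket:  # deduplicate repeated items
                        bucket.append(item)
            elif value not in (None, "") and key not in merged:
                merged[key] = value
    return merged


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record passes the gate."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors


if __name__ == "__main__":
    # Stand-ins for what the LLM might return for two chunks of one document.
    partials = [
        {"client": "Acme", "deadline": None, "action_items": ["send quote"]},
        {"client": "Acme", "deadline": "2025-01-31",
         "action_items": ["send quote", "book kickoff"]},
    ]
    merged = merge_extractions(partials)
    problems = validate(merged)
    print(json.dumps(merged, indent=2) if not problems else problems)
```

Returning a list of problems from `validate` rather than raising on the first one is a design choice for the sketch: it lets a caller report every missing or mistyped field at once before deciding whether to emit the JSON.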
- Install dependencies:

      pip install -r requirements.txt

- Set your API key:

      export ANTHROPIC_API_KEY='your-api-key'
Pass a text file path, or run without arguments to process the built-in demo document:
python agent.py path/to/document.txt