This example demonstrates Schema-Guided Reasoning (SGR) for document classification, using structured JSON schemas to force systematic analysis. Inspired by https://abdullin.com/schema-guided-reasoning/examples
This example shows how to use structured schemas to improve document classification accuracy by forcing the LLM to think through the task in predefined steps:
- Identify document type - Forces selection from predefined categories
- Summarize content - Creates a mental model of the document
- Extract key entities - Identifies business-relevant entities from controlled vocabulary
- Generate keywords - Produces searchable terms for retrieval
The schema acts as a reasoning framework that guides the LLM through systematic analysis rather than jumping directly to classification.
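from typing import List, Literal

from pydantic import BaseModel, Field

class DocumentClassification(BaseModel):
    # Reasoning fields: filled in first, before entities and keywords.
    # "..." marks values elided in this excerpt.
    document_type: Literal["invoice", "contract", "receipt", ...]
    brief_summary: str
    key_entities_mentioned: List[Literal["payment", "risk", "regulator", ...]]
    keywords: List[str] = Field(..., description="Up to 10 keywords")

The first two fields (document_type and brief_summary) come first in the schema, so the LLM has to commit to a document type and a summary before it identifies entities and keywords. This ordering is the structured thinking step that is meant to improve classification accuracy.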
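The effect of field order can be checked against the JSON Schema that pydantic generates. The sketch below assumes pydantic v2 (`model_json_schema`). `DocumentClassificationSketch` is a hypothetical stand-in for the class above: the elided `...` lists are replaced with only the values that appear in this example, so the real vocabularies are probably longer.

```python
from typing import List, Literal

from pydantic import BaseModel, Field

# Hypothetical, shortened vocabularies: the original lists are elided ("..."),
# so only values that appear in this example are used here.
DocumentType = Literal["invoice", "contract", "receipt"]
Entity = Literal[
    "payment", "risk", "regulator", "vendor",
    "customer", "service", "legal", "financial",
]


class DocumentClassificationSketch(BaseModel):
    # Reasoning fields are declared first.
    document_type: DocumentType
    brief_summary: str
    # Fields that build on the analysis above.
    key_entities_mentioned: List[Entity]
    keywords: List[str] = Field(..., description="Up to 10 keywords")


# The generated JSON Schema lists properties in declaration order,
# which is the order a schema-constrained LLM is asked to fill them in.
schema = DocumentClassificationSketch.model_json_schema()
field_order = list(schema["properties"])
print(field_order)
```

Note that `description="Up to 10 keywords"` is only a hint to the model; nothing in this sketch enforces the limit.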
Install:
{
  "document_type": "contract",
  "brief_summary": "Service agreement between vendor and customer for cloud infrastructure services",
  "key_entities_mentioned": ["vendor", "customer", "service", "legal", "financial"],
  "keywords": ["cloud", "infrastructure", "SLA", "agreement", "services", "pricing", "terms", "liability"]
}

datamatic --config ./config.yaml --verbose
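The JSON output shown above can be validated against the schema before it is used downstream. This is a minimal sketch, assuming pydantic v2 (`model_validate_json`). `ReplySketch` and `parse_reply` are hypothetical names, and the vocabularies contain only values visible in this example because the original lists are elided.

```python
import json
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ReplySketch(BaseModel):
    # Hypothetical, shortened vocabularies (the original lists are elided).
    document_type: Literal["invoice", "contract", "receipt"]
    brief_summary: str
    key_entities_mentioned: List[
        Literal["payment", "risk", "regulator", "vendor",
                "customer", "service", "legal", "financial"]
    ]
    keywords: List[str] = Field(..., description="Up to 10 keywords")


def parse_reply(raw: str) -> Optional[ReplySketch]:
    """Return the validated reply, or None if it violates the schema."""
    try:
        return ReplySketch.model_validate_json(raw)
    except ValidationError:
        return None


example = {
    "document_type": "contract",
    "brief_summary": "Service agreement between vendor and customer "
                     "for cloud infrastructure services",
    "key_entities_mentioned": ["vendor", "customer", "service", "legal", "financial"],
    "keywords": ["cloud", "infrastructure", "SLA", "agreement",
                 "services", "pricing", "terms", "liability"],
}

# The example output passes validation.
valid = parse_reply(json.dumps(example))
# A document type outside the predefined categories is rejected.
invalid = parse_reply(json.dumps({**example, "document_type": "memo"}))
```

Returning `None` on failure keeps the sketch short; a real pipeline might instead retry the request or log the validation error.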