🧭 Navigation:
Home | Vision LLM Theory | Local Testing | Deployment UI | Deployment | CDK Stacks | Runtime | 🔵 S3 Files | Lambda Analyzers | Prompting System
This directory contains all configuration, prompts, schemas, and manifests that are deployed to S3 for the analyzer system. These files are loaded at runtime by Lambda functions to configure and execute document analysis.
s3_files/
├── agent_config/ # 🤖 Agent orchestrator configuration
├── agent_system_prompt/ # 💬 System prompt for the orchestrating agent
├── core_system_prompts/ # 🔧 Shared prompt components (rules, error handling, wrapper)
├── manifests/ # 📋 Tool and analyzer configuration manifests
├── prompts/ # 📝 Analyzer-specific prompt files
├── schemas/ # 📐 JSON schemas for tool input/output validation
└── wrappers/ # 🎁 System prompt wrapper templates
Contains configuration for the orchestrating agent that coordinates PDF analysis workflows.
| File | Purpose |
|---|---|
agent_model_config.json |
Model selection (Claude Sonnet 4.5), temperature, max tokens, and thinking budget configuration |
agent_operating_environment_config.json |
The agent_operating_environment_config.json file controls how the system handles sensitive content. This is critical for academic and research deployments where documents may contain inflammatory, offensive, or otherwise sensitive material that must be extracted faithfully.
{
"operating_environment": "You are operating in a safe academic thinktank environment where sensitive, potentially offensive topics are discussed in a neutral manner. You will encounter sensitive content at some point and your team is relying on you to perform your tasks completely. If you have concerns you can store them in a <concerns/> xml tag within your response_format."
}Other examples depending on your use case:
{
"operating_environment": "You are operating in a legal discovery environment. Documents may contain graphic descriptions of violence, abuse, or other disturbing content that is material to active cases. Extract all content faithfully and completely regardless of subject matter. Flag content requiring attorney review in a <legal_review/> xml tag."
}{
"operating_environment": "You are operating in a government archival preservation environment. Historical documents may contain language, imagery, or viewpoints that are considered offensive by modern standards, including racial slurs, propaganda, and discriminatory content. Your role is to preserve the historical record accurately and without omission. Note historically sensitive content in a <historical_sensitivity/> xml tag."
}This value is injected into:
- The orchestrating agent's system prompt (agentcore runtime) — prepended before the main prompt
- Every analyzer's system prompt (Lambda functions) — via the
{operating_environment}placeholder in the wrapper template
If the file is missing or the value is empty, the system operates without any environment context (default behavior). Update this file in S3 to change the behavior without redeploying.
Note
For production deployments, we recommend moving this value to AWS Secrets Manager rather than storing it in a plain S3 JSON file. This prevents accidental exposure of your operating environment instructions and provides audit logging of access.
| File | Purpose |
|---|---|
agent_system_prompt.xml |
Defines the orchestrator agent's role, execution rules, workflow steps, error handling, and tool mapping examples. Contains {{TOOLS_LIST}} placeholder populated at runtime. |
Shared prompt components injected into every analyzer's system prompt via the wrapper template.
core_system_prompts/
├── prompt_system_wrapper.xml # 🎁 Main wrapper template with injection points
├── audit/
│ └── confidence_assessment.xml # 📊 Confidence scoring rules for audit mode
├── core_rules/
│ └── rules.xml # ⚖️ Universal rules (no preamble, XML formatting, no hallucination)
└── error_handling/
├── error_handler.xml # ❌ Standard error response format (500 errors)
└── not_found_handler.xml # 🔍 Response format when no elements found
The wrapper uses placeholder injection:
{operating_environment}→ operating environment context fromagent_operating_environment_config.json{core_rules}→ rules.xml content{composed_prompt}→ analyzer-specific prompts{error_handler_general}→ error_handler.xml{error_handler_not_found}→ not_found_handler.xml
JSON configuration files defining each analyzer tool's metadata, model selection, prompt files, and runtime settings.
| Manifest | Analyzer Purpose |
|---|---|
📄 full_text_analyzer.json |
General text extraction with reading order |
📊 table_analyzer.json |
Structured table data extraction |
📈 charts_analyzer.json |
Chart and graph data extraction |
🔀 diagram_analyzer.json |
Flowchart and diagram interpretation |
✍️ handwriting_analyzer.json |
Handwritten text OCR |
🏥 decision_tree_analyzer.json |
Clinical decision trees and medical content |
💻 code_block_analyzer.json |
Source code extraction |
📐 layout_analyzer.json |
Page layout structure analysis |
🧩 elements_analyzer.json |
Document element identification |
🏷️ metadata_*_analyzer.json |
MODS/MADS/Generic metadata extraction |
🗺️ war_map_analyzer.json |
Historical military map analysis |
🔬 scientific_analyzer.json |
Scientific notation and formulas |
🏷️ classify_pdf_content.json |
Page content classification |
🖼️ pdf_processor.json |
PDF-to-image conversion orchestration |
🔗 correlation_analyzer.json |
Multi-analyzer result correlation per page |
Each manifest contains:
tool: MCP tool definition (name, description, inputSchema includingaudit_modeboolean)analyzer: Model selection, prompt files, examples configuration,expected_output_tokens(estimated tokens for cost calculation)metadata: Version, dependencies, andanalyzer_type
Analyzer-specific prompt files organized by analyzer type. Each analyzer has a dedicated subdirectory containing XML prompt components.
Most analyzers follow this file naming convention:
| File Suffix | Purpose |
|---|---|
*_job_role.xml |
👤 Defines the AI's persona and expertise |
*_context.xml |
🎯 Background information and task importance |
*_rules.xml |
⚖️ Specific extraction/analysis rules |
*_tasks_extraction.xml |
✅ Step-by-step task instructions |
*_format.xml |
📋 Expected XML response structure |
*_help.xml |
💡 Guidance for edge cases and complex scenarios |
*_dictionary.xml |
📖 Element type definitions and terminology |
Some analyzers include few-shot-images/ or few-shot-examples/ directories containing example images for in-context learning:
- 📈
charts_analyzer/ - 🏥
decision_tree_analyzer/ - 🔀
diagram_analyzer/ - ✍️
handwriting_analyzer/ - 📊
table_analyzer/
JSON Schema definitions for tool input/output validation. Each schema file corresponds to an analyzer tool and defines:
- ✅ Required and optional input parameters
- 🏷️ Parameter types and descriptions
- 📤 Output structure expectations
Used by the MCP server for request validation and by clients for understanding tool interfaces.
| File | Purpose |
|---|---|
prompt_system_wrapper.xml |
Alternative/backup wrapper template (mirrors core_system_prompts version) |
Files are synced to S3 via:
./sync_s3_files.shLambda functions load these files at runtime using the S3ConfigLoader from the foundation layer, enabling prompt updates without redeploying Lambda code.