MOSAICX is controlled entirely from the command line. Every command supports --help for quick reference.
This reference covers all commands, flags, and options. Each command includes practical examples assuming you're starting fresh with no prior MOSAICX experience.
Available on all commands:
| Flag | Description |
|---|---|
| `--version` | Show version and exit |
| `--help` | Show help message and exit |
Examples:
# Check your version
mosaicx --version
# Show main help
mosaicx --help
# Get help on a specific command
mosaicx extract --help
Extract structured data from a clinical document.
MOSAICX can extract data in three ways:
- Auto mode (no flags): LLM automatically determines what to extract
- Template mode (`--template`): Use a built-in template, user template, YAML file, or legacy saved schema
- Pipeline mode (`--mode`): Use a built-in multi-step pipeline (radiology, pathology)
The --template flag resolves its argument through a resolution chain:
- YAML file path (if the suffix is `.yaml`/`.yml` and the file exists)
- User template in `~/.mosaicx/templates/`
- Built-in template name (e.g. `chest_ct`, `brain_mri`)
- Legacy saved schema from `~/.mosaicx/schemas/`
- Error if nothing matches
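The lookup order above can be sketched in Python. This is a minimal illustration, not MOSAICX's actual implementation: the function name `resolve_template`, the `BUILTIN_TEMPLATES` set, and the returned `(kind, location)` tuple are hypothetical. Only the order of the checks and the two directories come from the list above.

```python
from pathlib import Path

# Hypothetical stand-in for the built-in template registry.
BUILTIN_TEMPLATES = {"chest_ct", "brain_mri"}


def resolve_template(arg: str, home: Path) -> tuple:
    """Return (kind, location) for a --template argument, following the chain."""
    candidate = Path(arg)
    # 1. Explicit YAML file path that exists on disk.
    if candidate.suffix in {".yaml", ".yml"} and candidate.is_file():
        return ("yaml_file", str(candidate))
    # 2. User template stored as <home>/templates/<name>.yaml.
    user_template = home / "templates" / f"{arg}.yaml"
    if user_template.is_file():
        return ("user_template", str(user_template))
    # 3. Built-in template name.
    if arg in BUILTIN_TEMPLATES:
        return ("builtin", arg)
    # 4. Legacy saved schema stored as <home>/schemas/<name>.json.
    legacy = home / "schemas" / f"{arg}.json"
    if legacy.is_file():
        return ("legacy_schema", str(legacy))
    # 5. Nothing matched.
    raise FileNotFoundError(f"No template, file, or schema matches {arg!r}")
```

Called as `resolve_template("chest_ct", Path.home() / ".mosaicx")`, the sketch returns `("builtin", "chest_ct")` unless a user template with the same name exists, which would take precedence.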
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--document` | PATH | Yes | Path to the document (PDF, TXT, DOCX, PNG, JPG, TIFF) |
| `--template` | TEXT | No | Template name, YAML file path, or saved schema name |
| `--mode` | TEXT | No | Extraction mode (e.g., radiology, pathology) |
| `--score` | flag | No | Score completeness of extracted data against the template |
| `--optimized` | PATH | No | Path to an optimized DSPy program (.json file) |
| `-o, --output` | PATH | No | Save output to JSON or YAML file |
| `--list-modes` | flag | No | List available extraction modes and exit |
| `--dir` | PATH | No | Directory of documents for batch processing |
| `--workers` | INT | No | Number of parallel workers (default: 1) |
| `--output-dir` | PATH | No | Directory for output files (batch mode) |
| `--format` | TEXT | No | Export format(s): jsonl, parquet (can repeat) |
| `--resume` | flag | No | Resume from last checkpoint |
Important:
- `--template` and `--mode` are mutually exclusive -- use only one
- `--document` and `--dir` are mutually exclusive -- use only one
- If neither `--template` nor `--mode` is provided, auto mode is used
- Supported formats: PDF, TXT, DOCX, MD, PNG, JPG, JPEG, TIF, TIFF
Examples:
# Auto mode -- LLM decides what to extract from the document
mosaicx extract --document report.pdf
# List available modes
mosaicx extract --list-modes
# Radiology mode -- 5-step pipeline for radiology reports
# Steps: classify exam -> parse sections -> extract technique -> findings -> impression
mosaicx extract --document ct_chest.pdf --mode radiology
# Pathology mode -- 5-step pipeline for pathology reports
# Steps: classify specimen -> parse sections -> specimen details -> findings -> diagnosis
mosaicx extract --document biopsy.pdf --mode pathology
# Use a built-in template by name
mosaicx extract --document ct_chest.pdf --template chest_ct
# Use a user-created YAML template file
mosaicx extract --document report.pdf --template echo.yaml
# Use a legacy saved schema (resolved from ~/.mosaicx/schemas/)
mosaicx extract --document echo.pdf --template EchoReport
# Extract with completeness scoring
mosaicx extract --document ct_chest.pdf --template chest_ct --score
# Save output to JSON
mosaicx extract --document report.pdf --mode radiology -o output.json
# Save output to YAML
mosaicx extract --document report.pdf --mode radiology -o output.yaml
# Use an optimized program (from mosaicx optimize)
mosaicx extract --document report.pdf --template chest_ct \
--optimized ~/.mosaicx/optimized/radiology_optimized.json
# Combine mode with custom save location
mosaicx extract --document ct_report.pdf --mode radiology \
-o /path/to/results/structured_report.json
# Batch process a directory
mosaicx extract --dir ./reports --output-dir ./structured --mode radiology
# Batch with 4 parallel workers
mosaicx extract --dir ./reports --output-dir ./structured --workers 4
# Batch with export formats
mosaicx extract --dir ./reports --output-dir ./structured --format jsonl --format parquet
# Resume a failed batch
mosaicx extract --dir ./reports --output-dir ./structured --resume
What you'll see:
Without --output, results are displayed in the terminal as formatted tables. Use --output to save the full structured data as JSON or YAML.
When --score is used, a completeness report is shown after the extracted data, scoring how thoroughly the template fields were populated.
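One simple way to think about a completeness score is the fraction of template fields that received a non-empty value. The sketch below illustrates that idea only; the function name `completeness`, the field names, and the definition of "empty" are assumptions, and the scoring MOSAICX actually performs may weight or nest fields differently.

```python
def completeness(extracted: dict, template_fields: list) -> float:
    """Fraction of template fields that hold a non-empty value (assumed metric)."""
    if not template_fields:
        return 1.0
    empty_values = (None, "", [], {})
    filled = sum(
        1 for name in template_fields
        if extracted.get(name) not in empty_values
    )
    return filled / len(template_fields)


# Hypothetical field names, used only for illustration.
fields = ["modality", "findings", "impression", "nodule_size_mm"]
extracted = {
    "modality": "CT",
    "findings": ["nodule"],
    "impression": "",        # empty string counts as missing
    "nodule_size_mm": None,  # None counts as missing
}
print(completeness(extracted, fields))  # 0.5
```

A score like this can be compared against a threshold such as the `completeness_threshold` setting to flag sparse extractions for review.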
Create a new YAML template from a description, sample document, CSV/Excel table, web page, RadReport ID, or JSON schema.
Templates are saved to ~/.mosaicx/templates/ by default and can be reused with mosaicx extract --template.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--describe` | TEXT | No* | Natural-language description of the template |
| `--from-document` | PATH | No* | Infer template from a sample document |
| `--from-url` | TEXT | No* | Infer template from a web page (e.g. RadReport URL) |
| `--from-radreport` | TEXT | No* | RadReport template ID (e.g. RPT50890 or 50890) |
| `--from-json` | PATH | No* | Convert a saved SchemaSpec JSON to YAML template |
| `--from-pydantic` | PATH | No* | Convert Pydantic model definitions to YAML template(s) |
| `--from-table` | PATH | No* | Convert a CSV/TSV/Excel field table or data table to YAML template |
| `--split-by` | TEXT | No | For `--from-table`, create one template per value in this column |
| `--name-column` | TEXT | No | For `--from-table`, column containing field names |
| `--type-column` | TEXT | No | For `--from-table`, column containing field types |
| `--description-column` | TEXT | No | For `--from-table`, column containing field descriptions |
| `--required-column` | TEXT | No | For `--from-table`, column containing required/mandatory flags |
| `--values-column` | TEXT | No | For `--from-table`, column containing enum values or row-wise catalog values |
| `--value-label-column` | TEXT | No | For `--from-table`, column describing enum/catalog values |
| `--catalog-id-column` | TEXT | No | For `--from-table`, column containing catalog IDs/names |
| `--catalog-version-column` | TEXT | No | For `--from-table`, column containing catalog versions |
| `--name` | TEXT | No | Override the template name (default: LLM-chosen) |
| `--mode` | TEXT | No | Pipeline mode to embed (e.g. radiology, pathology) |
| `--output` | PATH | No | Custom save path (default: `~/.mosaicx/templates/`) |
| `--output-dir` | PATH | No | Directory for `--from-table --split-by` output |
Important:
- Must provide at least one source: `--describe`, `--from-document`, `--from-url`, `--from-radreport`, `--from-json`, `--from-pydantic`, or `--from-table`
- `--from-json` cannot be combined with other sources
- `--from-table` can only be combined with `--describe`
- `--output` cannot be used with `--split-by`; use `--output-dir`
- `--describe` and `--from-document` can be combined for better results
- Templates are saved as YAML files in `~/.mosaicx/templates/{name}.yaml`
Examples:
# Generate from description
mosaicx template create \
--describe "echo report with LVEF, valve grades, chamber dimensions, and impression"
# Generate from sample document
mosaicx template create --from-document sample_echo.pdf
# Convert a CSV/Excel field table without an LLM
mosaicx template create --from-table fields.csv --name OncologyFields
# Split a data dictionary into one template per form
mosaicx template create \
--from-table onkostar_catalog.csv \
--split-by form_name \
--output-dir ./templates/onkostar
# Map custom data-dictionary headers
mosaicx template create \
--from-table data_dictionary.xlsx \
--name StudyCRF \
--name-column variable \
--type-column kind \
--description-column label \
--required-column mandatory \
--values-column allowed_values
# Combine description and document
mosaicx template create \
--describe "extract vital signs and lab values" \
--from-document clinic_note.pdf
# Generate from a web page
mosaicx template create --from-url https://radreport.org/template/0050890
# Generate from a RadReport template ID
mosaicx template create --from-radreport RPT50890
# Convert a legacy JSON schema to YAML template
mosaicx template create --from-json ~/.mosaicx/schemas/EchoReport.json
# Override the auto-generated name
mosaicx template create \
--describe "CT lung nodule report with LUNG-RADS score" \
--name CTLungNodule
# Embed a pipeline mode in the template
mosaicx template create \
--describe "chest CT report" --mode radiology
# Save to custom location
mosaicx template create \
--describe "chest x-ray findings" \
--output /path/to/my_templates/chest_xr.yaml
What happens:
- LLM analyzes your description, document, or web content
- Generates a YAML template with sections, types, and descriptions
- Saves the template to `~/.mosaicx/templates/{name}.yaml`
- Displays a preview of the generated YAML
You can now use the template with:
mosaicx extract --document new_echo.pdf --template EchoReport
List available built-in and user-created templates.
Built-in templates are pre-defined YAML schemas for common radiology exams. User templates are stored in ~/.mosaicx/templates/.
Examples:
mosaicx template list
Output:
Shows two tables:
- Built-in Templates -- with columns:
  - Template name
  - Mode (e.g., radiology)
  - RDES (RadReport ID, if applicable)
  - Description
- User Templates (if any exist) -- with columns:
  - Template name
  - Description
Display details of a template (built-in, user-created, or legacy saved schema).
Usage:
mosaicx template show <name>
Examples:
# Show a built-in template
mosaicx template show chest_ct
# Show a user-created template
mosaicx template show EchoReport
# Show a legacy saved schema
mosaicx template show CTLungNodule
Output:
Displays:
- Template name and source (built-in or user)
- Description
- Mode and RDES ID (if applicable)
- Table of sections/fields with name, type, required status, and description
Render the DSPy/BAML prompt preview for a template without calling an LLM.
Use this to inspect what schema, field descriptions, enum values, and catalog labels the model will see during mosaicx extract.
Usage:
mosaicx template prompt <name>
Examples:
# Show the BAML-rendered schema for a template
mosaicx template prompt OSDiagnose
# Include a truncated document preview in the rendered user message
mosaicx template prompt OSDiagnose --document report.pdf
This command imports DSPy/BAML locally but does not configure a model and does not require an LLM server or API key.
Refine an existing template using LLM-powered natural-language instructions.
The current version is archived before saving the refined version, so you can revert if needed.
Usage:
mosaicx template refine <name> --instruction "..."
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--instruction` | TEXT | Yes | Natural-language refinement instruction |
| `--output` | PATH | No | Save refined template to a different path |
Important:
- Works with both built-in and user templates
- Refining a built-in template saves the result as a user template
- Previous versions are archived in `~/.mosaicx/templates/.history/`
Examples:
# Add a field using natural language
mosaicx template refine EchoReport \
--instruction "add a field for tricuspid valve regurgitation severity"
# Remove fields
mosaicx template refine EchoReport \
--instruction "remove wall_motion and add regional_wall_motion_abnormalities as a list"
# Make structural changes
mosaicx template refine CTReport \
--instruction "add a LUNG-RADS category field as an integer 1-4"
# Save refined template to a custom location
mosaicx template refine chest_ct \
--instruction "add fields for coronary calcification" \
--output /path/to/custom_chest_ct.yaml
Convert legacy JSON schemas from `~/.mosaicx/schemas/` to YAML templates in `~/.mosaicx/templates/`.
This is a one-time migration command for users upgrading from the old schema system to the unified template system.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--dry-run` | flag | No | Show what would be migrated without writing files |
Examples:
# Preview what would be migrated
mosaicx template migrate --dry-run
# Perform the migration
mosaicx template migrate
What happens:
- Scans `~/.mosaicx/schemas/` for JSON schema files
- Converts each to YAML template format
- Saves to `~/.mosaicx/templates/{name}.yaml`
- Skips any templates that already exist as YAML
- Reports migrated, skipped, and errored files
Show version history of a user template.
Every time you refine a template, the previous version is archived. This command lists all archived versions.
Usage:
mosaicx template history <name>
Examples:
mosaicx template history EchoReport
mosaicx template history CTLungNodule
Output:
Table showing:
- Version number (v1, v2, v3, ...)
- Date modified
- Current version
Important:
- Only user templates have version history (not built-in templates)
- History is stored in `~/.mosaicx/templates/.history/`
Compare the current version of a user template against a previous archived version.
Usage:
mosaicx template diff <name> --version <N>
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--version` | INT | Yes | Version number to compare against current |
Examples:
# Compare current EchoReport to version 2
mosaicx template diff EchoReport --version 2
# See what changed since version 1
mosaicx template diff CTReport --version 1
Output:
Shows:
- Added sections (green `+`)
- Removed sections (red `-`)
- Modified sections (yellow `~`) with details of what changed
Restore a user template to a previous version.
The current version is archived before reverting.
Usage:
mosaicx template revert <name> --version <N>
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--version` | INT | Yes | Version number to revert to |
Examples:
# Revert EchoReport to version 2
mosaicx template revert EchoReport --version 2
# Undo recent changes by reverting to version 1
mosaicx template revert CTReport --version 1
What happens:
- Current template is archived as the next version number
- Specified version becomes the current template
- Confirmation message shows old and new version numbers
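The archive-then-restore behavior described above can be pictured with a small Python sketch. It is an illustration under assumptions, not the tool's code: the function name `revert_template` and the `<name>_v<N>.yaml` naming inside `.history/` are hypothetical, since the on-disk layout of the history directory is not documented here.

```python
import shutil
import tempfile
from pathlib import Path


def revert_template(templates_dir: Path, name: str, version: int) -> int:
    """Archive the current template, then restore an archived version.

    Returns the version number under which the current template was archived.
    """
    history = templates_dir / ".history"
    current = templates_dir / f"{name}.yaml"
    target = history / f"{name}_v{version}.yaml"  # hypothetical naming scheme
    if not target.is_file():
        raise FileNotFoundError(f"{name} has no archived version {version}")
    # The next version number is the highest archived number plus one.
    prefix = f"{name}_v"
    existing = [int(p.stem[len(prefix):]) for p in history.glob(f"{prefix}*.yaml")]
    next_version = max(existing) + 1
    # Archive first so the current state is never lost, then restore.
    shutil.copy2(current, history / f"{prefix}{next_version}.yaml")
    shutil.copy2(target, current)
    return next_version


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".history").mkdir()
    (root / ".history" / "EchoReport_v1.yaml").write_text("fields: [lvef]\n")
    (root / "EchoReport.yaml").write_text("fields: [lvef, valves]\n")
    print(revert_template(root, "EchoReport", 1))  # 2
```

Archiving before restoring is what makes a revert itself reversible: the state that was current before the revert is still available as the newest archived version.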
Validate a custom YAML template file.
Use this to check if your custom template is correctly formatted before using it with mosaicx extract --template.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--file` | PATH | Yes | Path to YAML template file to validate |
Examples:
# Validate a custom template
mosaicx template validate --file my_template.yaml
# Validate before using in extraction
mosaicx template validate --file chest_ct.yaml
Output:
If valid:
- Success message
- Model name
- List of fields
If invalid:
- Error message with details
Synthesize a patient timeline from multiple clinical reports.
Generates a narrative summary and extracts key events from one or more documents.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--document` | PATH | No* | Single document to summarize |
| `--dir` | PATH | No* | Directory of reports for one patient |
| `--patient` | TEXT | No | Patient identifier |
| `-o, --output` | PATH | No | Save output to JSON or YAML file |
Important:
- Must provide `--document` or `--dir`
- If using `--dir`, all TXT, MD, and MARKDOWN files will be loaded
Examples:
# Summarize a single document
mosaicx summarize --document clinic_note.pdf
# Summarize all reports in a directory
mosaicx summarize --dir ./patient_123_reports --patient "Patient 123"
# Single report with patient ID
mosaicx summarize --document discharge_summary.pdf --patient "John Doe"
Output:
Displays:
- Narrative summary (prose description of patient timeline)
- Timeline events table with columns: Date, Exam, Key Finding, Change from Prior
Remove Protected Health Information (PHI) from clinical documents.
Supports three de-identification strategies:
- remove (default): Replace PHI with `[REDACTED]`
- pseudonymize: Replace PHI with fake but consistent values
- dateshift: Shift dates by a random offset while preserving intervals
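To see why a single random offset preserves intervals, consider the following Python sketch of the dateshift idea. The function name `shift_dates`, the `max_days` bound, and the `seed` parameter are assumptions made for illustration; how MOSAICX chooses and scopes its offset is not specified here.

```python
import random
from datetime import date, timedelta
from typing import List, Optional


def shift_dates(dates: List[date], max_days: int = 365,
                seed: Optional[int] = None) -> List[date]:
    """Shift every date by the same random offset so intervals are preserved."""
    rng = random.Random(seed)
    # One offset for the whole document: differences between dates are unchanged.
    offset = timedelta(days=rng.randint(-max_days, max_days))
    return [d + offset for d in dates]


admission = date(2024, 3, 1)
discharge = date(2024, 3, 8)
shifted = shift_dates([admission, discharge], seed=7)

# The length of stay is still 7 days after shifting.
print((shifted[1] - shifted[0]).days)  # 7
```

Drawing a separate offset per date would hide the real dates just as well but would destroy clinically meaningful durations such as length of stay or time between scans.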
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--document` | PATH | No* | Single document to de-identify |
| `--dir` | PATH | No* | Directory of documents to de-identify |
| `--mode` | CHOICE | No | De-identification strategy: remove, pseudonymize, dateshift (default: remove) |
| `--regex-only` | flag | No | Use regex-only PHI scrubbing (no LLM call, faster) |
| `-o, --output` | PATH | No | Save output to JSON or YAML file (single document) |
| `--output-dir` | PATH | No | Directory for output files (batch mode) |
| `--format` | TEXT | No | Export format(s): jsonl, parquet, csv (can repeat) |
| `--workers` | INT | No | Number of parallel workers (default: 1) |
| `--resume` | flag | No | Resume from last checkpoint |
Important:
- Must provide `--document` or `--dir`
- Regex-only mode is faster but less accurate (only pattern matching)
- Full LLM mode is more thorough but slower and requires API calls
Examples:
# De-identify a single document (default: remove PHI)
mosaicx deidentify --document clinic_note.txt
# De-identify with pseudonymization
mosaicx deidentify --document report.txt --mode pseudonymize
# De-identify with date shifting
mosaicx deidentify --document discharge.txt --mode dateshift
# Batch de-identify a directory
mosaicx deidentify --dir ./reports --mode remove
# Fast regex-only mode (no LLM)
mosaicx deidentify --document report.txt --regex-only
# Parallel de-identification with 4 workers
mosaicx deidentify --dir ./patient_reports --workers 4 --mode pseudonymize
# Save single-document output to file
mosaicx deidentify --document clinic_note.txt -o deidentified.json
# Batch with output directory and export formats
mosaicx deidentify --dir ./reports --output-dir ./deidentified \
--format jsonl --format csv
# Resume a failed batch
mosaicx deidentify --dir ./reports --output-dir ./deidentified --resume
What gets redacted:
- Patient names
- Medical record numbers (MRNs)
- Dates (birth dates, admission dates, etc.)
- Addresses
- Phone numbers
- Email addresses
- Other identifiers
Output:
Displays the de-identified text in a formatted panel. If processing a directory, shows output for each file.
Verify an extraction or claim against a source document.
Checks whether structured extractions or free-text claims are supported by the original source document. Uses deterministic text analysis for the "quick" level (no LLM needed).
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--document` | PATH | No | Single source document (legacy single-source option) |
| `--sources` | PATH (repeatable) | No | One or more source documents to verify against |
| `--claim` | TEXT | No | A free-text claim to verify against the document |
| `--extraction` | PATH | No | JSON file with extraction output to verify |
| `--level` | CHOICE | No | Verification depth: quick (default), standard, thorough |
| `-o, --output` | PATH | No | Save verification result to JSON or YAML file |
Important:
- At least one of `--claim` or `--extraction` must be provided
- At least one of `--document` or `--sources` must be provided
- `quick` level uses deterministic checks (regex, text matching) -- no LLM needed, very fast
- `standard` level adds LLM spot-check of high-risk fields (measurements, severity, staging)
- `thorough` level runs a full LLM audit of all extracted fields
- Supported document formats: PDF, TXT, DOCX, MD, PNG, JPG, JPEG, TIF, TIFF
Verdicts:
| Verdict | Meaning |
|---|---|
| `verified` | All claims/fields are supported by the source text |
| `partially_supported` | Some fields supported, some could not be confirmed |
| `contradicted` | Source text contradicts the claim or extraction |
| `insufficient_evidence` | Source text does not contain enough information to judge |
Examples:
# Verify a free-text claim against a document
mosaicx verify --document ct_report.pdf --claim "2.3cm nodule in right upper lobe"
# Verify extraction output against the source document
mosaicx verify --document ct_report.pdf --extraction output.json
# Verify with thorough checking
mosaicx verify --document ct_report.pdf --extraction output.json --level thorough
# Save verification result to file
mosaicx verify --document ct_report.pdf --claim "normal chest CT" -o result.json
Output:
Displays:
- A decision-first adjudication block (`Decision`, `Requested`, `Effective`, fallback info)
- Claim mode includes `Claim truth` (`true`, `false`, or `inconclusive`) for immediate developer gating
- Claim mode: `Claim Comparison` with `Claimed`, `Source`, and `Evidence`
- Extraction mode: optional field-level mismatch table
- Machine-readable JSON/YAML includes `decision`, `support_score`, `verification_mode`, and fallback metadata
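Because the saved result is machine-readable, a downstream script can gate on it. The Python sketch below shows one way to do that. It assumes the file is a JSON object whose `decision` field holds one of the verdict strings from the table above; that mapping, the `should_block` helper, and the choice of which verdicts block are all assumptions of this sketch, so check a real output file before relying on it.

```python
import json

# Verdicts that stop the pipeline in this sketch (a policy choice, not a rule).
BLOCKING = {"contradicted", "insufficient_evidence"}


def should_block(result_json: str) -> bool:
    """Return True when a saved verification result should stop a pipeline."""
    result = json.loads(result_json)
    # Assumes `decision` is a top-level key holding a verdict string.
    return result.get("decision") in BLOCKING


# Made-up sample payloads for illustration only.
print(should_block('{"decision": "contradicted"}'))  # True
print(should_block('{"decision": "verified"}'))      # False
```

In practice the JSON text would be read from the file passed to `-o`, for example `Path("result.json").read_text()`.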
Query documents and data sources with natural language.
Load one or more data files and ask a question. Uses RLM (Recursive Language Model) -- the model writes and executes Python code in a sandboxed environment to answer your question.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--document` | PATH (repeatable) | No | Path to a data source (legacy alias) |
| `--sources` | TEXT (repeatable) | No | Paths, directories, or glob patterns (e.g. "reports/*.txt") |
| `-q, --question` | TEXT | No | Ask one question and print answer with evidence |
| `--chat` | flag | No | Start a multi-turn query chat session |
| `--citations` | INT | No | Maximum citations returned per turn (default: 3) |
| `--max-iterations` | INT | No | RLM iteration budget per answer (default: 8, lower is faster) |
| `-o, --output` | PATH | No | Save query turns/citations to JSON or YAML file |
Important:
- Requires Deno installed for the RLM code sandbox
- Requires a model with strong structured output capability (120B+ recommended)
- At least one `--document` or `--sources` input is required
- `-q` runs one-shot query; `--chat` runs multi-turn session with conversation memory
- Each answer includes evidence citations and grounding confidence
- If RLM is unavailable, query falls back to retrieval-only evidence mode
Examples:
# Ask a question about a CSV file
mosaicx query --document patient_data.csv -q "What is the mean age?"
# Query across multiple documents
mosaicx query --document data.csv --document notes.pdf -q "Summarize the key findings"
# Use glob-style source patterns
mosaicx query --sources "reports/*.txt" -q "List all pulmonary nodules with sizes"
# Multi-turn chat mode
mosaicx query --document report.pdf --chat
# Save the answer to a file
mosaicx query --document report.pdf -q "List all medications mentioned" -o answer.json
# Just load and inspect sources (no question)
mosaicx query --document data.csv --document results.json
Output:
Displays:
- Source catalog table (name, format, type, size)
- One-shot mode: answer + evidence citations + grounding confidence
- Chat mode: multi-turn answers with citations per turn
Optimize a DSPy pipeline using labeled examples.
Optimization uses progressive strategies (BootstrapFewShot -> MIPROv2 -> GEPA) to improve pipeline performance on your specific data.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--pipeline` | TEXT | No | Pipeline to optimize (e.g., radiology, pathology, extract) |
| `--trainset` | PATH | No | Training dataset in JSONL format |
| `--valset` | PATH | No | Validation dataset in JSONL format |
| `--budget` | CHOICE | No | Optimization budget: light, medium, heavy (default: medium) |
| `--save` | PATH | No | Custom save path for optimized program |
| `--list-pipelines` | flag | No | List available pipelines and exit |
Budget presets:
| Budget | Strategy | Cost | Time | Min Examples |
|---|---|---|---|---|
| `light` | BootstrapFewShot | ~$0.50 | ~5 min | 10 |
| `medium` | MIPROv2 | ~$3 | ~20 min | 10 |
| `heavy` | GEPA | ~$10 | ~45 min | 10 |
Important:
- Requires labeled training data in JSONL format
- Optimized programs are saved to `~/.mosaicx/optimized/` by default
- Use optimized programs with `mosaicx extract --optimized` or `mosaicx eval --optimized`
Examples:
# List available pipelines
mosaicx optimize --list-pipelines
# Light optimization (BootstrapFewShot)
mosaicx optimize --pipeline radiology \
--trainset train.jsonl --budget light
# Medium optimization (MIPROv2, recommended)
mosaicx optimize --pipeline radiology \
--trainset train.jsonl --valset val.jsonl --budget medium
# Heavy optimization (GEPA, best results)
mosaicx optimize --pipeline pathology \
--trainset train.jsonl --valset val.jsonl --budget heavy
# Custom save location
mosaicx optimize --pipeline extract \
--trainset examples.jsonl --budget medium \
--save /path/to/optimized/custom_extractor.json
# Optimize the schema generator
mosaicx optimize --pipeline schema \
--trainset schema_examples.jsonl --budget light
Available pipelines:
- `radiology` -- RadiologyReportStructurer
- `pathology` -- PathologyReportStructurer
- `extract` -- DocumentExtractor
- `summarize` -- ReportSummarizer
- `deidentify` -- Deidentifier
- `schema` -- SchemaGenerator
Training data format (JSONL):
Each line is a JSON object with inputs and expected outputs. Example for radiology:
{"report_text": "CT CHEST WITH CONTRAST...", "report_header": "CT CHEST", "expected": {...}}
{"report_text": "MRI BRAIN WITHOUT CONTRAST...", "report_header": "MRI BRAIN", "expected": {...}}
Output:
Displays:
- Optimization configuration
- Progressive strategy stages
- Training and validation scores
- Save path for optimized program
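Before spending an optimization budget, it can help to sanity-check the JSONL training file described above. The following Python sketch validates that each line is a JSON object carrying the three keys shown in the radiology example (`report_text`, `report_header`, `expected`). The helper name `check_jsonl` is hypothetical, other pipelines may expect different keys, and the `expected` payload in the sample is a placeholder because its real structure depends on the pipeline.

```python
import json

# Keys taken from the radiology example; other pipelines may differ.
REQUIRED_KEYS = {"report_text", "report_header", "expected"}


def check_jsonl(lines: list) -> list:
    """Return a list of problems found; an empty list means every line passed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append(f"line {number}: missing {sorted(missing)}")
    return problems


sample = [
    # Placeholder `expected` content, for illustration only.
    json.dumps({"report_text": "CT CHEST WITH CONTRAST...",
                "report_header": "CT CHEST",
                "expected": {"placeholder_field": "value"}}),
    '{"report_text": "MRI BRAIN WITHOUT CONTRAST..."}',
]
print(check_jsonl(sample))
```

For a file on disk, pass `Path("train.jsonl").read_text().splitlines()` as the argument.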
Evaluate a pipeline against a labeled test set.
Runs the pipeline on each example in the test set and computes metrics.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--pipeline` | TEXT | Yes | Pipeline to evaluate (e.g., radiology, pathology) |
| `--testset` | PATH | Yes | Test dataset in JSONL format |
| `--optimized` | PATH | No | Path to optimized program (if not provided, uses baseline) |
| `--output` | PATH | No | Save detailed results as JSON |
Examples:
# Evaluate baseline (unoptimized) radiology pipeline
mosaicx eval --pipeline radiology --testset test.jsonl
# Evaluate optimized radiology pipeline
mosaicx eval --pipeline radiology --testset test.jsonl \
--optimized ~/.mosaicx/optimized/radiology_optimized.json
# Save detailed results
mosaicx eval --pipeline pathology --testset test.jsonl \
--optimized pathology_opt.json --output eval_results.json
# Evaluate the document extractor
mosaicx eval --pipeline extract --testset extract_test.jsonl
Output:
Displays:
- Evaluation configuration (pipeline, test set, examples count)
- Statistics table:
  - Count
  - Mean score
  - Median score
  - Standard deviation
  - Min/Max scores
- Score distribution histogram (0.0-0.2, 0.2-0.4, etc.)
- Detailed results (if `--output` specified)
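The statistics listed above are standard summaries of a list of per-example scores. As a rough illustration, the Python sketch below computes the same quantities plus a five-bin histogram with 0.2-wide bins. The function name `summarize_scores` is hypothetical, and details such as sample versus population standard deviation or how bin edges are assigned are assumptions; the tool's own numbers may differ slightly.

```python
import statistics


def summarize_scores(scores: list) -> dict:
    """Compute summary statistics and a five-bin histogram over [0, 1]."""
    bins = [0] * 5
    for score in scores:
        # Bin width is 0.2; a score of exactly 1.0 goes into the last bin.
        bins[min(int(score * 5), 4)] += 1
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # Sample standard deviation; needs at least two scores.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
        "histogram": bins,
    }


summary = summarize_scores([0.1, 0.5, 0.5, 0.9, 1.0])
print(summary["histogram"])  # [1, 0, 2, 0, 2]
```

A histogram with mass at both ends, as in this toy input, usually means the pipeline does well on some report types and poorly on others, which the mean alone would hide.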
Test data format:
Same JSONL format as training data. See mosaicx optimize for details.
Scaffold a new extraction pipeline from a built-in template.
Generates a complete DSPy pipeline module with lazy loading, mode registration, and a single-step extraction chain. The generated file follows the same pattern as the built-in radiology and pathology pipelines.
Usage:
mosaicx pipeline new <name> [--description "..."]
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `name` | TEXT | Yes | Pipeline name (auto-normalized to snake_case) |
| `-d, --description` | TEXT | No | One-line description of the pipeline |
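Name normalization to snake_case can be pictured with a short Python sketch. This is an assumed approximation of the behavior, not the function MOSAICX uses: the helper name `to_snake_case` and its exact rules (for example, how runs of capitals are treated) are illustrative only.

```python
import re


def to_snake_case(name: str) -> str:
    """Normalize PascalCase, camelCase, kebab-case, or spaced names to snake_case."""
    # Insert an underscore at lower-to-upper boundaries: "EchoReport" -> "Echo_Report".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse hyphens, whitespace, and repeated underscores into one underscore.
    name = re.sub(r"[-\s_]+", "_", name)
    return name.strip("_").lower()


print(to_snake_case("echo-report"))  # echo_report
print(to_snake_case("EchoReport"))   # echo_report
print(to_snake_case("cardiology"))   # cardiology
```

Normalizing up front means the generated file name and the value accepted by `--mode` follow one convention regardless of how the name was typed.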
Examples:
# Scaffold a cardiology pipeline
mosaicx pipeline new cardiology --description "Cardiology report structurer"
# PascalCase and kebab-case are normalized automatically
mosaicx pipeline new echo-report -d "Echocardiography report extraction"
# Minimal -- auto-generates a description
mosaicx pipeline new dermatology
What gets generated:
A new file at mosaicx/pipelines/<name>.py containing:
- Mode registration (so `--mode <name>` works with `mosaicx extract`)
- A DSPy Signature class for input/output fields
- A DSPy Module class with a `forward()` method
- Lazy loading boilerplate (module imports DSPy only when needed)
After scaffolding:
The command prints a wiring checklist of manual steps to complete the pipeline registration (adding to mode modules, evaluation registries, and CLI imports).
Start the MOSAICX Model Context Protocol (MCP) server.
The MCP server exposes MOSAICX tools (extract, verify, query, deidentify, schema generate, list schemas, list modes) for AI agents like Claude Code, Claude Desktop, and other MCP-compatible clients.
Options:
| Flag | Type | Required | Description |
|---|---|---|---|
| `--transport` | CHOICE | No | Transport protocol: stdio or sse (default: stdio) |
| `--port` | INT | No | Port for the SSE HTTP server (default: 8080) |
Examples:
# Start with stdio transport (default -- for Claude Code / Claude Desktop)
mosaicx mcp serve
# Start with SSE transport on port 9000
mosaicx mcp serve --transport sse --port 9000
Important:
- Requires the `mcp` optional dependency: `pip install mosaicx[mcp]`
- Use `stdio` transport for local integrations (Claude Code, Claude Desktop)
- Use `sse` transport for remote/network integrations
See the MCP Server guide for setup instructions with Claude Code and Claude Desktop.
Print current configuration values.
Displays all MOSAICX settings, including:
- Language models (LM)
- Processing settings
- OCR settings
- Export settings
- Paths
Examples:
mosaicx config show
Output sections:
- Language Models
  - `lm` -- Main language model
  - `lm_cheap` -- Cheaper model for simple tasks
  - `api_base` -- API base URL
  - `api_key` -- Masked API key
- Processing
  - `default_template` -- Default template name
  - `completeness_threshold` -- Minimum completeness score (0-1)
  - `batch_workers` -- Default parallel workers
  - `checkpoint_every` -- Checkpoint frequency
- Document OCR
  - `ocr_engine` -- OCR engine (`both`, `surya`, `chandra`)
  - `chandra_backend` -- Chandra backend (`vllm`, `hf`, `auto`)
  - `chandra_server_url` -- Chandra server URL (if applicable)
  - `quality_threshold` -- Minimum OCR quality (0-1)
  - `ocr_page_timeout` -- Timeout per page (seconds)
  - `force_ocr` -- Always use OCR (even for text PDFs)
  - `ocr_langs` -- OCR languages
- Export & Privacy
  - `export_formats` -- Default export formats
  - `deidentify_mode` -- Default de-identification mode
- Paths
  - `home_dir` -- MOSAICX home directory (`~/.mosaicx`)
  - `schema_dir` -- Schema directory
  - `optimized_dir` -- Optimized programs directory
  - `checkpoint_dir` -- Checkpoint directory
  - `log_dir` -- Log directory
Set a configuration value (runtime only).
Usage:
mosaicx config set <key> <value>
Important:
- Changes are not persisted across sessions
- For permanent changes, use environment variables (`MOSAICX_*`) or a `.env` file
Examples:
# Set the main language model (runtime only)
mosaicx config set lm "openai/gpt-4"
# Set API base (runtime only)
mosaicx config set api_base "http://localhost:8000/v1"
Recommended approach for persistent config:
Create a .env file in your project directory or set environment variables:
# .env file
MOSAICX_LM=openai/gpt-4
MOSAICX_API_KEY=your-api-key-here
MOSAICX_API_BASE=http://localhost:11434/v1
MOSAICX_OCR_ENGINE=both
MOSAICX_BATCH_WORKERS=4
Or use environment variables:
export MOSAICX_LM="openai/gpt-4"
export MOSAICX_API_KEY="your-api-key-here"
export MOSAICX_API_BASE="http://localhost:11434/v1"
All configuration options can be set via environment variables with the `MOSAICX_` prefix.
| Variable | Type | Default | Description |
|---|---|---|---|
| `MOSAICX_LM` | string | `openai/gpt-oss:120b` | Main language model |
| `MOSAICX_LM_CHEAP` | string | `openai/gpt-oss:20b` | Cheaper model for simple tasks |
| `MOSAICX_API_KEY` | string | `ollama` | API key |
| `MOSAICX_API_BASE` | string | `http://localhost:11434/v1` | API base URL |
| `MOSAICX_DEFAULT_TEMPLATE` | string | `auto` | Default template name |
| `MOSAICX_COMPLETENESS_THRESHOLD` | float | `0.7` | Minimum completeness score (0-1) |
| `MOSAICX_BATCH_WORKERS` | int | `1` | Number of parallel workers |
| `MOSAICX_CHECKPOINT_EVERY` | int | `50` | Checkpoint frequency |
| `MOSAICX_HOME_DIR` | path | `~/.mosaicx` | MOSAICX home directory |
| `MOSAICX_DEIDENTIFY_MODE` | choice | `remove` | De-identification mode (remove, pseudonymize, dateshift) |
| `MOSAICX_DEFAULT_EXPORT_FORMATS` | list | `["parquet", "jsonl"]` | Default export formats |
| `MOSAICX_OCR_ENGINE` | choice | `both` | OCR engine (both, surya, chandra) |
| `MOSAICX_CHANDRA_BACKEND` | choice | `auto` | Chandra backend (vllm, hf, auto) |
| `MOSAICX_CHANDRA_SERVER_URL` | string | `""` | Chandra server URL |
| `MOSAICX_QUALITY_THRESHOLD` | float | `0.6` | Minimum OCR quality (0-1) |
| `MOSAICX_OCR_PAGE_TIMEOUT` | int | `60` | OCR timeout per page (seconds) |
| `MOSAICX_FORCE_OCR` | bool | `false` | Always use OCR (even for text PDFs) |
| `MOSAICX_OCR_LANGS` | list | `["en", "de"]` | OCR languages (JSON array) |
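When settings live in a `.env` file, it can help to export them into your interactive shell and confirm what is actually set. The sketch below is one way to do that in plain shell. `load_env` and `show_mosaicx_env` are hypothetical helper names, not MOSAICX commands, and the approach assumes the file contains only simple `KEY=value` lines that are also valid shell (a list value such as `MOSAICX_OCR_LANGS` would need single quotes around the JSON array).

```shell
# Hypothetical helpers (not part of MOSAICX).

load_env() {
    # $1: path to a .env file made of plain KEY=value lines
    [ -f "$1" ] || return 1
    set -a                      # export every variable assigned while sourcing
    case "$1" in
        */*) . "$1" ;;          # path already contains a directory part
        *)   . "./$1" ;;        # avoid a PATH lookup for bare file names
    esac
    set +a
}

show_mosaicx_env() {
    # List exported MOSAICX_* variables, hiding the API key value
    env | grep '^MOSAICX_' | sed 's/^\(MOSAICX_API_KEY\)=.*/\1=****/' | sort
}
```

Running `load_env .env && show_mosaicx_env` prints one `NAME=value` line per variable, with the key value replaced by `****`.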
Examples:
# Use GPT-4o via the OpenAI API
export MOSAICX_LM="openai/gpt-4o"
export MOSAICX_API_KEY="sk-..."
export MOSAICX_API_BASE="https://api.openai.com/v1"
# Use a local vLLM server
export MOSAICX_LM="local/qwen-32b"
export MOSAICX_API_BASE="http://localhost:8000/v1"
export MOSAICX_API_KEY="none"
# Increase batch parallelism
export MOSAICX_BATCH_WORKERS=8
# Force OCR on all PDFs
export MOSAICX_FORCE_OCR=true
# Add Spanish to OCR languages
export MOSAICX_OCR_LANGS='["en", "de", "es"]'

mosaicx extract --document ct_chest.pdf --mode radiology -o output.json

mosaicx extract --document ct_chest.pdf --template chest_ct --score -o output.json

mosaicx extract --dir ./biopsies --output-dir ./structured \
--mode pathology --workers 4 --format jsonl --format parquet

# Generate template from description
mosaicx template create \
--describe "echo report with LVEF, valve grades, and wall motion"
# Use the template (auto-named by LLM, e.g., "EchoReport")
mosaicx extract --document echo.pdf --template EchoReport -o result.json

# Preview what would be migrated
mosaicx template migrate --dry-run
# Perform the migration
mosaicx template migrate
# Use a migrated template
mosaicx extract --document echo.pdf --template EchoReport

# Optimize
mosaicx optimize --pipeline radiology \
--trainset train.jsonl --valset val.jsonl --budget medium
# Evaluate optimized version
mosaicx eval --pipeline radiology --testset test.jsonl \
--optimized ~/.mosaicx/optimized/radiology_optimized.json

mosaicx deidentify --dir ./clinic_notes --mode remove --workers 4

mosaicx summarize --dir ./patient_123 --patient "Patient 123"

# Extract structured data from a report
mosaicx extract --document ct_chest.pdf --template chest_ct -o output.json
# Verify the extraction against the source document
mosaicx verify --document ct_chest.pdf --extraction output.json

# Extract structured data and save to JSON
mosaicx extract --document ct_chest.pdf --mode radiology -o structured.json
# Query the extracted data for specific findings
mosaicx query --document structured.json -q "Are there any critical findings?"
# Query across the source document and extraction together
mosaicx query --document ct_chest.pdf --document structured.json \
-q "Summarize the nodule measurements"

MOSAICX stores data in `~/.mosaicx/` by default:
~/.mosaicx/
├── templates/ # User-created YAML templates
│ ├── EchoReport.yaml
│ ├── CTReport.yaml
│ └── .history/ # Archived template versions
│ ├── EchoReport_v1.yaml
│ └── EchoReport_v2.yaml
├── schemas/ # Legacy saved schemas (JSON)
│ ├── EchoReport.json
│ └── CTReport.json
├── optimized/ # Optimized DSPy programs
│ ├── radiology_optimized.json
│ └── pathology_optimized.json
├── checkpoints/ # Batch processing checkpoints
│ └── resume.json
└── logs/ # Log files (future)
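Given this layout, a rough local check can show which legacy schemas in `schemas/` have no same-named YAML template in `templates/` yet. This is only a sketch: `unmigrated_schemas` is a hypothetical helper, and it assumes a migrated template keeps the schema's base name (as `EchoReport.json` and `EchoReport.yaml` do in the tree). It honours `MOSAICX_HOME_DIR` when that variable is set. `mosaicx template migrate --dry-run` remains the authoritative preview.

```shell
# Hypothetical helper (not part of MOSAICX).
# Prints the base name of every schemas/*.json file that has no
# templates/<name>.yaml next to it.
unmigrated_schemas() {
    local home="${MOSAICX_HOME_DIR:-$HOME/.mosaicx}"
    local f name
    for f in "$home"/schemas/*.json; do
        [ -e "$f" ] || continue            # the glob matched nothing
        name=$(basename "$f" .json)
        [ -f "$home/templates/$name.yaml" ] || echo "$name"
    done
}
```

An empty result suggests every legacy schema already has a template counterpart under the naming assumption above.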
You can override the home directory with:
export MOSAICX_HOME_DIR=/path/to/custom/dir

- Start with auto mode: Run `mosaicx extract --document report.pdf` to see what MOSAICX can do without any configuration.
- Use built-in modes: For radiology and pathology reports, use `--mode radiology` or `--mode pathology` for best results.
- Try built-in templates: Run `mosaicx template list` to see pre-defined templates for common exam types.
- Save your output: Always use `-o output.json` to save the full structured data. Terminal output is summarized.
- Check available modes: Run `mosaicx extract --list-modes` to see what's available.
- Create templates for repeated use: If you process the same report type often, create a template with `mosaicx template create` and reuse it.
- Use batch mode for large datasets: Don't run `extract` 100 times manually -- use `mosaicx extract --dir` with `--workers` for parallelism.
- Optimize for your data: If you have labeled examples, use `mosaicx optimize` to improve accuracy on your specific reports.
- Resume failed batches: If a batch crashes, use `--resume` to pick up where you left off.
- Migrate legacy schemas: If you have JSON schemas from an older version, run `mosaicx template migrate` to convert them to YAML templates.
- Check your config: Run `mosaicx config show` to see what models and settings you're using.
- Use environment variables: Create a `.env` file with `MOSAICX_*` variables to avoid typing API keys and settings repeatedly.
Set your API key:
export MOSAICX_API_KEY="your-api-key-here"

Or add to `.env`:
MOSAICX_API_KEY=your-api-key-here
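Before a long run, it can be worth confirming that a key has been set explicitly in one of these two places. The following is a minimal sketch, not a MOSAICX feature: `has_api_key` is a hypothetical helper that only looks at the current shell environment and at a `.env` file in the directory you pass (the current directory by default).

```shell
# Hypothetical helper (not part of MOSAICX).
# Succeeds when MOSAICX_API_KEY is non-empty in the environment,
# or when <dir>/.env contains a non-empty MOSAICX_API_KEY= line.
has_api_key() {
    local dir="${1:-.}"
    [ -n "${MOSAICX_API_KEY:-}" ] && return 0
    [ -f "$dir/.env" ] && grep -Eq '^MOSAICX_API_KEY=.+' "$dir/.env"
}
```

For example, `has_api_key || echo "MOSAICX_API_KEY is not set" >&2` gives an early warning without contacting any server.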
- Check if the PDF is scanned (image-based) -- MOSAICX will use OCR automatically
- If OCR fails, try `--force-ocr` or adjust `MOSAICX_OCR_ENGINE`
- The document is low-resolution or poorly scanned
- Results may be unreliable -- check the extracted text
- Try adjusting `MOSAICX_QUALITY_THRESHOLD` (lower = more permissive)
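To experiment with the threshold on one document without changing your `.env` or exported settings, the variable can be set for a single command. The sketch below wraps that in two hypothetical helpers (`valid_threshold`, `extract_with_threshold`) and rejects values outside the documented 0-1 range before calling `mosaicx`. The document path is whatever you pass in.

```shell
# Hypothetical helpers (not part of MOSAICX).

valid_threshold() {
    # Accept 0, 1, 1.0, or a decimal such as 0.45; reject everything else
    printf '%s\n' "$1" | grep -Eq '^(0(\.[0-9]+)?|1(\.0+)?)$'
}

extract_with_threshold() {
    # $1: threshold between 0 and 1, $2: document path
    valid_threshold "$1" || {
        echo "threshold must be between 0 and 1" >&2
        return 2
    }
    # The assignment prefix applies to this one mosaicx invocation only
    MOSAICX_QUALITY_THRESHOLD="$1" mosaicx extract --document "$2"
}
```

A call such as `extract_with_threshold 0.4 scan.pdf` leaves the threshold in your shell and `.env` untouched.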
- Check available templates: `mosaicx template list`
- Verify the name matches exactly (case-sensitive)
- Ensure the template exists in `~/.mosaicx/templates/` or as a built-in
- For legacy schemas, the template resolution chain also checks `~/.mosaicx/schemas/`
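The on-disk part of these checks can be scripted. The sketch below uses a hypothetical helper, `where_template`, that reports which file a `--template` argument would match locally. It assumes user templates are stored as `<name>.yaml` and legacy schemas as `<name>.json`, and it cannot see built-in templates, so a "not found" result (or a legacy-schema hit that shares its name with a built-in) still needs `mosaicx template list` to confirm.

```shell
# Hypothetical helper (not part of MOSAICX).
# Follows the on-disk steps of the resolution chain: YAML file path,
# then user template, then legacy schema. Built-ins are not checked.
where_template() {
    local name="$1"
    local home="${MOSAICX_HOME_DIR:-$HOME/.mosaicx}"
    case "$name" in
        *.yaml|*.yml)
            if [ -f "$name" ]; then
                echo "yaml file: $name"
                return 0
            fi ;;
    esac
    if [ -f "$home/templates/$name.yaml" ]; then
        echo "user template: $home/templates/$name.yaml"
    elif [ -f "$home/schemas/$name.json" ]; then
        echo "legacy schema: $home/schemas/$name.json"
    else
        echo "not found on disk: $name (may still be a built-in)"
        return 1
    fi
}
```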
- Increase workers: `--workers 4` or `--workers 8`
- Check if OCR is the bottleneck (try `--workers 1` and monitor CPU/GPU)
- For cloud LLMs, ensure your API has high rate limits
- You need at least 10 training examples
- See the "Min Examples" column in `mosaicx optimize --help`
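A quick way to check the example count before starting an optimization run is to count the non-empty lines of the training file. This sketch assumes the trainset is JSONL with one example per line (as the `.jsonl` file names in the workflow examples suggest). `enough_examples` is a hypothetical helper, and the default minimum of 10 can be overridden with a second argument if a pipeline requires more.

```shell
# Hypothetical helper (not part of MOSAICX).
# $1: JSONL file, $2: minimum number of examples (default 10)
enough_examples() {
    local file="$1"
    local min="${2:-10}"
    local count
    [ -f "$file" ] || return 2
    # Count lines that contain at least one non-blank character
    count=$(grep -c '[^[:space:]]' "$file" || true)
    echo "$file: $count examples (need $min)"
    [ "$count" -ge "$min" ]
}
```

For example, `enough_examples train.jsonl && mosaicx optimize --pipeline radiology --trainset train.jsonl --valset val.jsonl --budget medium` only starts the run when the file is large enough.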
- Command help: `mosaicx <command> --help`
- List modes: `mosaicx extract --list-modes`
- List templates: `mosaicx template list`
- List pipelines: `mosaicx optimize --list-pipelines`
- Show config: `mosaicx config show`
- Check version: `mosaicx --version`
End of CLI Reference