What's New
DocumentCloud Integration
- Search 10M+ public documents via openfoia records search --source documentcloud
- Fetch full text into the local database: DocumentCloud has already extracted it, so no OCR is needed
- Cross-reference entities against DocumentCloud alongside MuckRock, SEC, OpenCorporates, OpenSanctions, and ICIJ
- records fetch command: pull any document's text locally for entity extraction
- Calls the DocumentCloud REST API directly with httpx for speed (not the python-documentcloud library)
- Search highlights show where your terms appear in the document
- Idempotent fetch: re-running won't duplicate records
- API errors surfaced clearly (not masked as "no results")
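
One way the idempotent fetch behavior could work is sketched below: a uniqueness constraint on the source and its document ID lets a repeated fetch be ignored instead of creating a second row. The table layout, column names, and the save_document helper are assumptions made for illustration and are not taken from the OpenFOIA code.

```python
import sqlite3

# Hypothetical schema: the real OpenFOIA tables may look different.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    source    TEXT NOT NULL,
    source_id TEXT NOT NULL,
    title     TEXT,
    full_text TEXT,
    UNIQUE (source, source_id)
)
"""


def save_document(conn, source, source_id, title, full_text):
    """Store one fetched document; return True only if a new row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO documents (source, source_id, title, full_text) "
        "VALUES (?, ?, ?, ?)",
        (source, source_id, title, full_text),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be skipped.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Fetching the same (made-up) document twice leaves a single record.
first = save_document(conn, "documentcloud", "12345", "Example memo", "Full text")
second = save_document(conn, "documentcloud", "12345", "Example memo", "Full text")
row_count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

Keying on the pair (source, source_id) rather than the ID alone would keep records from different sources from colliding if their IDs happen to match.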
Interactive Document Reader
- Click any entity in the graph to see its source documents
- Open the document reader: full text with every entity highlighted inline
- Sidebar lists all entities found, sorted by frequency, color-coded by type
- Click an entity in the sidebar or text to jump to its occurrences
- "View original source" button links directly to DocumentCloud
- Source URLs on document cards for immediate verification
- Documents without text gracefully shown as unavailable
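
The reader behavior described above (inline highlights plus a frequency-sorted sidebar) can be approximated with a short sketch. It assumes entities are already known as a name-to-type mapping and matched by exact string search; the function names, the mark tag, and the sample text are all illustrative and not the project's actual implementation.

```python
import html
import re
from collections import Counter


def find_entities(text, entities):
    """Return sorted (start, end, name, type) spans for each known entity."""
    spans = []
    for name, etype in entities.items():
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end(), name, etype))
    return sorted(spans)


def highlight(text, spans):
    """Wrap each span in a <mark> tagged with its type (assumes no overlaps)."""
    parts, pos = [], 0
    for start, end, _name, etype in spans:
        parts.append(html.escape(text[pos:start]))
        parts.append(f'<mark class="{etype}">{html.escape(text[start:end])}</mark>')
        pos = end
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


def sidebar(spans):
    """List (name, type, count) tuples, most frequent entity first."""
    counts = Counter((name, etype) for _, _, name, etype in spans)
    return [(name, etype, n) for (name, etype), n in counts.most_common()]


# Made-up sample document and entity list.
text = "Acme Corp paid John Doe. Acme Corp denied it."
entities = {"Acme Corp": "org", "John Doe": "person"}

spans = find_entities(text, entities)
highlighted = highlight(text, spans)
entries = sidebar(spans)
```

The type stored in the class attribute is what a stylesheet could use for color-coding, and the span offsets are what a click handler would need to jump between occurrences.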
Graph Visualization Improvements
- Double-click to zoom smoothly into any node
- Selection highlighting: dims unconnected nodes, brightens direct connections
- Curved edges to reduce visual overlap
- Connected entities shown as clickable tags in the info panel
- Adaptive spacing: repulsion scales with node count
- Pointer cursor on hoverable nodes
- Escape key to close reader/deselect
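
As a rough illustration of adaptive spacing, the sketch below grows a repulsion magnitude with the square root of the node count and caps it so very large graphs do not fly apart. The formula, the constants, and the function name are assumptions chosen for the example; the release notes do not say how the actual scaling is computed.

```python
import math


def repulsion_strength(node_count, base=120.0, cap=800.0):
    """Return a repulsion magnitude that increases with graph size.

    base and cap are illustrative values, not settings from the project.
    """
    n = max(node_count, 1)
    strength = base * (1.0 + math.sqrt(n) / 10.0)
    return min(strength, cap)


small = repulsion_strength(10)
medium = repulsion_strength(100)
huge = repulsion_strength(10_000)
```

A sub-linear curve like this keeps small graphs compact while still spreading out dense ones; a linear or logarithmic rule would be an equally plausible choice.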
Other Improvements
- Multi-layer MuckRock search (tags → agency ID → user filter)
- MSG email file support for ingest
- File type display in search results
- 13 security fixes from codex adversarial review
- Entity links FK constraint fix
- Graph HTML template extracted to graph_template.py (640 lines moved out of cli.py)
- New README screenshots showing the graph and document reader
- ruff format passing on all files
Data Sources (6)
| Source | Documents |
|---|---|
| MuckRock | 46k+ FOIA requests |
| DocumentCloud | 10M+ public documents |
| OpenCorporates | Global company records |
| SEC EDGAR | US corporate filings |
| OpenSanctions | Sanctions & PEP lists |
| ICIJ Offshore Leaks | Panama/Pandora Papers |
Install
curl -fsSL https://raw.githubusercontent.com/JordanCoin/openfoia/main/install.sh | bash