
OpenFOIA v3.2.0: DocumentCloud + Document Reader


JordanCoin released this 24 Mar 23:25 · 16 commits to main since this release

What's New

DocumentCloud Integration

  • Search 10M+ public documents via openfoia records search --source documentcloud
  • Fetch full text into the local database; DocumentCloud has already extracted it, so no OCR is needed
  • Cross-reference entities against DocumentCloud alongside MuckRock, SEC, OpenCorporates, OpenSanctions, and ICIJ
  • records fetch command: pull any document's text locally for entity extraction
  • Uses httpx REST API directly for speed (not the python-documentcloud library)
  • Search highlights show where your terms appear in the document
  • Idempotent fetch: re-running won't duplicate records
  • API errors surfaced clearly (not masked as "no results")
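Idempotent fetch usually comes down to an upsert keyed on the source's document ID. The sketch below shows that idea with the standard-library sqlite3 module; the documents table, its columns, and the store_document helper are hypothetical, since these notes don't describe OpenFOIA's actual schema. It relies on SQLite's ON CONFLICT ... DO UPDATE upsert syntax (SQLite 3.24 or later).

```python
import sqlite3

# Hypothetical schema for illustration only; OpenFOIA's real table layout
# is not described in these release notes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    source    TEXT NOT NULL,
    source_id TEXT NOT NULL,
    title     TEXT,
    full_text TEXT,
    url       TEXT,
    PRIMARY KEY (source, source_id)
)
"""


def store_document(conn: sqlite3.Connection, source: str, source_id: str,
                   title: str, full_text: str, url: str) -> None:
    """Insert a fetched document, or refresh it if it is already stored.

    The (source, source_id) primary key makes the fetch idempotent: a second
    run updates the existing row instead of adding a duplicate.
    """
    conn.execute(
        """
        INSERT INTO documents (source, source_id, title, full_text, url)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (source, source_id) DO UPDATE SET
            title = excluded.title,
            full_text = excluded.full_text,
            url = excluded.url
        """,
        (source, source_id, title, full_text, url),
    )
    conn.commit()
```

Running the same fetch twice leaves one row per document, with its fields refreshed from the latest run.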

Interactive Document Reader

  • Click any entity in the graph to see its source documents
  • Open the document reader β€” full text with every entity highlighted inline
  • Sidebar lists all entities found, sorted by frequency, color-coded by type
  • Click an entity in the sidebar or text to jump to its occurrences
  • "View original source" button links directly to DocumentCloud
  • Source URLs on document cards for immediate verification
  • Documents without text gracefully shown as unavailable
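Inline highlighting and a frequency-sorted sidebar can both come from a single regex pass over the document text. This is a minimal Python sketch under stated assumptions: the Entity type, its kind labels, and the <mark> markup with entity-<kind> CSS classes are hypothetical, not the reader's real template. Longer names are tried first so "Acme Holdings" is not split into "Acme", and lookarounds stop a name from matching inside a longer word.

```python
from __future__ import annotations

import html
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # hypothetical labels, e.g. "person", "org", "location"


def highlight(text: str, entities: list[Entity]) -> tuple[str, list[tuple[Entity, int]]]:
    """Wrap every entity mention in <mark> and count mentions per entity.

    Returns (html_text, sidebar), where sidebar is a list of (entity, count)
    pairs sorted by descending frequency.
    """
    if not entities:
        return html.escape(text), []

    # Longest names first, so "Acme Holdings" is preferred over "Acme".
    ordered = sorted(entities, key=lambda e: len(e.name), reverse=True)
    # One capturing group per entity; m.lastindex tells us which one matched.
    alternation = "|".join(f"({re.escape(e.name)})" for e in ordered)
    # Lookarounds keep "Acme" from matching inside "Acmeology".
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)

    counts: Counter[Entity] = Counter()
    parts: list[str] = []
    last = 0
    for m in pattern.finditer(text):
        entity = ordered[m.lastindex - 1]
        counts[entity] += 1
        parts.append(html.escape(text[last:m.start()]))  # plain text before the match
        parts.append(
            f'<mark class="entity entity-{html.escape(entity.kind)}" '
            f'data-entity="{html.escape(entity.name)}">{html.escape(m.group(0))}</mark>'
        )
        last = m.end()
    parts.append(html.escape(text[last:]))  # trailing text after the last match
    return "".join(parts), counts.most_common()
```

Escaping every non-entity segment matters here because document text comes from external sources and is rendered as HTML.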

Graph Visualization Improvements

  • Double-click to zoom smoothly into any node
  • Selection highlighting: dims unconnected nodes, brightens direct connections
  • Curved edges to reduce visual overlap
  • Connected entities shown as clickable tags in the info panel
  • Adaptive spacing: repulsion scales with node count
  • Pointer cursor on hoverable nodes
  • Escape key to close reader/deselect
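These notes don't say how the graph computes its visual effects, so the following is only a language-agnostic sketch, written in Python, of two of the behaviours listed above: selection dimming and curved edges. The function names, the dim opacity of 0.15, and the curvature factor of 0.2 are all assumptions. A curved edge is drawn here as a quadratic Bézier whose control point sits off the edge's midpoint, perpendicular to the edge.

```python
from __future__ import annotations

import math
from collections.abc import Hashable, Iterable


def selection_opacity(nodes: Iterable[Hashable],
                      edges: Iterable[tuple[Hashable, Hashable]],
                      selected: Hashable,
                      dim: float = 0.15) -> dict[Hashable, float]:
    """Full opacity for the selected node and its direct neighbours; `dim` for the rest."""
    keep = {selected}
    for a, b in edges:  # edges are treated as undirected
        if a == selected:
            keep.add(b)
        elif b == selected:
            keep.add(a)
    return {n: 1.0 if n in keep else dim for n in nodes}


def curve_control_point(x1: float, y1: float, x2: float, y2: float,
                        curvature: float = 0.2) -> tuple[float, float]:
    """Control point for a quadratic Bézier edge from (x1, y1) to (x2, y2).

    The point is offset from the midpoint along the edge's normal by
    `curvature` times the edge length, so edges that would otherwise lie on
    top of each other (A->B and B->A) bow to opposite sides.
    """
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return mx, my  # degenerate edge: no direction to offset along
    # (-dy, dx) / length is the unit normal to the edge.
    offset = curvature * length
    return mx - dy / length * offset, my + dx / length * offset
```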

Other Improvements

  • Multi-layer MuckRock search (tags → agency ID → user filter)
  • MSG email file support for ingest
  • File type display in search results
  • 13 security fixes from a Codex adversarial review
  • Foreign-key (FK) constraint fix for entity links
  • Graph HTML template extracted to graph_template.py, moving 640 lines out of cli.py
  • New README screenshots showing the graph and document reader
  • ruff format passing on all files
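The layered MuckRock search reads as an ordered fallback: try tag search, then agency ID, then a user filter, and stop at the first layer that returns anything. A minimal sketch, assuming each layer can be wrapped as a function from query string to a list of result dicts; the layered_search helper and the layer functions are hypothetical, since these notes don't name the MuckRock API parameters involved.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence


def layered_search(
    query: str,
    layers: Sequence[tuple[str, Callable[[str], list[dict]]]],
) -> tuple[str | None, list[dict]]:
    """Try each search layer in order and stop at the first one with results.

    Returns (layer_label, results), or (None, []) if every layer came back
    empty. Exceptions from a layer are deliberately not caught, so an API
    failure is not mistaken for "no results".
    """
    for label, search in layers:
        results = search(query)
        if results:
            return label, results
    return None, []
```

Returning the label of the layer that matched also lets the CLI report how a result was found.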

Data Sources (6)

Source               Documents
MuckRock             46k+ FOIA requests
DocumentCloud        10M+ public documents
OpenCorporates       Global company records
SEC EDGAR            US corporate filings
OpenSanctions        Sanctions & PEP lists
ICIJ Offshore Leaks  Panama/Pandora Papers
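Cross-referencing one entity against several of these sources is naturally a concurrent fan-out. The sketch below uses asyncio.gather with return_exceptions=True so that a failing source is reported as an error rather than folded into "no results", in the same spirit as the DocumentCloud error handling described earlier. The source callables, their return type (a list of match strings), and the CrossRefReport type are hypothetical stand-ins for OpenFOIA's real connectors.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class CrossRefReport:
    hits: dict[str, list[str]] = field(default_factory=dict)  # source -> matches (may be empty)
    errors: dict[str, str] = field(default_factory=dict)      # source -> error message


async def cross_reference(
    entity: str,
    sources: dict[str, Callable[[str], Awaitable[list[str]]]],
) -> CrossRefReport:
    """Query every source concurrently and keep failures separate from empty results."""
    names = list(sources)
    # return_exceptions=True: gather waits for every source and returns the
    # exception object in place of a failed source's result, instead of
    # raising the first error.
    results = await asyncio.gather(
        *(sources[name](entity) for name in names), return_exceptions=True
    )
    report = CrossRefReport()
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            report.errors[name] = f"{type(result).__name__}: {result}"
        else:
            report.hits[name] = result
    return report
```

An empty list under hits means the source answered and found nothing; an entry under errors means it could not answer at all.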

Install

curl -fsSL https://raw.githubusercontent.com/JordanCoin/openfoia/main/install.sh | bash