Convert a folder of academic PDFs to plain text, then use Claude to verify every factual claim in a target paper against its cited references.
pip install -r requirements.txt
export ANTHROPIC_API_KEY=your_key_heremy_papers/
├── main_paper.pdf ← paper whose claims you want to verify
├── reference_a.pdf
├── reference_b.pdf
└── ...
Step 1 — Convert PDFs to text
python pdf_to_text.py my_papers/
# output goes to my_papers/plaintext/Step 2 — Fact-check
python factcheck.py my_papers/plaintext/main_paper.txt my_papers/plaintext/ report.mdClaude reads the main paper, locates each cited source in the reference folder, and streams a structured verdict for every claim. The report is saved to report.md.
| File | Purpose |
|---|---|
pdf_to_text.py |
Batch-convert a folder of PDFs to .txt |
factcheck.py |
Verify claims in a paper against its references using Claude |
fact_check.ipynb |
Full pipeline notebook: convert PDFs and fact-check with Claude |
Each claim gets a verdict block:
**[Author(s) (Year)] — claim summary**
- Verdict: ACCURATE / SLIGHTLY INACCURATE / INACCURATE / CANNOT VERIFY
- Details: what the source actually says (with quotes)
Followed by a Missing Papers section for any reference cited in the main paper but not found in the folder.
If you have Claude Pro or Max you can use Claude Code (the terminal CLI) instead of the API scripts. Claude Code is free with those subscriptions and will read CLAUDE.md automatically when you open the folder, so it already knows what the project does.
# 1. Convert your PDFs first
python pdf_to_text.py my_papers/
# 2. Open the project folder in Claude Code
cd my_papers
claudeThen just ask it:
"Fact-check main_paper.txt against all the other files in this folder"
Claude Code will read the papers and produce a report directly in the terminal. You can also ask follow-up questions, request a specific claim be re-examined, or ask it to save the report to a file.
- The reference corpus is prompt-cached — repeated runs against the same folder are significantly cheaper
- Adaptive thinking is enabled so Claude reasons carefully before giving verdicts
- If total input exceeds ~180K tokens, the script will warn you; split into batches if needed
- Works best on digitally-created PDFs; scanned image-only PDFs will produce poor text