# papertrail

Convert a folder of academic PDFs to plain text, then use Claude to verify every factual claim in a target paper against its cited references.

## Setup

```bash
pip install -r requirements.txt
export ANTHROPIC_API_KEY=your_key_here
```

## Workflow

```
my_papers/
├── main_paper.pdf       ← paper whose claims you want to verify
├── reference_a.pdf
├── reference_b.pdf
└── ...
```

### Step 1 — Convert PDFs to text

```bash
python pdf_to_text.py my_papers/
# output goes to my_papers/plaintext/
```
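The conversion logic lives in `pdf_to_text.py`; a minimal sketch of what that step might look like, assuming `pypdf` (a common choice for text extraction, though the repo's actual dependency may differ):

```python
from pathlib import Path


def txt_path_for(pdf_path: Path) -> Path:
    """Map my_papers/foo.pdf -> my_papers/plaintext/foo.txt."""
    return pdf_path.parent / "plaintext" / (pdf_path.stem + ".txt")


def convert_folder(src: Path) -> list[Path]:
    """Extract text from every PDF in src into src/plaintext/."""
    from pypdf import PdfReader  # third-party; assumed to be in requirements.txt

    (src / "plaintext").mkdir(exist_ok=True)
    written = []
    for pdf_path in sorted(src.glob("*.pdf")):
        reader = PdfReader(pdf_path)
        # extract_text() can return None for image-only pages, hence the fallback
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        out_path = txt_path_for(pdf_path)
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Scanned PDFs produce little or no text this way, which is why the Notes below recommend digitally created PDFs.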

### Step 2 — Fact-check

```bash
python factcheck.py my_papers/plaintext/main_paper.txt my_papers/plaintext/ report.md
```

Claude reads the main paper, locates each cited source in the reference folder, and streams a structured verdict for every claim. The report is saved to `report.md`.
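The core of this step is one large prompt (references first, target paper last) streamed through the Anthropic SDK. A simplified sketch, assuming the official `anthropic` package; the model name is illustrative, not necessarily what the script uses:

```python
from pathlib import Path


def build_prompt(main_text: str, references: dict[str, str]) -> str:
    """Assemble the fact-checking prompt: references first, target paper last."""
    parts = []
    for name, text in sorted(references.items()):
        parts.append(f"<reference name='{name}'>\n{text}\n</reference>")
    parts.append(f"<paper>\n{main_text}\n</paper>")
    parts.append(
        "Verify every factual claim in <paper> against the references. "
        "For each claim give a verdict: ACCURATE / SLIGHTLY INACCURATE / "
        "INACCURATE / CANNOT VERIFY, with supporting quotes from the source."
    )
    return "\n\n".join(parts)


def run(main_txt: Path, ref_dir: Path, report: Path) -> None:
    from anthropic import Anthropic  # official SDK; assumed in requirements.txt

    refs = {p.name: p.read_text() for p in ref_dir.glob("*.txt") if p != main_txt}
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    with client.messages.stream(
        model="claude-sonnet-4-5",  # illustrative model choice
        max_tokens=8000,
        messages=[{"role": "user", "content": build_prompt(main_txt.read_text(), refs)}],
    ) as stream:
        report.write_text("".join(stream.text_stream))
```

Putting the (unchanging) reference corpus at the front of the prompt is what makes the prompt caching mentioned in the Notes effective.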

## Scripts

| File | Purpose |
| --- | --- |
| `pdf_to_text.py` | Batch-convert a folder of PDFs to `.txt` |
| `factcheck.py` | Verify claims in a paper against its references using Claude |
| `fact_check.ipynb` | Full pipeline notebook: convert PDFs and fact-check with Claude |

## Output format

Each claim gets a verdict block:

```
**[Author(s) (Year)] — claim summary**
- Verdict: ACCURATE / SLIGHTLY INACCURATE / INACCURATE / CANNOT VERIFY
- Details: what the source actually says (with quotes)
```

The report ends with a **Missing Papers** section listing any reference cited in the main paper but not found in the folder.
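Because the verdict lines follow a fixed shape, the report is easy to post-process. A small sketch (my own helper, not part of the repo) that tallies verdicts from a saved report:

```python
import re

VERDICTS = ("ACCURATE", "SLIGHTLY INACCURATE", "INACCURATE", "CANNOT VERIFY")


def count_verdicts(report_text: str) -> dict[str, int]:
    """Count each verdict type in a report following the format above."""
    counts = {v: 0 for v in VERDICTS}
    for m in re.finditer(r"- Verdict:\s*([A-Z ]+)", report_text):
        verdict = m.group(1).strip()
        if verdict in counts:
            counts[verdict] += 1
    return counts
```

Useful for a quick summary line (e.g. "2 inaccurate out of 40 claims") without rereading the whole report.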

## Alternative: Claude Code (no API key required)

If you have a Claude Pro or Max subscription you can use Claude Code (the terminal CLI) instead of the API scripts. Claude Code is included with those subscriptions and reads `CLAUDE.md` automatically when you open the folder, so it already knows what the project does.
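For reference, a `CLAUDE.md` for a project like this might look something like the following (illustrative only, not the repo's actual file):

```markdown
# papertrail

Fact-check the claims in main_paper.txt against the other .txt files in
plaintext/, which are the cited references. For each claim, give a verdict
(ACCURATE / SLIGHTLY INACCURATE / INACCURATE / CANNOT VERIFY) with quotes
from the source, and list any cited paper missing from the folder.
```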

```bash
# 1. Convert your PDFs first
python pdf_to_text.py my_papers/

# 2. Open the project folder in Claude Code
cd my_papers
claude
```

Then just ask it:

> "Fact-check `main_paper.txt` against all the other files in this folder"

Claude Code will read the papers and produce a report directly in the terminal. You can also ask follow-up questions, request a specific claim be re-examined, or ask it to save the report to a file.


## Notes

- The reference corpus is prompt-cached, so repeated runs against the same folder are significantly cheaper
- Adaptive thinking is enabled so Claude reasons carefully before giving verdicts
- If total input exceeds ~180K tokens, the script will warn you; split into batches if needed
- Works best on digitally created PDFs; scanned image-only PDFs will produce poor text
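The ~180K-token warning can be approximated before spending any API calls. A rough sketch using the common chars-per-token rule of thumb (an estimate only; the script may count tokens differently):

```python
TOKEN_LIMIT = 180_000  # rough ceiling from the note above
CHARS_PER_TOKEN = 4    # rule-of-thumb estimate for English text (assumption)


def estimated_tokens(texts: list[str]) -> int:
    """Very rough token estimate for a list of document strings."""
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN


def warn_if_too_large(texts: list[str]) -> bool:
    """Return True (and print a warning) when the corpus likely exceeds the limit."""
    n = estimated_tokens(texts)
    if n > TOKEN_LIMIT:
        print(f"Warning: ~{n:,} tokens of input; consider splitting into batches.")
        return True
    return False
```

If the check trips, splitting the reference folder into batches and running the fact-check once per batch keeps each run under the limit.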
