Drop a .eml in, get a verdict out. Header forensics, URL analysis, suspicious-pattern heuristics, and an LLM verdict explainer, packaged as both a CLI and a FastAPI service.
Hand phish-forge a raw email (.eml) and it returns a structured verdict:
phish-forge analyze samples/phishing_sample.eml
phish-forge verdict: PHISHING (score 78/100)
Headers
! SPF fail (sender 199.x.x.x not in spf for paypal.com)
! DKIM none (no signature)
! DMARC fail (alignment failed)
! From display 'PayPaI Security' (capital I in place of lowercase l)
URLs (3 found)
! http://paypa1.com-secure-login.tk/verify typosquat + .tk TLD
! https://bit.ly/3xyZ123 link shortener hides destination
- https://paypal.com/legal legitimate
Body heuristics
! urgency_language "Your account will be locked in 24 hours"
! credential_request "click here to verify your password"
! generic_greeting "Dear Customer,"
Verdict: PHISHING — high confidence
Real phishing detection in a SOC pipeline combines:
- Header authentication — does SPF/DKIM/DMARC actually align?
- URL reputation — is the destination a known phishing domain, a typosquat, a shortener, or a homoglyph?
- Content heuristics — urgency language, credential-request templates, mismatched display name vs. envelope sender.
- Optional LLM analysis — for things heuristics miss (tone, social-engineering pattern).
phish-forge is a learning exercise in chaining all four. It makes useful prep for SOC-analyst and blue-team interviews because it walks through the same logic you'd implement as a correlation rule in Splunk, Elastic, or any other SIEM.
pip install phish-forge
# CLI
phish-forge analyze suspicious.eml
phish-forge analyze suspicious.eml --json | jq .
# HTTP service (drop emails via curl, integrate into a SOAR playbook)
phish-forge serve --port 8000
curl -F "file=@suspicious.eml" http://localhost:8000/analyze

                   ┌─────────────────────────┐
                   │     phish-forge CLI     │
                   │      / FastAPI app      │
                   └────────────┬────────────┘
                                ▼
                   ┌─────────────────────────┐
                   │       EML parser        │   (stdlib email module)
                   │  headers · body · urls  │
                   └────────────┬────────────┘
                                ▼
     ┌────────────────┬─────────┴────────┬─────────────────┐
     ▼                ▼                  ▼                 ▼
┌─────────┐   ┌───────────────┐   ┌─────────────┐   ┌─────────────┐
│ header  │   │  URL scorer   │   │ body checks │   │ LLM verdict │
│ auth    │   │  shortener /  │   │ urgency,    │   │ (optional)  │
│ (SPF/   │   │  typosquat /  │   │ cred req,   │   │             │
│ DKIM/   │   │  homoglyph /  │   │ generic     │   │             │
│ DMARC)  │   │  .tk/.ml/.gq  │   │ greeting    │   │             │
└────┬────┘   └───────┬───────┘   └──────┬──────┘   └──────┬──────┘
     └────────────────┴─────────┬────────┴─────────────────┘
                                ▼
                   ┌─────────────────────────┐
                   │  weighted score 0-100   │
                   │  verdict: phishing /    │
                   │  suspicious / benign    │
                   └─────────────────────────┘
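The EML-parser stage in the diagram can be approximated with nothing but the standard library's email package. This is a minimal sketch, not phish-forge's actual code: the function name parse_eml, the returned dict keys, and the deliberately rough URL regex are all illustrative assumptions.

```python
import re
from email import policy
from email.parser import BytesParser

# Hypothetical, deliberately rough URL matcher; a fuller parser would also
# walk <a href="..."> attributes in the HTML part.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def parse_eml(raw: bytes) -> dict:
    """Split a raw RFC 5322 message into headers, body text and URLs (sketch)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # Repeated header names collapse to the last value in this dict...
    headers = {name: str(value) for name, value in msg.items()}
    # ...so keep every Received: hop separately, newest first as in the message.
    received = [str(h) for h in msg.get_all("Received", [])]

    plain_parts, html_parts = [], []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_disposition() == "attachment":
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain":
            plain_parts.append(part.get_content())
        elif ctype == "text/html":
            html_parts.append(part.get_content())

    body_text = "\n".join(plain_parts)
    body_html = "\n".join(html_parts)
    # Deduplicate URLs while preserving first-seen order.
    urls = list(dict.fromkeys(URL_RE.findall(body_text + "\n" + body_html)))
    return {
        "headers": headers,
        "received": received,
        "body_text": body_text,
        "body_html": body_html,
        "urls": urls,
    }
```

Using policy.default (rather than the legacy compat32 policy) means get_content() returns decoded str bodies, which keeps the downstream heuristics free of charset handling.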
Each module is independently testable; the scorer is a simple weighted sum so you can tune it.
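A weighted-sum scorer of the kind described here could look like the sketch below. The signal names, weights, and the 70 / 40 thresholds are hypothetical placeholders chosen for illustration; phish-forge's real values are not documented in this README.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds; the project's real values may differ.
WEIGHTS = {
    "spf_fail": 15,
    "dkim_missing": 10,
    "dmarc_fail": 15,
    "display_name_spoof": 10,
    "typosquat_url": 20,
    "risky_tld": 10,
    "shortener": 5,
    "urgency_language": 5,
    "credential_request": 10,
    "generic_greeting": 5,
}
PHISHING_AT = 70
SUSPICIOUS_AT = 40


@dataclass
class Verdict:
    label: str
    score: int
    fired: list


def score_signals(signals: set) -> Verdict:
    """Sum the weights of known signals, cap at 100, then bucket into a label."""
    fired = sorted(s for s in signals if s in WEIGHTS)  # unknown names are ignored
    total = min(100, sum(WEIGHTS[s] for s in fired))
    if total >= PHISHING_AT:
        label = "phishing"
    elif total >= SUSPICIOUS_AT:
        label = "suspicious"
    else:
        label = "benign"
    return Verdict(label, total, fired)
```

Keeping the weights in one dict makes tuning a matter of editing numbers, and the two thresholds are the only knobs for the phishing / suspicious / benign cut-offs.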
URL heuristics (each is one signal; signals are combined into a per-URL score):
- TLD on the high-risk list (.tk, .ml, .gq, .cf, .zip, .mov)
- IP address in the URL
- Common-domain typosquat (Levenshtein distance ≤ 1 from a known brand)
- Homoglyph confusables (paypaI with capital I, аpple with Cyrillic 'а')
- Hostname is a link shortener (bit.ly, tinyurl, t.co, goo.gl, etc.)
- Hostname doesn't match the visible link text (anchor vs. href mismatch)
- Path keywords (verify, secure, account-update, login-reset)
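The URL checks above map naturally onto urllib.parse, ipaddress, and a small edit-distance function. This is a sketch under stated assumptions: the brand list, the signal names, and the rule "any non-ASCII letter in the hostname is a homoglyph hint" are illustrative simplifications, and punycode (xn--) hostnames are not decoded here.

```python
import ipaddress
import unicodedata
from urllib.parse import urlsplit

HIGH_RISK_TLDS = {"tk", "ml", "gq", "cf", "zip", "mov"}
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
PATH_KEYWORDS = ("verify", "secure", "account-update", "login-reset")
# Hypothetical brand list; a real deployment would load a much larger one.
KNOWN_BRANDS = {"paypal", "apple", "microsoft", "google"}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert / delete / substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def url_signals(url: str, anchor_text: str = "") -> list:
    """Return the names of the heuristics this URL trips (hypothetical helper)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    signals = []

    if host.rsplit(".", 1)[-1] in HIGH_RISK_TLDS:
        signals.append("risky_tld")
    try:
        ipaddress.ip_address(host)
        signals.append("ip_literal")
    except ValueError:
        pass
    if host in SHORTENERS:
        signals.append("shortener")

    # Distance 1 from a brand flags the label; distance 0 is the brand itself.
    for label in host.split("."):
        if any(0 < levenshtein(label, brand) <= 1 for brand in KNOWN_BRANDS):
            signals.append("typosquat")
            break

    # Cheap homoglyph hint: any non-ASCII letter in the hostname.
    if any(ord(ch) > 127 and unicodedata.category(ch).startswith("L") for ch in host):
        signals.append("homoglyph")

    if any(k in parts.path.lower() for k in PATH_KEYWORDS):
        signals.append("path_keyword")

    # Only compare anchor text that itself looks like a hostname or URL.
    text = anchor_text.strip()
    if text and " " not in text and "." in text:
        shown = urlsplit(text if "://" in text else "http://" + text)
        if shown.hostname and shown.hostname != host:
            signals.append("anchor_mismatch")
    return signals
```

Lower-casing the hostname is what lets a capital-I lookalike such as paypaI collapse to paypai, one edit away from paypal, so the typosquat rule catches it.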
Body heuristics:
- Urgency language (24 hours, immediately, account locked, final notice)
- Credential-request language (enter your password, verify your login)
- Generic greeting (Dear Customer, Dear Sir/Madam)
- Reply-To mismatch with From
- Inconsistency between the plain-text and HTML parts
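Most of the body checks reduce to case-insensitive regexes plus an address comparison. A minimal sketch follows; the phrase patterns are seeded from the examples in the list and are assumptions, the function name body_signals is hypothetical, and the plain-text vs. HTML comparison is left out.

```python
import re
from email.utils import parseaddr

# Hypothetical phrase patterns seeded from the examples in the list above.
PATTERNS = {
    "urgency_language": re.compile(
        r"\b(24 hours|immediately|account (?:will be )?locked|final notice)\b", re.I),
    "credential_request": re.compile(
        r"\b(enter your password|verify your (?:login|password|account))\b", re.I),
    "generic_greeting": re.compile(
        r"^\s*dear (?:customer|sir/madam|user)\b", re.I | re.M),
}


def body_signals(body_text: str, from_header: str, reply_to_header: str = "") -> list:
    """Return fired body signals with the matched text as evidence (sketch)."""
    hits = []
    for name, pattern in PATTERNS.items():
        m = pattern.search(body_text)
        if m:
            hits.append({"signal": name, "evidence": m.group(0)})

    # Reply-To pointing at a different domain than From.
    _, from_addr = parseaddr(from_header)
    _, reply_addr = parseaddr(reply_to_header)
    from_dom = from_addr.rpartition("@")[2].lower()
    reply_dom = reply_addr.rpartition("@")[2].lower()
    if reply_dom and reply_dom != from_dom:
        hits.append({"signal": "reply_to_mismatch", "evidence": reply_addr})
    return hits
```

Returning the matched text as evidence mirrors the sample report, where each fired body signal is printed next to the phrase that triggered it.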
Header heuristics:
- SPF / DKIM / DMARC Authentication-Results parsing
- From: display name vs. envelope Return-Path: mismatch
- Suspicious Received: chain (private IPs forging a public hop, country-of-origin flips)
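The header checks can be sketched against a plain dict of header values. This is a simplified illustration, not the project's implementation: the Authentication-Results parsing only reads the first spf=/dkim=/dmarc= result, the From vs. Return-Path test compares address domains, and "a private IP in any hop after the newest one" is a crude stand-in for real Received-chain analysis. A missing dkim result is treated the same as dkim=none.

```python
import ipaddress
import re
from email.utils import parseaddr

AUTH_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.I)
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_auth_results(value: str) -> dict:
    """Pull spf/dkim/dmarc results out of an Authentication-Results value."""
    results = {}
    for method, result in AUTH_RE.findall(value):
        results.setdefault(method.lower(), result.lower())  # first result wins
    return results


def header_signals(headers: dict, received: list) -> list:
    """Hypothetical helper: header-level signals, sorted and deduplicated."""
    signals = []
    auth = parse_auth_results(headers.get("Authentication-Results", ""))
    if auth.get("spf") in {"fail", "softfail"}:
        signals.append("spf_fail")
    if auth.get("dkim", "none") == "none":  # absent or dkim=none
        signals.append("dkim_missing")
    if auth.get("dmarc") == "fail":
        signals.append("dmarc_fail")

    _, from_addr = parseaddr(headers.get("From", ""))
    _, return_path = parseaddr(headers.get("Return-Path", ""))
    from_dom = from_addr.rpartition("@")[2].lower()
    rp_dom = return_path.rpartition("@")[2].lower()
    if from_dom and rp_dom and from_dom != rp_dom:
        signals.append("return_path_mismatch")

    # Received: hops are newest first; skip the hop our own MTA added and
    # flag bracketed private IPs in the older hops (a crude heuristic).
    for hop in received[1:]:
        for ip in IP_RE.findall(hop):
            try:
                if ipaddress.ip_address(ip).is_private:
                    signals.append("private_ip_in_received")
            except ValueError:
                continue  # e.g. 999.1.1.1 is not a valid address
    return sorted(set(signals))
```

Legitimate mail often passes through internal relays with private addresses, so in practice the Received-chain signal would carry a low weight or be scoped to hops claimed by the sending domain.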
Start the service with phish-forge serve, then:
POST /analyze
Content-Type: multipart/form-data; boundary=...
(eml file)
→ 200 OK
{
"verdict": "phishing",
"score": 78,
"headers": { ... },
"urls": [ ... ],
"body_signals": [ ... ]
}
Plug it into a SOAR runbook, or hit it from a Slack-bot intake form.
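For a runbook step that can't shell out to curl, the same upload can be built with the standard library alone. A minimal sketch, assuming the /analyze endpoint accepts a multipart field named file as in the curl example; the helper names build_analyze_request and analyze are hypothetical, and the file part uses application/octet-stream as a generic content type.

```python
import json
import urllib.request
import uuid


def build_analyze_request(eml_bytes: bytes, filename: str,
                          url: str = "http://localhost:8000/analyze") -> urllib.request.Request:
    """Build a multipart/form-data POST roughly equivalent to curl -F "file=@..."."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        eml_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def analyze(eml_bytes: bytes, filename: str) -> dict:
    """Send the upload and decode the JSON verdict (needs a running server)."""
    with urllib.request.urlopen(build_analyze_request(eml_bytes, filename)) as resp:
        return json.load(resp)
```

A runbook step could then branch on the returned "verdict" and "score" fields, for example quarantining on "phishing" and routing "suspicious" to an analyst queue.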
- Header / URL / body heuristics
- CLI + FastAPI
- JSON output for SOAR integration
- Pluggable LLM verdict explainer (Anthropic / OpenAI)
- Attachment sandboxing (PE / Office macro detection)
- WHOIS age check for URL hostnames
- Live reputation lookups (Google Safe Browsing, PhishTank, URLhaus)
- STIX 2.1 indicator export
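For the STIX 2.1 export item, a single malicious URL can be expressed as a STIX Indicator object with plain dicts. This is an illustrative sketch of one possible shape, not a description of how phish-forge will implement the feature; the helper name stix_url_indicator and the choice of the malicious-activity indicator type are assumptions.

```python
import uuid
from datetime import datetime, timezone


def stix_url_indicator(url: str, description: str = "") -> dict:
    """Build a minimal STIX 2.1 Indicator for a phishing URL (hypothetical helper)."""
    # STIX timestamps: UTC, RFC 3339, here with millisecond precision.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    # STIX patterning string literals escape backslash and single quote.
    value = url.replace("\\", "\\\\").replace("'", "\\'")
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "indicator_types": ["malicious-activity"],
        "pattern": f"[url:value = '{value}']",
        "pattern_type": "stix",
        "valid_from": now,
    }
    if description:
        indicator["description"] = description
    return indicator
```

The result serializes directly with json.dumps and can be wrapped in a STIX bundle alongside indicators for the other flagged URLs.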
Built by @forgehk — DarkForge AI