phish-forge

Drop a .eml in, get a verdict out. Header forensics, URL analysis, suspicious-pattern heuristics, and an LLM verdict explainer — packaged as both a CLI and a FastAPI service.

What it does

Hand phish-forge a raw email (.eml) and it returns a structured verdict:

phish-forge analyze samples/phishing_sample.eml

phish-forge verdict: PHISHING  (score 78/100)

  Headers
    ! SPF                fail        (sender 199.x.x.x not in spf for paypal.com)
    ! DKIM               none        (no signature)
    ! DMARC              fail        (alignment failed)
    ! From display       'PayPaI Security' (capital-I instead of lowercase-l)

  URLs (3 found)
    ! http://paypa1.com-secure-login.tk/verify    typosquat + .tk TLD
    ! https://bit.ly/3xyZ123                       link shortener hides destination
    - https://paypal.com/legal                     legitimate

  Body heuristics
    ! urgency_language        "Your account will be locked in 24 hours"
    ! credential_request      "click here to verify your password"
    ! generic_greeting        "Dear Customer,"

  Verdict: PHISHING — high confidence

Why this matters

Real phishing detection in a SOC pipeline combines:

Header authentication — does SPF/DKIM/DMARC actually align?
URL reputation — is the destination a known phishing domain, a typosquat, a shortener, or a homoglyph?
Content heuristics — urgency language, credential-request templates, mismatched display name vs. envelope sender.
Optional LLM analysis — for things heuristics miss (tone, social-engineering pattern).

phish-forge is a learning exercise in chaining all four. It is useful preparation for SOC analyst and blue-team interviews because it walks through the same logic you would implement in a Splunk, Elastic, or other SIEM correlation rule.

Install & run

pip install phish-forge

# CLI
phish-forge analyze suspicious.eml
phish-forge analyze suspicious.eml --json | jq .

# HTTP service (drop emails via curl, integrate into a SOAR playbook)
phish-forge serve --port 8000
curl -F "file=@suspicious.eml" http://localhost:8000/analyze

Architecture

                      ┌─────────────────────────┐
                      │     phish-forge CLI     │
                      │      / FastAPI app      │
                      └────────────┬────────────┘
                                   ▼
                      ┌─────────────────────────┐
                      │      EML parser         │ (stdlib email module)
                      │  headers · body · urls  │
                      └────────────┬────────────┘
                                   ▼
   ┌─────────────────┬─────────────┴────────────┬──────────────────┐
   ▼                 ▼                          ▼                  ▼
┌────────┐    ┌───────────────┐         ┌─────────────┐    ┌──────────────┐
│ header │    │   URL scorer  │         │ body checks │    │  LLM verdict │
│ auth   │    │ shortener /   │         │  urgency,   │    │  (optional)  │
│ (SPF/  │    │ typosquat /   │         │  cred req,  │    │              │
│ DKIM/  │    │ homoglyph /   │         │  generic    │    │              │
│ DMARC) │    │ .tk/.ml/.gq   │         │  greeting   │    │              │
└────┬───┘    └───────┬───────┘         └──────┬──────┘    └──────┬───────┘
     └────────────────┴───────────────┬────────┴──────────────────┘
                                      ▼
                         ┌──────────────────────────┐
                         │   weighted score 0-100   │
                         │   verdict: phishing /    │
                         │   suspicious / benign    │
                         └──────────────────────────┘
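The parser stage in the diagram can be approximated with the standard library alone. The sketch below is not phish-forge's actual code: `parse_eml`, `URL_RE`, and the returned dict layout are hypothetical names chosen for illustration, and the URL regex is deliberately naive.

```python
import re
from email import message_from_bytes, policy

# Naive URL matcher (an assumption for this sketch, not the project's regex).
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")


def parse_eml(raw: bytes) -> dict:
    """Split a raw .eml into headers, body text, and the URLs found in the body."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Keep every occurrence of repeated headers such as Received:.
    headers: dict[str, list[str]] = {}
    for name, value in msg.items():
        headers.setdefault(name.lower(), []).append(str(value))

    # Prefer the text/plain part and fall back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""

    return {"headers": headers, "body": body, "urls": URL_RE.findall(body)}
```

Each downstream module can then take only the slice it needs (headers, body, or URLs), which is what keeps the modules testable in isolation.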

Each module is independently testable; the scorer is a simple weighted sum so you can tune it.
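As a rough illustration of what a tunable weighted sum could look like, here is a minimal sketch. The signal names, the weights, and the 40/70 verdict thresholds are all assumptions made up for this example; the values phish-forge actually ships are not documented here.

```python
from typing import Iterable

# Hypothetical weights: points added when a signal fires.
WEIGHTS = {
    "spf_fail": 20,
    "dkim_none": 10,
    "dmarc_fail": 20,
    "url_typosquat": 25,
    "url_shortener": 10,
    "urgency_language": 10,
    "credential_request": 15,
    "generic_greeting": 5,
}


def total_score(fired: Iterable[str]) -> int:
    """Sum the weights of the distinct signals that fired, capped at 100."""
    return min(100, sum(WEIGHTS.get(name, 0) for name in set(fired)))


def verdict(score: int) -> str:
    """Map a 0-100 score to a label (thresholds are assumptions)."""
    if score >= 70:
        return "phishing"
    if score >= 40:
        return "suspicious"
    return "benign"
```

Tuning then means editing one dict: raising a weight makes that signal count for more, and unknown signal names contribute nothing.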

Heuristics shipped today

URL heuristics (each is one signal — combined into a score per URL):

TLD on the high-risk list (.tk, .ml, .gq, .cf, .zip, .mov)
IP address in the URL
Common-domain typosquat (Levenshtein ≤ 1 from a known brand)
Homoglyph confusables (paypaI with capital-I, аpple with Cyrillic 'а')
Hostname is a link shortener (bit.ly, tinyurl, t.co, goo.gl, etc.)
Hostname doesn't match the visible link text (anchor vs href mismatch)
Path keywords (verify, secure, account-update, login-reset)
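Several of the URL signals above can be sketched with the standard library. This is an illustration under stated assumptions, not the project's implementation: `url_signals`, the signal names, and the `BRANDS` list are hypothetical, the shortener set is a small subset, and the homoglyph and anchor-vs-href checks are left out. The typosquat test uses a distance of exactly 1 so that the genuine brand domain does not flag itself.

```python
import ipaddress
from urllib.parse import urlsplit

HIGH_RISK_TLDS = {"tk", "ml", "gq", "cf", "zip", "mov"}
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
BRANDS = {"paypal", "apple", "google", "microsoft"}  # illustrative only
PATH_KEYWORDS = ("verify", "secure", "account-update", "login-reset")


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def url_signals(url: str) -> list[str]:
    """Return the names of the heuristics that fire for one URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    signals = []

    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    if is_ip:
        signals.append("ip_address_host")
    else:
        labels = host.split(".")
        if labels[-1] in HIGH_RISK_TLDS:
            signals.append("high_risk_tld")
        # Compare every label except the TLD against the brand list.
        if any(levenshtein(label, brand) == 1
               for label in labels[:-1] for brand in BRANDS):
            signals.append("typosquat")

    if host in SHORTENERS:
        signals.append("link_shortener")
    if any(keyword in parts.path.lower() for keyword in PATH_KEYWORDS):
        signals.append("path_keyword")
    return signals
```

Run against the three URLs from the sample output, this sketch flags the `.tk` link as a high-risk TLD, a typosquat, and a keyword path, flags the `bit.ly` link as a shortener, and returns no signals for `https://paypal.com/legal`.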

Body heuristics:

Urgency language (24 hours, immediately, account locked, final notice)
Credential-request language (enter your password, verify your login)
Generic greeting (Dear Customer, Dear Sir/Madam)
Reply-to mismatch with From
Plain-text vs. HTML inconsistency (the text/plain and text/html parts of the same message carry different content)
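The phrase-based body checks lend themselves to a small table of regular expressions. The sketch below assumes one compiled pattern per signal; `BODY_PATTERNS` and `body_signals` are hypothetical names, and the phrase lists only cover the examples given above.

```python
import re

# One case-insensitive pattern per signal, built from the phrases listed above.
BODY_PATTERNS = {
    "urgency_language": re.compile(
        r"\b(24 hours|immediately|account (?:will be )?locked|final notice)\b", re.I
    ),
    "credential_request": re.compile(
        r"\b(enter your password|verify your (?:login|password))\b", re.I
    ),
    "generic_greeting": re.compile(r"\bdear (customer|sir/madam)\b", re.I),
}


def body_signals(text: str) -> list[tuple[str, str]]:
    """Return (signal name, matched phrase) pairs for every pattern that hits."""
    hits = []
    for name, pattern in BODY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            hits.append((name, match.group(0)))
    return hits
```

Returning the matched phrase alongside the signal name is what allows a report to quote the offending text back to the analyst.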

Header heuristics:

SPF / DKIM / DMARC Authentication-Results parsing
From: display-name vs envelope Return-Path: mismatch
Suspicious Received: chain (private IPs forging public hop, country-of-origin flips)
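Two of the header checks can be illustrated with the standard library `email` package. This is a simplified sketch, not the project's parser: `auth_results` and `from_return_path_mismatch` are hypothetical names, the regex ignores most of the Authentication-Results grammar, and a production check should only trust Authentication-Results headers added by your own mail server.

```python
import re
from email import message_from_string, policy
from email.message import Message
from email.utils import parseaddr

# Matches "spf=fail", "dkim=none", "dmarc=pass" and similar result tokens.
AUTH_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.I)


def auth_results(msg: Message) -> dict[str, str]:
    """Collect SPF/DKIM/DMARC results; a mechanism with no entry stays 'none'."""
    results = {"spf": "none", "dkim": "none", "dmarc": "none"}
    for header in msg.get_all("Authentication-Results", []):
        for mechanism, result in AUTH_RE.findall(str(header)):
            results[mechanism.lower()] = result.lower()
    return results


def _domain(header_value: str) -> str:
    """Extract the lowercased domain from an address header value."""
    return parseaddr(header_value)[1].rpartition("@")[2].lower()


def from_return_path_mismatch(msg: Message) -> bool:
    """True when From: and Return-Path: both carry a domain and they differ."""
    from_domain = _domain(str(msg.get("From", "")))
    return_domain = _domain(str(msg.get("Return-Path", "")))
    return bool(from_domain and return_domain and from_domain != return_domain)


def load(raw: str) -> Message:
    """Parse raw message text with the modern email policy."""
    return message_from_string(raw, policy=policy.default)
```

Comparing exact domains is the simplest possible rule; it does not treat subdomains of the same organisation as aligned, so legitimate bulk senders may trip it.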

API mode

phish-forge serve

Then:

POST /analyze
Content-Type: multipart/form-data; boundary=...

(eml file)

→ 200 OK
{
  "verdict": "phishing",
  "score": 78,
  "headers": { ... },
  "urls": [ ... ],
  "body_signals": [ ... ]
}

Plug it into a SOAR runbook or just hit it from a Slack-bot intake form.
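On the consuming side, a playbook step mostly needs to turn the JSON response into an action. A minimal sketch, assuming only the `verdict` and `score` fields shown in the response above; the `triage` function and the action names are hypothetical and would map to whatever your SOAR platform calls them.

```python
import json


def triage(response_body: str) -> dict:
    """Turn an /analyze JSON response into a playbook decision."""
    result = json.loads(response_body)
    verdict = str(result.get("verdict", "")).lower()

    # Hypothetical action names; unknown verdicts fall through to a human.
    actions = {
        "phishing": "quarantine",
        "suspicious": "escalate_to_analyst",
        "benign": "release",
    }
    return {
        "action": actions.get(verdict, "escalate_to_analyst"),
        "verdict": verdict,
        "score": int(result.get("score", 0)),
    }
```

Sending unrecognised verdicts to an analyst rather than releasing them keeps the playbook fail-safe if the response format ever changes.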

Roadmap

Header / URL / body heuristics
CLI + FastAPI
JSON output for SOAR integration
Pluggable LLM verdict explainer (Anthropic / OpenAI)
Attachment sandboxing (PE / Office macro detection)
WHOIS age check for URL hostnames
Live reputation lookups (Google Safe Browsing, PhishTank, URLhaus)
STIX 2.1 indicator export

License

MIT

Built by @forgehk — DarkForge AI
