layer8-ai/autonomous-pentest-agent-research
Autonomous Bug Bounty Agent (with Scope-Enforcing Proxy + PoC Validator)

Status: Experimental / internal prototype
Focus: Authorized, in-scope security testing (VDP / Bug Bounty, black-box)

We’re three security researchers based in Tokyo building an autonomous agent framework that can map an application, plan targeted security hypotheses, and produce a human-reviewable report — while enforcing strict safety constraints so it can’t wander out of scope.

This README describes the architecture and guardrails. There’s no public repo yet; we’re sharing the design and learnings for feedback.


TL;DR

  • We built a multi-agent system that performs recon → hypothesis planning → class-specific testing → validation → report drafting.
  • All network access is forced through a scope-enforcing proxy (allowlist + rate/concurrency caps + logging).
  • Real-World Validation (Feb 8, 2026): running against ~5 targets/week since late 2025.
    • U.S. Dept of Defense (DoD): 3 vulnerabilities triaged.
    • HackerOne ranking: reached #86 globally on the VDP (90 Days) leaderboard.
    • Bug bounty programs: 2 duplicates, 1 under review.
  • Benchmarks: Solved 84% of PortSwigger Web Security Academy labs autonomously.

HackerOne Triage Verification

HackerOne VDP Ranking


What this is / isn’t

✅ This is

  • An autonomous testing engine designed for authorized program scopes, with mandatory human approval before submission.
  • A system optimized for precision, not conversation. It automatically produces and validates findings, requiring human oversight only as a final approver before reporting.

❌ This is not

  • A fully autonomous “submit-to-bounty” bot (we never auto-submit).
  • A general internet crawler or exploitation toolkit.
  • A replacement for a structured, coverage-driven pentest methodology (yet).

The Architecture

We designed a multi-agent orchestration workflow that mimics human Red Team methodology. To ensure safety and prevent spam, the final submission decision is always made by a human.

  • Input: A Target URL & Credentials (for grey-box testing).
  • Output: A Drafted Report for Human Review.

Everything in between is autonomous.

flowchart TD
    %% --- Styles ---
    classDef human fill:#ffab91,stroke:#333,stroke-width:2px,color:black;
    classDef brain fill:#d4e157,stroke:#333,stroke-width:2px,color:black;
    classDef worker fill:#80deea,stroke:#333,stroke-width:1px,color:black;
    classDef external fill:#f9f,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:black;
    classDef tools fill:#e0e0e0,stroke:#333,stroke-width:1px,stroke-dasharray: 2 2,color:black;
    classDef security fill:#ffcc80,stroke:#d35400,stroke-width:2px,color:black;

    %% --- Nodes ---

    subgraph s1 ["Start"]
        User("👤 User Input<br>(Scope Definition)"):::human
    end

    subgraph Engine ["Autonomous AI Engine"]
        direction TB

        Recon("Initial Recon Agent"):::worker
        Coord("Coordinator"):::brain

        %% Workers & Tools
        subgraph Exec ["Execution Layer"]
            direction LR
            IDOR("IDOR"):::worker
            SQLi("SQLi"):::worker
            XSS("XSS"):::worker

            Tools[("🛠️ Tooling Sandbox<br>Python / Browser / CLI")]:::tools
        end

        Validator("Validator"):::brain
    end

    %% ★ Safety Layer Added
    subgraph Guardrails ["Safety Layer"]
        Proxy{"Safety Proxy<br>(Strict Allowlist)"}:::security
    end

    Target("🌐 Target Infrastructure"):::external

    subgraph External ["External World"]

        Human("Analyst"):::human
        H1("HackerOne"):::external
    end

    %% --- Connections ---
    User -->|Define Scope| Recon
    Recon --> Coord
    Coord --> IDOR & SQLi & XSS

    %% Access via Proxy
    Recon -.-> Tools
    IDOR & SQLi & XSS -.-> Tools

    %% ★ Traffic Flow through Proxy
    Tools ===> Proxy
    Proxy == "✅ Allowed (In-Scope)" ==> Target
    Proxy -. "🚫 Blocked (Out-of-Scope)" .-> Proxy

    IDOR --> Validator
    SQLi -.-> Validator
    XSS -.-> Validator

    Validator <--> Tools
    Validator --> Human
    Human --> H1

Architecture (overview)

1) Initial Recon Agent

The recon agent interacts with the target (through the proxy) to:

  • enumerate reachable pages/endpoints within the provided scope
  • infer high-level technology hints (framework patterns, API conventions)
  • build an “attack surface map” (routes, parameters, auth boundaries)

Output: a structured map of candidate flows/endpoints for deeper analysis.
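The shape of that map can be sketched roughly as follows; the field names and sample endpoints are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    """One entry in the recon agent's attack surface map (hypothetical schema)."""
    method: str
    path: str
    params: list[str] = field(default_factory=list)
    requires_auth: bool = False
    tech_hints: list[str] = field(default_factory=list)  # framework/API conventions

# Example output after a recon pass (endpoints are made up for illustration)
surface_map = [
    Endpoint("GET", "/api/v1/orders/{id}", params=["id"], requires_auth=True,
             tech_hints=["REST", "numeric ids"]),
    Endpoint("POST", "/login", params=["username", "password"]),
]
```

Downstream agents consume this structure rather than raw HTML, which keeps their prompts and action spaces narrow.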

2) Coordinator

The coordinator:

  • selects promising hypotheses from the recon map (e.g., authorization seams, state transitions)
  • delegates work to specialized agents
  • manages budget, rate limits, retries, and stop conditions

This avoids random fuzzing in favor of targeted exploration.
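A minimal sketch of that prioritization logic is below; the scoring fields, budget numbers, and per-class cap are illustrative assumptions (the real coordinator also handles retries and stop conditions):

```python
def plan(hypotheses, budget_requests=500, max_per_class=3):
    """Pick the highest-scoring hypotheses that fit a request budget,
    capping how many hypotheses of each vulnerability class we pursue."""
    picked, spent, per_class = [], 0, {}
    for h in sorted(hypotheses, key=lambda h: h["score"], reverse=True):
        if spent + h["est_requests"] > budget_requests:
            continue  # too expensive; a cheaper hypothesis may still fit
        if per_class.get(h["class"], 0) >= max_per_class:
            continue  # avoid over-investing in one class
        picked.append(h)
        spent += h["est_requests"]
        per_class[h["class"]] = per_class.get(h["class"], 0) + 1
    return picked
```

Budgeting at planning time (rather than only at the proxy) is what turns exploration into targeted work instead of open-ended fuzzing.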

3) Specialized testing agents

We use smaller agents that focus on a single class (e.g., IDOR, SQLi, XSS). The intent is to:

  • reduce hallucinations and overgeneralization
  • encode class-specific “what evidence matters” heuristics
  • keep prompts and action spaces narrow

Example: the IDOR agent focuses on authorization invariants and ownership boundaries rather than generic injection payloads.
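The core of that invariant can be sketched as a status-code comparison; `fetch` here is a hypothetical callable that issues a request with a given session and returns the HTTP status (through the safety proxy in the real system):

```python
def check_ownership_invariant(fetch, resource_url, owner_session, other_session):
    """Authorization invariant: if the owner can read a resource, an
    unrelated authenticated user must be denied. Both succeeding is an
    IDOR candidate worth escalating to the validator."""
    owner_status = fetch(resource_url, owner_session)
    other_status = fetch(resource_url, other_session)
    if owner_status == 200 and other_status == 200:
        return "candidate-idor"
    if owner_status == 200 and other_status in (401, 403, 404):
        return "invariant-holds"
    return "inconclusive"  # e.g. the owner's own request failed
```

Note the check never needs an injection payload at all: the evidence that matters for this class is a broken ownership boundary, not a crash.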

4) Validator + Report Drafting

The validator:

  • replays key requests under controlled conditions
  • performs negative checks (expected-deny cases)
  • collects artifacts (request/response samples, timestamps, environment notes)
  • only then emits a notification and drafts a report for human review

Humans make the final call and submission decision.
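The replay-plus-negative-check step can be sketched like this; `replay` is a hypothetical callable that re-issues a recorded request and returns its HTTP status:

```python
def validate_finding(replay, positive_case, expected_deny_case):
    """Confirm a finding only if the suspicious request reproduces AND a
    negative control (a request that should be denied) is actually denied.
    Returns (confirmed, artifacts) for the human-reviewable report."""
    artifacts = []
    pos = replay(positive_case)
    artifacts.append(("positive-replay", positive_case, pos))
    neg = replay(expected_deny_case)
    artifacts.append(("expected-deny", expected_deny_case, neg))
    confirmed = (pos == 200) and (neg in (401, 403, 404))
    return confirmed, artifacts
```

The expected-deny control is what distinguishes "the endpoint is broken" from "the endpoint is public by design", which is a major source of false positives.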


Execution environment

Each agent operates in an isolated sandbox with:

  • Python runtime (for quick parsing, diffing, state handling)
  • Headless browser (for DOM rendering and JS-driven flows)
  • Kali Linux shell (standard recon utilities, HTTP tooling, parsers)

Important: all network traffic is routed through the scope-enforcing proxy.
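One simple way to route sandbox tooling through the proxy is environment-level proxy variables, which most HTTP clients honor; the local address below is illustrative, not our deployment detail:

```python
import os

# Hypothetical local listener for the scope-enforcing proxy.
PROXY = "http://127.0.0.1:8080"

# Set both upper- and lower-case variants, since tools differ in which they read.
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
    os.environ[var] = PROXY
```

In practice this is a belt-and-suspenders measure on top of network-level enforcement, since a misbehaving tool could ignore the variables.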


Safety model and guardrails

Safety is a hard constraint, not a feature. This system is intended only for authorized testing.

Scope-Enforcing Proxy

All outbound traffic must pass a policy gate. In practice this includes:

Allowlist controls

  • FQDN allowlist
  • method allowlist (e.g., GET/POST; optionally block PUT/DELETE by default)
  • optional header constraints (prevent arbitrary outbound tokens/headers)

Throttling

  • max RPS
  • concurrency caps

Auditing

  • full allow/deny logging (what was attempted, what was blocked, why)
  • reproducible request traces for human review

Default-deny posture

  • if the proxy can’t confidently classify a request as in-scope, it blocks it

“Safe PoC” policy

The validator is designed to avoid destructive behavior. The engine:

  • prioritizes read-only verification patterns
  • avoids payloads that could cause damage or persistence
  • stops on signs of instability (excess errors, unexpected side effects, account risk)

We do not provide exploit payloads or step-by-step instructions for real-world compromise in this README.


Experimental Results (As of Feb 8, 2026)

Since late 2025, we have been running the agent against roughly five targets per week.

  • VDP Success:
    • Achieved 86th place globally on the HackerOne VDP (90 Days) leaderboard.
    • Successfully identified and reported 3 triaged vulnerabilities in the U.S. Department of Defense (DoD) program.
  • BBP Challenges:
    • Submitted 2 reports to paid Bug Bounty Programs, but both were closed as Duplicate (other researchers found them first).
    • One additional report is currently under review.
  • Key Learning - "The Impact Gap":
    • While the agent successfully executes exploits (technical validation), several findings were closed as Informative due to low business impact.
    • Insight: The agent is good at finding "technical correctness" gaps but currently lacks the context to assess "business criticality" or to chain low-severity bugs into high-impact attack chains.

Performance

Characteristics of a representative run:

  • Wall time: ~2 hours
  • Model/API cost: low single-digit USD (varies by latency, rate limiting, retries)
  • Human time: review/verification + final report editing

We’re optimizing for:

  • fewer false positives (high precision)
  • strong evidence trails (validator artifacts)
  • strict adherence to scope rules (proxy gate)

Limitations / current challenges

This is an experimental system and has clear constraints:

  • SPA-heavy targets: performance degrades on modern single-page apps, where meaningful state lives primarily in client-side JS and requires deeper browser-driven exploration. We’re improving the “state model” and event coverage, but haven’t closed the gap yet.
  • Context growth / rare loops: in some runs, accumulated context (observations, hypotheses, artifacts) can grow too large and the system may fall into inefficient behavior or rare infinite loops. We mitigate this with budgets, stop conditions, and periodic summarization, but it’s not eliminated.
  • Coverage / reproducibility variance: as a vulnerability testing approach, it’s difficult to guarantee full coverage. Even on the same target, outcomes can vary — sometimes a class of issue is discovered, sometimes it isn’t — depending on exploration paths, timing, and defenses.

Ethics

  • Authorized testing only. Use strictly within explicit VDP / bug bounty scopes and rules.
  • Human-in-the-loop. No automatic submissions.
  • Scope enforcement. Hard gate via proxy + default-deny rules.
  • No harmful payload sharing. We avoid publishing exploit details.

If you’re building similar systems, we’d love to compare notes specifically on guardrails and verification strategies.


Contact / Disclosure

We’re open to collaboration and feedback from others building similar systems.
