Status: Experimental / internal prototype
Focus: Authorized, in-scope security testing (VDP / Bug Bounty, black-box)
We’re three security researchers based in Tokyo building an autonomous agent framework that can map an application, plan targeted security hypotheses, and produce a human-reviewable report — while enforcing strict safety constraints so it can’t wander out of scope.
This README describes the architecture and guardrails. There’s no public repo yet; we’re sharing the design and learnings for feedback.
- We built a multi-agent system that performs recon → hypothesis planning → class-specific testing → validation → report drafting.
- All network access is forced through a scope-enforcing proxy (allowlist + rate/concurrency caps + logging).
- Real-World Validation (Feb 8, 2026): running against ~5 targets/week since late 2025.
  - U.S. Department of Defense (DoD): 3 vulnerabilities triaged.
  - HackerOne ranking: reached #86 globally on the VDP (90 Days) leaderboard.
  - Bug Bounty Programs: 2 duplicates, 1 under review.
- Benchmarks: solved 84% of PortSwigger Web Security Academy labs autonomously.
- An autonomous testing engine designed for authorized program scopes, with mandatory human approval before submission.
- A system optimized for precision, not conversation. It automatically produces and validates findings, requiring human oversight only as a final approver before reporting.
- A fully autonomous “submit-to-bounty” bot (we never auto-submit).
- A general internet crawler or exploitation toolkit.
- A replacement for a structured, coverage-driven pentest methodology (yet).
We designed a multi-agent orchestration workflow that mimics human Red Team methodology. To ensure safety and prevent spam, the final submission decision is always made by a human.
- Input: A Target URL & Credentials (for grey-box testing).
- Output: A Drafted Report for Human Review.
Everything in between is autonomous.
```mermaid
flowchart TD
    %% --- Styles ---
    classDef human fill:#ffab91,stroke:#333,stroke-width:2px,color:black;
    classDef brain fill:#d4e157,stroke:#333,stroke-width:2px,color:black;
    classDef worker fill:#80deea,stroke:#333,stroke-width:1px,color:black;
    classDef external fill:#f9f,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:black;
    classDef tools fill:#e0e0e0,stroke:#333,stroke-width:1px,stroke-dasharray: 2 2,color:black;
    classDef security fill:#ffcc80,stroke:#d35400,stroke-width:2px,color:black;

    %% --- Nodes ---
    subgraph s1 ["Start"]
        User("👤 User Input<br>(Scope Definition)"):::human
    end

    subgraph Engine ["Autonomous AI Engine"]
        direction TB
        Recon("Initial Recon Agent"):::worker
        Coord("Coordinator"):::brain

        %% Workers & Tools
        subgraph Exec ["Execution Layer"]
            direction LR
            IDOR("IDOR"):::worker
            SQLi("SQLi"):::worker
            XSS("XSS"):::worker
            Tools[("🛠️ Tooling Sandbox<br>Python / Browser / CLI")]:::tools
        end

        Validator("Validator"):::brain
    end

    %% ★ Safety Layer Added
    subgraph Guardrails ["Safety Layer"]
        Proxy{"Safety Proxy<br>(Strict Allowlist)"}:::security
    end

    Target("🌐 Target Infrastructure"):::external

    subgraph External ["External World"]
        Human("Analyst"):::human
        H1("HackerOne"):::external
    end

    %% --- Connections ---
    User -->|Define Scope| Recon
    Recon --> Coord
    Coord --> IDOR & SQLi & XSS

    %% Access via Proxy
    Recon -.-> Tools
    IDOR & SQLi & XSS -.-> Tools

    %% ★ Traffic Flow through Proxy
    Tools ===> Proxy
    Proxy == "✅ Allowed (In-Scope)" ==> Target
    Proxy -. "🚫 Blocked (Out-of-Scope)" .-> Proxy

    IDOR --> Validator
    SQLi -.-> Validator
    XSS -.-> Validator
    Validator <--> Tools
    Validator --> Human
    Human --> H1
```
The recon agent interacts with the target (through the proxy) to:
- enumerate reachable pages/endpoints within the provided scope
- infer high-level technology hints (framework patterns, API conventions)
- build an “attack surface map” (routes, parameters, auth boundaries)
Output: a structured map of candidate flows/endpoints for deeper analysis.
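A structure like the following sketches what that map could look like. This is illustrative only: the field names (`Endpoint`, `AttackSurfaceMap`, `tech_hints`) are assumptions for the example, not the actual schema.

```python
# Hypothetical shape of the recon agent's output. Field names are
# illustrative assumptions, not the real schema.
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    method: str                  # e.g. "GET"
    path: str                    # e.g. "/api/v1/orders/{id}"
    params: list = field(default_factory=list)
    requires_auth: bool = False
    tech_hints: list = field(default_factory=list)  # e.g. ["rails-style routes"]

@dataclass
class AttackSurfaceMap:
    target: str
    endpoints: list = field(default_factory=list)

    def candidates(self, hint: str) -> list:
        """Endpoints worth deeper analysis: matching tech hint or an ID-shaped path."""
        return [e for e in self.endpoints
                if hint in e.tech_hints or "{id}" in e.path]

surface = AttackSurfaceMap(target="https://app.example.test")
surface.endpoints.append(
    Endpoint("GET", "/api/v1/orders/{id}", ["id"], requires_auth=True))
print(len(surface.candidates("rails-style routes")))  # → 1 (ID-shaped path matches)
```

Keeping the map as typed records (rather than free text) makes it easier for downstream agents to filter for their vulnerability class.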
The coordinator:
- selects promising hypotheses from the recon map (e.g., authorization seams, state transitions)
- delegates work to specialized agents
- manages budget, rate limits, retries, and stop conditions
This avoids random fuzzing in favor of targeted exploration.
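The coordinator's control loop can be sketched roughly as below. The budget numbers, hypothesis shape, and agent interface are assumptions for illustration, not the production code.

```python
# Illustrative coordinator loop: prioritize hypotheses, delegate to
# class-specific agents, and stop when the shared budget is spent.
# Budget values and the agent/hypothesis interfaces are assumptions.
import time

def run_coordinator(hypotheses, agents, max_requests=500, wall_clock_s=7200):
    spent, start = 0, time.monotonic()
    findings = []
    # Work highest-priority hypotheses first (targeted exploration, not fuzzing).
    for hyp in sorted(hypotheses, key=lambda h: h["priority"], reverse=True):
        if spent >= max_requests or time.monotonic() - start > wall_clock_s:
            break                            # stop condition: budget exhausted
        agent = agents.get(hyp["class"])
        if agent is None:
            continue                         # no specialist for this class
        result = agent(hyp)                  # delegate to the class-specific agent
        spent += result.get("requests", 0)   # charge the shared request budget
        if result.get("finding"):
            findings.append(result["finding"])
    return findings
```

The key design point is that every agent invocation is charged against one shared budget, so a single noisy hypothesis cannot starve the rest of the run.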
We use smaller agents that focus on a single class (e.g., IDOR, SQLi, XSS). The intent is to:
- reduce hallucinations and overgeneralization
- encode class-specific “what evidence matters” heuristics
- keep prompts and action spaces narrow
Example: the IDOR agent focuses on authorization invariants and ownership boundaries rather than generic injection payloads.
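A minimal sketch of that kind of ownership check, assuming two authorized test accounts (user A owns the resource, user B does not). `fetch` is a hypothetical stand-in for a proxied HTTP GET; the response shape is an assumption.

```python
# Hypothetical IDOR ownership-boundary check. `fetch` stands in for a
# proxied, read-only HTTP GET returning {"status": ..., "body": ...}.
def check_ownership_boundary(fetch, resource_url, session_a, session_b):
    """Return a candidate finding if user B can read user A's resource."""
    owner = fetch(resource_url, session=session_a)   # owner: expected allow
    other = fetch(resource_url, session=session_b)   # non-owner: expected deny
    if (owner["status"] == 200 and other["status"] == 200
            and other["body"] == owner["body"]):
        # Authorization invariant violated: non-owner saw the owner's data.
        return {"type": "IDOR-candidate", "url": resource_url}
    return None  # expected-deny held; no finding
```

Note the evidence heuristic: a 200 alone is not enough, the non-owner must actually receive the owner's data for the invariant to count as violated.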
The validator:
- replays key requests under controlled conditions
- performs negative checks (expected-deny cases)
- collects artifacts (request/response samples, timestamps, environment notes)
- only then emits a notification and drafts a report for human review
Humans make the final call and submission decision.
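The validation step can be sketched as a single evidence record that bundles the replay, the negative check, and timestamps. The field names and `replay` callable are illustrative assumptions, not the real artifact format.

```python
# Sketch of the validator's evidence record: replay the finding under
# controlled conditions, run the expected-deny control, keep both as
# artifacts. Field names are illustrative assumptions.
import datetime

def validate(finding, replay, expected_deny_case):
    """Replay the positive case and the negative (expected-deny) case."""
    positive = replay(finding)             # should reproduce the issue
    negative = replay(expected_deny_case)  # control: should be denied
    record = {
        "finding": finding,
        "reproduced": positive["status"] == 200,
        "expected_deny_held": negative["status"] in (401, 403),
        "artifacts": [positive, negative],
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # Only valid if the issue reproduces AND the control case is still denied;
    # otherwise the "finding" may just be a broken or open endpoint.
    record["valid"] = record["reproduced"] and record["expected_deny_held"]
    return record
```

Only records with `valid == True` would trigger a notification and a drafted report; everything else stays in the audit trail.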
Each agent operates in an isolated sandbox with:
- Python runtime (for quick parsing, diffing, state handling)
- Headless browser (for DOM rendering and JS-driven flows)
- Kali Linux shell (standard recon utilities, HTTP tooling, parsers)
Important: all network traffic is routed through the scope-enforcing proxy.
Safety is a hard constraint, not a feature. This system is intended only for authorized testing.
All outbound traffic must pass a policy gate. In practice this includes:
Allowlist controls
- FQDN allowlist
- method allowlist (e.g., GET/POST; optionally block PUT/DELETE by default)
- optional header constraints (prevent arbitrary outbound tokens/headers)
Throttling
- max RPS
- concurrency caps
Auditing
- full allow/deny logging (what was attempted, what was blocked, why)
- reproducible request traces for human review
Default-deny posture
- if the proxy can’t confidently classify a request as in-scope, it blocks it
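The gate's decision logic can be sketched as below. Hostnames, methods, and the log shape are illustrative assumptions; the point is the default-deny posture: anything not positively classified as in-scope is blocked and logged.

```python
# Minimal sketch of the proxy's policy gate. Allowlist contents and the
# audit-log shape are illustrative assumptions. Default-deny: a request
# is allowed only if host AND method are explicitly in scope.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"app.example.test"}   # FQDN allowlist (per-engagement config)
ALLOWED_METHODS = {"GET", "POST"}      # PUT/DELETE blocked by default
AUDIT_LOG = []                         # full allow/deny trail for human review

def policy_gate(method, url):
    host = urlparse(url).hostname
    allowed = host in ALLOWED_HOSTS and method.upper() in ALLOWED_METHODS
    AUDIT_LOG.append({
        "method": method,
        "url": url,
        "decision": "allow" if allowed else "deny",
        "reason": "in-scope" if allowed else "host or method not allowlisted",
    })
    return allowed

policy_gate("GET", "https://app.example.test/api")    # → True
policy_gate("DELETE", "https://app.example.test/x")   # → False (method blocked)
policy_gate("GET", "https://other.example.test/")     # → False (host not allowlisted)
```

Because the check is a conjunction over explicit allowlists, any request the gate cannot positively classify (unknown host, unlisted method, unparseable URL) falls through to deny.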
The engine is designed to avoid destructive behavior. It:
- prioritizes read-only verification patterns
- avoids payloads that could cause damage or persistence
- stops on signs of instability (excess errors, unexpected side effects, account risk)
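The "stop on instability" behavior can be sketched as a sliding-window error-rate check. The window size and threshold here are assumptions for illustration.

```python
# Illustrative stop-on-instability check: halt testing if the recent
# error rate crosses a threshold. Window size and threshold are
# assumptions, not the production values.
from collections import deque

class InstabilityMonitor:
    """Track recent responses; signal a stop if 5xx rate gets too high."""
    def __init__(self, window=50, max_error_rate=0.2):
        self.window = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def record(self, status_code):
        self.window.append(status_code >= 500)  # True = server-side error

    def should_stop(self):
        if len(self.window) < self.window.maxlen:
            return False  # not enough data yet to judge
        return sum(self.window) / len(self.window) > self.max_error_rate
```

A monitor like this sits naturally in the coordinator's stop conditions, so a target showing distress halts the run rather than absorbing more traffic.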
We do not provide exploit payloads or step-by-step instructions for real-world compromise in this README.
Since late 2025, we have been running the agent against 5 targets per week.
- VDP Success:
- Achieved 86th place globally on the HackerOne VDP (90 Days) leaderboard.
- Successfully identified and reported 3 triaged vulnerabilities in the U.S. Department of Defense (DoD) program.
- BBP Challenges:
- Submitted 2 reports to paid Bug Bounty Programs, but both were closed as Duplicate (other researchers found them first).
- One additional report is currently under review.
- Key Learning - "The Impact Gap":
- While the agent successfully executes exploits (technical validation), several findings were closed as Informative due to low business impact.
- Insight: The agent is good at finding "technical correctness" gaps, but currently lacks the context to assess "business criticality" or to chain low-severity bugs into high-impact attack paths.
Typical (representative) run characteristics:
- Wall time: ~2 hours
- Model/API cost: low single-digit USD (varies by latency, rate limiting, retries)
- Human time: review/verification + final report editing
We’re optimizing for:
- fewer false positives (high precision)
- strong evidence trails (validator artifacts)
- strict adherence to scope rules (proxy gate)
This is an experimental system and has clear constraints:
- SPA-heavy targets: performance degrades on modern single-page apps where meaningful state lives primarily in client-side JS and requires deeper browser-driven exploration. We're improving the "state model" and event coverage, but haven't reached full performance here yet.
- Context growth / rare loops: in some runs, accumulated context (observations, hypotheses, artifacts) can grow too large and the system may fall into inefficient behavior or rare infinite loops. We mitigate this with budgets, stop conditions, and periodic summarization, but it’s not eliminated.
- Coverage / reproducibility variance: as a vulnerability testing approach, it’s difficult to guarantee full coverage. Even on the same target, outcomes can vary — sometimes a class of issue is discovered, sometimes it isn’t — depending on exploration paths, timing, and defenses.
- Authorized testing only. Use strictly within explicit VDP / bug bounty scopes and rules.
- Human-in-the-loop. No automatic submissions.
- Scope enforcement. Hard gate via proxy + default-deny rules.
- No harmful payload sharing. We avoid publishing exploit details.
If you’re building similar systems, we’d love to compare notes, specifically on guardrails and verification strategies. We’re open to collaboration and feedback.
- Email: info@layer8.jp

