I had the chance to join the closed beta of Gandalf: Agent Breaker. This was a full-scale red-team challenge featuring 10 real-world LLM applications with five difficulty levels each (50 challenges total, 5000 points possible), designed to simulate how models behave when deployed in practice.
What stood out:

- Unpredictable models: the same prompt sometimes produced different results. Variability became a tool rather than mere noise; running identical prompts repeatedly could flip an outcome from failure to success.
- Layered defenses: moderation often operated at multiple stages, such as persona-level filtering plus a final output scan.
- Fragile precision: structured attacks (e.g., JSON forcing, system-instruction rewriting) were powerful but easily broken by a single misplaced word.
- Exact repetition requirements: some challenges required getting the AI to repeat text exactly, down to the number of repetitions.
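The variability observation above is essentially a retry loop. A minimal sketch, with a hypothetical `attempt_bypass` standing in for one nondeterministic LLM call (the 20% per-attempt success rate is an illustrative assumption, not a measured figure from the beta):

```python
import random

def attempt_bypass(prompt: str, rng: random.Random, success_rate: float) -> bool:
    # Stand-in for a single nondeterministic LLM call; a real run would hit the app.
    return rng.random() < success_rate

def retry_identical_prompt(prompt: str, max_tries: int = 20,
                           success_rate: float = 0.2, seed: int = 0) -> int:
    # Resend the exact same prompt; return the try number that succeeded, or -1.
    rng = random.Random(seed)
    for i in range(1, max_tries + 1):
        if attempt_bypass(prompt, rng, success_rate):
            return i
    return -1
```

With a 20% per-attempt success rate, 20 identical retries succeed roughly 99% of the time (1 - 0.8^20), which is why rerunning a "failed" prompt was often worth more than rewriting it.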
My approach evolved into something resembling reconnaissance: probing boundaries, testing small variations, and mapping weak points. I treated each challenge like an environment to survey—observing how subtle adjustments revealed the system’s contours.
This reminded me of my very first AI lesson years ago, when I was asked to write a Monte Carlo simulation in Python. Back then, it felt abstract; in Gandalf, it finally clicked: AI behavior often comes down to probability and repeated trials.
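That connection can be made concrete with a tiny Monte Carlo simulation: estimate the chance that at least one of n identical attempts succeeds, and check it against the closed form 1 - (1 - p)^n. The probabilities here are illustrative, not measurements from the beta.

```python
import random

def chance_of_success(n_tries: int, p: float,
                      trials: int = 100_000, seed: int = 42) -> float:
    # Monte Carlo estimate of P(at least one success in n_tries attempts),
    # each attempt succeeding independently with probability p.
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if any(rng.random() < p for _ in range(n_tries))
    )
    return hits / trials

# Closed form for comparison: 1 - (1 - p) ** n_tries
print(chance_of_success(10, 0.2))  # close to 1 - 0.8**10, i.e. about 0.89
```

Even a weak per-attempt probability compounds quickly across retries, which is exactly the dynamic the challenges rewarded.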
Key takeaways:

- Persistence over cleverness: small adjustments often succeed where first attempts fail.
- Inconsistency is opportunity: model variability can be leveraged strategically.
- Multiple defense layers: bypassing one doesn't guarantee success against the rest.
- Precision is fragile: structured inputs are powerful but break easily.
- Systematic > random: methodical boundary probing mirrors real security research.
Example of level completion (100/100 score)
This beta was more than a game—it felt like real security research. It showed me how crowdsourced red teaming can generate insights that strengthen AI safety, and it validated the skills I’ve been building in cybersecurity and AI evaluation.
👉 Try Gandalf Agent Breaker yourself: gandalf.lakera.ai/agent-breaker
All content is shared for educational and research purposes only.
These are my personal observations from the beta, not universal strategies or solutions.
No proprietary systems, exploits, or confidential data are included.
Original work by 0xSweet
© 2025 0xSweet. All rights reserved.