My Experience Testing Gandalf: Agent Breaker

I had the chance to join the closed beta of Gandalf: Agent Breaker. This was a full-scale red-team challenge featuring 10 real-world LLM applications with five difficulty levels each (50 challenges total, 5000 points possible), designed to simulate how models behave when deployed in practice.

What Stood Out

Unpredictable models: The same prompt sometimes produced different results. Variability became not just noise but a tool—running identical prompts repeatedly could flip an outcome from failure to success.
Layered defenses: Moderation often worked at multiple stages, like persona-level filtering plus a final scan.
Fragile precision: Structured attempts (e.g., JSON forcing, system instruction rewriting) were powerful but easily broken by a single misplaced word.
Exact repetition requirements: Some challenges demanded getting the AI to repeat text exactly, down to the number of repetitions.

My Strategy

My approach evolved into something resembling reconnaissance: probing boundaries, testing small variations, and mapping weak points. I treated each challenge like an environment to survey—observing how subtle adjustments revealed the system’s contours.

This reminded me of my very first AI lesson years ago, when I was asked to write a Monte Carlo simulation in Python. Back then, it felt abstract. In Gandalf, it suddenly clicked—AI behavior often comes down to probability and repeated trials.

Key Takeaways

Persistence over cleverness: small adjustments often succeed where first attempts fail
Inconsistency is opportunity: model variability can be strategically leveraged
Multiple defense layers: bypassing one doesn’t guarantee success with others
Precision is fragile: structured inputs are powerful but break easily
Systematic > random: methodical boundary probing mirrors real security research

Example of level completion (100/100 score)

Closing Thoughts

This beta was more than a game—it felt like real security research. It showed me how crowdsourced red teaming can generate insights that strengthen AI safety, and it validated the skills I’ve been building in cybersecurity and AI evaluation.

👉 Try Gandalf Agent Breaker yourself: gandalf.lakera.ai/agent-breaker

Disclaimer

All content is shared for educational and research purposes only.
These are my personal observations from the beta, not universal strategies or solutions.
No proprietary systems, exploits, or confidential data are included.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
README.md		README.md
screenshot.png		screenshot.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

My Experience Testing Gandalf: Agent Breaker

What Stood Out

My Strategy

Key Takeaways

Closing Thoughts

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

My Experience Testing Gandalf: Agent Breaker

What Stood Out

My Strategy

Key Takeaways

Closing Thoughts

Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages