[FEAT] Adaptive multi-turn XPIA with backtracking (Crescendo strategy) #92

Description

@KutalVolkan

Summary

Add a crescendo-style multi-turn strategy for XPIA: escalate across turns, and when a trigger turn is refused or blocked, backtrack - drop that dead-end and re-approach from a clean state - instead of stopping on the first refusal. This fills the reserved "crescendo" strategy slot alongside the existing XPIA execution.

Motivation

Indirect prompt injection against a hardened agent rarely succeeds on the first trigger - the agent refuses, asks a clarifying question, or partially complies. A realistic adversary abandons the failed line and retries from a clean state. That recover-and-retry move is the defining behaviour of Crescendo, and it's what separates a one-shot probe from a genuine multi-turn attack.

Today XPIAExecution escalates forward and early-stops on detection, with no way to recover from a refusal and try another angle - so it under-represents adversaries that recover and retry rather than hard-fail. The "crescendo" strategy name is already reserved in the codebase and named on the architecture diagram, but unfilled.

Proposed solution

A new execution strategy alongside XPIA that adds a backtracking branch to the loop: on a blocked turn, prune the adversary's last move and re-prompt with that failure as feedback, within a backtrack budget; on success, resolve as an attack exactly as XPIA does.
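To make the loop above concrete, here is a minimal sketch of the backtracking branch. Every name in it (`run_with_backtracking`, `LoopResult`, the four callables, `max_backtracks`) is hypothetical and chosen for illustration; none of it is the project's actual API, and the real strategy would plug into the existing execution lifecycle rather than take bare callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    succeeded: bool
    turns: List[str] = field(default_factory=list)  # adversary moves kept in history
    backtracks: int = 0


def run_with_backtracking(
    next_prompt: Callable[[List[str], Optional[str]], str],  # hypothetical adversary driver
    send: Callable[[str], str],                               # hypothetical agent adapter
    is_success: Callable[[str], bool],                        # success condition
    is_blocked: Callable[[str], bool],                        # separate backtrack trigger
    max_turns: int = 10,
    max_backtracks: int = 3,
) -> LoopResult:
    """Escalate turn by turn; on a blocked turn, prune it and retry with feedback."""
    result = LoopResult(succeeded=False)
    feedback: Optional[str] = None  # the failed attempt, if the last turn was pruned

    # Note: in this sketch a pruned attempt still consumes one of max_turns.
    for _ in range(max_turns):
        prompt = next_prompt(result.turns, feedback)
        reply = send(prompt)

        if is_success(reply):
            result.turns.append(prompt)
            result.succeeded = True
            return result

        if is_blocked(reply):
            if result.backtracks >= max_backtracks:
                return result  # backtrack budget exhausted
            result.backtracks += 1
            feedback = prompt  # feed the failure into the next prompt
            continue           # prune: the blocked move never enters the history

        # Neither success nor refusal: keep the move and escalate further.
        result.turns.append(prompt)
        feedback = None

    return result
```

Keeping `is_blocked` and `is_success` as two separate callables mirrors the intent that the backtrack trigger and the success condition stay independent.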

Backtracking is capability-aware, in two fidelity levels:

  • Adversary-side rewind - always available; rewinds the attacker's own state and feeds the failed attempt back into the next prompt.
  • Agent-side rewind - opt-in, for adapters that can roll the agent's own conversation back (true Crescendo fidelity), degrading gracefully to the first where they can't.
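The two fidelity levels could be selected by a capability check, roughly as sketched below. This assumes adapters expose an optional `rewind` method; the `SupportsRewind` protocol, both adapter classes, and the `backtrack` helper are hypothetical names for illustration, not existing interfaces in the codebase.

```python
from typing import List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class SupportsRewind(Protocol):
    """Hypothetical capability: the adapter can roll the agent's conversation back."""

    def rewind(self, turns: int) -> None: ...


class AppendOnlyAdapter:
    """Stand-in adapter that cannot undo what the agent has already seen."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def send(self, prompt: str) -> str:
        self.history.append(prompt)
        return "..."


class RewindableAdapter(AppendOnlyAdapter):
    """Stand-in adapter that can drop the last N turns from the agent's history."""

    def rewind(self, turns: int) -> None:
        if turns > 0:
            del self.history[-turns:]


def backtrack(adapter: object, adversary_turns: List[str],
              agent_side: bool = False) -> Tuple[str, str]:
    """Prune the last adversary move; return (fidelity level used, failed prompt)."""
    failed = adversary_turns.pop()  # adversary-side rewind is always performed
    if agent_side and isinstance(adapter, SupportsRewind):
        adapter.rewind(1)           # opt-in: also roll the agent back one turn
        return "agent-side", failed
    # Opt-out, or the adapter lacks the capability: degrade to adversary-side only.
    return "adversary-side", failed
```

The returned failed prompt is what the adversary would receive as feedback for its next attempt, regardless of which level was used.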

The backtrack trigger is its own evaluator (e.g. a refusal detector), kept separate from the success condition so the two stay independent and composable. The whole thing is additive - it reuses the existing execution lifecycle, drivers, evaluators, and result handling, and changes no existing behaviour. I'm open to shaping the exact surface to your conventions.

Additional context

Before I open a PR, I wanted to check whether Crescendo is already being worked on internally. If so, no worries; I’ll stand down to avoid duplicating effort. If the slot is open, I’d be happy to open the PoC PR, likely aiming for the weekend of June 27-28.

Update: Closing as superseded by #100, which reframes this around indirect payload re-injection instead of the direct-chat crescendo framing here.

Metadata

Labels: enhancement