Skip to content

[FEAT] Adaptive multi-turn XPIA via iterative payload re-injection #100

Description

@KutalVolkan

Summary

XPIAExecution plants one malicious payload, triggers the agent once, and stops at the first refusal. I'd like to add an adaptive variant: if the agent refuses, re-plant a reworded payload into the same surface, trigger again, and repeat until it works or a budget runs out. The attacker only ever edits the planted data and sends the agent the same benign prompt - it never argues with the agent directly - so the attack stays indirect.

Motivation

Real indirect red-teaming is iterative. You rarely land the payload on the first try - you reword or re-encode it, re-trigger, see what got through, and try again. RAMPART can't express that loop today: single-shot XPIA treats the first refusal as the end.

That direct-vs-indirect line is the whole point, and it's why this isn't just "port Crescendo." In Crescendo the attacker is the user: it types escalating messages straight at the target and prunes the failed ones from the shared chat (that's how PyRIT drives it). That can't live in an indirect framework without making the trigger itself adversarial - which RAMPART explicitly forbids (attacks/__init__.py: "XPIA triggers are never adversarial - the attack is in the injected payload, not the prompt"). So instead of porting a direct technique, this gives the same try-again behaviour to the indirect channel, where the adaptation happens in the planted payload rather than the conversation.

Proposed solution

A new execution beside XPIAExecution that puts the loop around the injection instead of the chat. It reuses what's already there - Surface.inject() to re-plant, the benign trigger driver, the evaluators, and resolve_as_attack for the verdict - and reuses the existing injection-handle lifecycle for teardown, cycling one handle per round. Additive, no protocol changes. Cost per round depends on the surface (cheap where the payload is just overwritten, slower where re-indexing is involved). I'll bring the design and a working PoC in the PR rather than spec it out here.

Additional context

Before I open a PR: is an adaptive or iterative XPIA execution already on your roadmap? If so I'll stand down; if not, I'll put up a PoC shortly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions