[BREAKING] FEAT: Restructure Psychosocial scenario for per-subharm scoring by varunj-msft · Pull Request #1943 · microsoft/PyRIT

varunj-msft · 2026-06-04T22:45:14Z

Description

Restructures the Psychosocial scenario so strategies are techniques and subharms are datasets. Started as the --max-dataset-size fast-path bugfix and grew into the full standardization Rich sketched in review.

- Strategies are techniques now. The strategy enum is prompt_sending, role_play, crescendo instead of the old subharm-as-strategy anti-pattern. Subharm selection moves off --strategies and onto --dataset-names.
- Seed file split per subharm. psychosocial.prompt becomes airt_imminent_crisis.prompt + airt_licensed_therapist.prompt, each carrying its own scorer rubric and crescendo escalation prompt. This removes runtime harm-filtering entirely: there's no longer a cap-vs-filter ordering to get wrong, which is what made --max-dataset-size 1 fail before.
- Per-subharm scorers. One FloatScaleThresholdScorer is built per subharm and routed to both the AtomicAttack and the technique's AttackScoringConfig, so running --strategies all no longer scores attacks with the wrong rubric.
- Per-subharm baselines. Baselines are named baseline_imminent_crisis / baseline_licensed_therapist instead of two atomic attacks both named "baseline", which collided in _display_group_map / attack_results (keyed on name alone). include_baseline=False is forced through to the base class so its auto-injection rescue doesn't re-add a generic baseline.
- initialize_async validates dataset names. A user-supplied dataset_config is allowed only if its dataset names are a subset of the subharm dataset names, so --max-dataset-size N still works while custom names are rejected with a clear error.
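
The dataset-name check described above amounts to a plain subset test. The following is a minimal sketch under stated assumptions, not the PR's code: `SUBHARM_DATASET_NAMES` and `validate_dataset_names` are hypothetical names, and only the two dataset names are taken from the description.

```python
# Hypothetical sketch of the subset validation; names other than the two
# dataset names are illustrative and do not come from PyRIT.
SUBHARM_DATASET_NAMES = frozenset({"airt_imminent_crisis", "airt_licensed_therapist"})


def validate_dataset_names(dataset_names: list[str]) -> None:
    """Raise ValueError if any requested dataset is not a known subharm dataset."""
    unknown = sorted(set(dataset_names) - SUBHARM_DATASET_NAMES)
    if unknown:
        raise ValueError(
            f"Unsupported dataset names {unknown}; "
            f"expected a subset of {sorted(SUBHARM_DATASET_NAMES)}"
        )


# A valid subset passes silently, so a capped single-subharm run is accepted.
validate_dataset_names(["airt_imminent_crisis"])
```

Because the check only looks at names, a size cap on an allowed dataset never interacts with it.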

Crescendo is kept out of the default aggregate (opt-in via --strategies all / --strategies crescendo) since it's the heaviest technique — default runs are single-turn. BASELINE_ATTACK_POLICY is back to Enabled, and VERSION is bumped 2→3 because the default behavior changed; stored v2 results raise cleanly on --resume rather than silently mixing semantics.
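
As a rough illustration of why the version bump matters on resume, the guard can be pictured as an equality check between the stored and current scenario versions. This is a sketch only: `check_resumable` and `SCENARIO_VERSION` are hypothetical names, and the actual exception type raised by the scanner is not stated in this thread.

```python
# Hypothetical sketch: refuse to resume results stored under another version.
SCENARIO_VERSION = 3  # value from the description; the constant name is assumed


def check_resumable(stored_version: int, current_version: int = SCENARIO_VERSION) -> None:
    """Raise ValueError when stored results come from a different scenario version."""
    if stored_version != current_version:
        raise ValueError(
            f"Stored results use scenario version {stored_version}, "
            f"but the current version is {current_version}; start a new run."
        )


check_resumable(3)  # same version: resuming is allowed
```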

One deliberate divergence from the review sketch, flagged inline: TARGET_REQUIREMENTS is left at the base default rather than EDITABLE_HISTORY. With Crescendo opt-in, the default run is single-turn, and requiring editable history at the scenario level would reject OpenAIChatTarget before the strategy resolves. Crescendo enforces that requirement itself when it runs. Easy to flip back if preferred.

Tests and Documentation

Reworked tests/unit/scenario/airt/test_psychosocial.py around the new shape — 10 test classes:

TestPsychosocialStrategyEnum / TestPsychosocialTechniques — enum members, default tags, Crescendo excluded from default
TestSubharmConfigs — both subharms wired with the right dataset, scorer prompt, and crescendo path
TestPsychosocialInitialization — VERSION == 3, baseline policy, no scenario-level target requirement
TestPsychosocialDatasetConfigValidation — max_dataset_size=1 on a valid subharm doesn't raise; custom / unknown names do
TestPsychosocialCrossProduct — atomic attacks are the (technique × subharm) product with correct names
TestPsychosocialPerSubharmScorer — each subharm's attacks (and baseline) get that subharm's scorer
TestPsychosocialBaselineHandling — per-subharm baseline names, both present in the display map, include_baseline=False suppresses them
TestPsychosocialLazyAdversarialResolution / TestPsychosocialSingleSubharmOverride — lazy adversarial target, single-subharm narrowing
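
The (technique × subharm) shape exercised by these tests can be sketched with `itertools.product`. Assumptions are labeled: the `"{technique}_{subharm}"` naming for technique attacks is hypothetical, as is the choice of prompt_sending and role_play as the default techniques; only the `baseline_<subharm>` names come from the description.

```python
from itertools import product

# Assumed default techniques (crescendo is opt-in per the description).
TECHNIQUES = ["prompt_sending", "role_play"]
SUBHARMS = ["imminent_crisis", "licensed_therapist"]


def atomic_attack_names(
    techniques: list[str], subharms: list[str], include_baseline: bool = True
) -> list[str]:
    """Return one name per (technique, subharm) pair plus per-subharm baselines."""
    # Hypothetical naming scheme for technique attacks.
    names = [f"{technique}_{subharm}" for technique, subharm in product(techniques, subharms)]
    if include_baseline:
        # baseline_<subharm> keeps baseline names unique across subharms.
        names += [f"baseline_{subharm}" for subharm in subharms]
    return names


names = atomic_attack_names(TECHNIQUES, SUBHARMS)
```

Unique names matter because, per the description, the display map and results are keyed on the name alone.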

Validation:

pytest tests/unit/scenario/airt/test_psychosocial.py → 53 passed
pytest tests/unit/scenario/ (full scenario suite) → 725 passed, no regressions
pytest tests/unit/backend/test_scenario_run_service.py → 35 passed
ruff check + ruff format --check + ty → all clean
Live pyrit_scan fast path (--strategies prompt_sending --dataset-names airt_imminent_crisis --max-dataset-size 1) and a default run both complete successfully

Docs updated: doc/scanner/airt.py (+notebook) for the fast-path command and the --dataset-names model, and the programming guide (0_scenarios.ipynb). The built-in dataset list (1_loading_datasets.ipynb) now includes airt_licensed_therapist.

rlundeen2 · 2026-06-04T23:05:37Z

+        # Load the unsampled seed pool so the harm-category filter sees every seed
+        # the dataset config would otherwise sample over. Temporarily zero the cap
+        # and restore it in a finally so a raising loader leaves the config intact.
+        sampling_cap = self._dataset_config.max_dataset_size


I'd avoid creating a global cap here; there are also some corner-case bugs with this implementation.

Instead, I'd subclass DatasetConfiguration to something like this

class PsychosocialDatasetConfiguration(DatasetConfiguration):
    def get_seed_groups(self) -> dict[str, list[SeedGroup]]:
        loaded = self._load_unsampled()  # per-dataset, no cap yet
        filtered = self._filter_by_harm(loaded)  # uses self._scenario_strategies
        return {k: self._apply_max_dataset_size(v) for k, v in filtered.items()}

Went a different route that drops the subclass entirely: split the seed file into one dataset per subharm (airt_imminent_crisis/airt_licensed_therapist). With the harm filtering gone there's nothing left to cap manually, so plain DatasetConfiguration works. initialize_async just checks the names are a subset of the subharms, so --max-dataset-size N still works and those corner cases disappear. What do you think of this?
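
The ordering problem behind this thread can be reproduced in a few lines. This is a toy sketch, not PyRIT code: seeds are plain dicts, the cap is modeled as a prefix slice (the real sampler may behave differently), and both helper names are hypothetical.

```python
# Toy seed pool mixing two subharms in one file, as in the old layout.
SEEDS = [
    {"harm": "licensed_therapist", "prompt": "a"},
    {"harm": "imminent_crisis", "prompt": "b"},
    {"harm": "imminent_crisis", "prompt": "c"},
]


def cap_then_filter(seeds: list[dict], harm: str, cap: int) -> list[dict]:
    """Buggy order: the cap can discard every seed of the requested harm."""
    return [seed for seed in seeds[:cap] if seed["harm"] == harm]


def filter_then_cap(seeds: list[dict], harm: str, cap: int) -> list[dict]:
    """Safe order: filter first, then cap what is left."""
    return [seed for seed in seeds if seed["harm"] == harm][:cap]


empty = cap_then_filter(SEEDS, "imminent_crisis", 1)     # the capped seed is the wrong harm
non_empty = filter_then_cap(SEEDS, "imminent_crisis", 1)  # one matching seed survives
```

With one dataset per subharm there is no filter step at all, so only the cap remains and the ordering question disappears.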

…oring

Replace the subharm-as-strategy anti-pattern with a technique-axis strategy enum (prompt_sending, role_play, crescendo). Subharm selection now happens via --dataset-names; the seed file is split into per-subharm datasets (airt_imminent_crisis, airt_licensed_therapist), each with its own scorer rubric and crescendo escalation prompt.

Key changes:
- Per-subharm FloatScaleThresholdScorer routed to both the AtomicAttack objective_scorer and the technique AttackScoringConfig, so baseline and technique attacks are scored with the matching rubric.
- Per-subharm baselines named baseline_<subharm> to avoid the _display_group_map / attack_results key collision from duplicate baseline names; include_baseline=False is forced through to the base class to suppress the auto-injection rescue.
- initialize_async validates dataset_config against the subharm dataset names (subset only) so --max-dataset-size still works while custom names are rejected.
- Crescendo kept out of the default aggregate (opt-in via --strategies all / crescendo) as it is the heaviest technique.
- BASELINE_ATTACK_POLICY re-enabled; VERSION bumped 2 -> 3 (BREAKING) so stored results from the prior default behavior cannot silently resume.
- TARGET_REQUIREMENTS left at base default (no EDITABLE_HISTORY) since the default run is single-turn; crescendo enforces its own requirements at attack instantiation.
- Docs/datasets updated: fast-path command, programming guide, and the built-in dataset list now reflect the per-subharm split.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

varunj-msft · 2026-06-17T15:52:50Z

Pushed the restructure. Only open question is TARGET_REQUIREMENTS (in the refactor thread). Also retitling since this isn't a bugfix anymore

rlundeen2 · 2026-06-22T19:58:43Z

Maybe we should hold off until the DatasetConfiguration refactor is in.

rlundeen2 · 2026-06-22T20:14:34Z

+                if not seed_groups_for_subharm:
+                    continue
+                baseline_scorer = scorers_by_dataset[cfg.dataset_name]
+                baseline_attack_technique = PromptSendingAttack(


Rich agrees with this comment, but it is Copilot-generated:

Let's fix this in the base class rather than hand-rolling. The hand-rolled PromptSendingAttack + AtomicAttack here is effectively a fork of _build_baseline_atomic_attack, which hard-codes atomic_attack_name="baseline" and self._objective_scorer; that's the only reason it can't be reused for per-subharm baselines.

Proposal: generalize the base helper to _build_baseline_atomic_attacks(...) (plural) that accepts the seed groups plus an optional objective_scorer, name, and display_group, and returns list[AtomicAttack]. Single-baseline scenarios call it with one spec; this scenario passes one per subharm. That deletes the bespoke construction here and keeps baseline wiring (labels, attribution, future converters) in one place.
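
One way to picture the proposed plural helper is a function that maps a list of baseline specs to a list of attacks. Everything below is a hypothetical sketch: `BaselineSpec`, `AtomicAttackStub`, and `build_baseline_atomic_attacks` are stand-ins, not PyRIT classes, and defaulting `display_group` to the name is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BaselineSpec:
    """Hypothetical per-baseline input: seed groups plus optional overrides."""
    seed_groups: list[Any]
    objective_scorer: Optional[Any] = None
    name: str = "baseline"
    display_group: Optional[str] = None


@dataclass
class AtomicAttackStub:
    """Stand-in for AtomicAttack; only the fields the sketch needs."""
    name: str
    seed_groups: list[Any]
    objective_scorer: Any
    display_group: str


def build_baseline_atomic_attacks(
    specs: list[BaselineSpec], default_scorer: Any
) -> list[AtomicAttackStub]:
    """Build one baseline per spec, skipping specs with no seed groups."""
    attacks = []
    for spec in specs:
        if not spec.seed_groups:
            continue  # nothing to send for this subharm
        scorer = spec.objective_scorer if spec.objective_scorer is not None else default_scorer
        attacks.append(
            AtomicAttackStub(
                name=spec.name,
                seed_groups=spec.seed_groups,
                objective_scorer=scorer,
                display_group=spec.display_group or spec.name,  # assumed default
            )
        )
    return attacks
```

A single-baseline scenario would pass one default spec, while this scenario would pass one spec per subharm with its own scorer and a `baseline_<subharm>` name.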

I think this also dissolves the initialize_async workaround, since it's the same root cause. The reason you have to resolve include_baseline locally and force include_baseline=False into super is that the base rescue at scenario.py:687 detects "did the override already emit a baseline?" via the literal atomic_attack_name != "baseline", and ours are baseline_<subharm>. If we add an is_baseline: bool to AtomicAttack (set by the base helper) and switch that rescue to any(aa.is_baseline for aa in self._atomic_attacks), the per-subharm baselines are recognized, the rescue stops firing, and the whole _effective_include_baseline interception + forced False can go away. (Side note: the comments here reference scenario.py:670, but the rescue is at 687 now; that drift is exactly the smell we'd remove.)
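
The difference between the name-based rescue and the proposed flag-based one can be shown with a stand-in class. This is a sketch of the idea only: `AtomicAttackStub` and both helper functions are hypothetical, and the real rescue logic in scenario.py may contain more conditions than shown here.

```python
from dataclasses import dataclass


@dataclass
class AtomicAttackStub:
    """Stand-in for AtomicAttack with the proposed is_baseline flag."""
    atomic_attack_name: str
    is_baseline: bool = False


def rescue_fires_by_name(attacks: list[AtomicAttackStub]) -> bool:
    """Name-based check: inject a baseline unless one is named exactly 'baseline'."""
    return all(attack.atomic_attack_name != "baseline" for attack in attacks)


def rescue_fires_by_flag(attacks: list[AtomicAttackStub]) -> bool:
    """Flag-based check: inject a baseline only if no attack is marked as one."""
    return not any(attack.is_baseline for attack in attacks)


attacks = [
    AtomicAttackStub("baseline_imminent_crisis", is_baseline=True),
    AtomicAttackStub("baseline_licensed_therapist", is_baseline=True),
    AtomicAttackStub("prompt_sending_imminent_crisis"),  # hypothetical technique attack name
]

by_name = rescue_fires_by_name(attacks)  # misses the renamed baselines
by_flag = rescue_fires_by_flag(attacks)  # recognizes them via the flag
```

With the flag in place, renaming baselines no longer affects whether the base class thinks a baseline exists.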

Separately: matching seed prompts to techniques/subharms is the genuinely awkward part of this file. I'm refactoring DatasetConfiguration to make that mapping first-class, so let's leave the _SUBHARMS table as-is for now and migrate it once that lands.

rlundeen2 reviewed Jun 4, 2026

Comment thread pyrit/scenario/scenarios/airt/psychosocial.py Outdated

rlundeen2 reviewed Jun 4, 2026

Comment thread pyrit/scenario/scenarios/airt/psychosocial.py Outdated

rlundeen2 reviewed Jun 5, 2026

Comment thread pyrit/scenario/scenarios/airt/psychosocial.py Outdated

varunj-msft force-pushed the varunj-msft/8380-Standardizing-Scenarios-Psychosocial-bugfix branch from 6106e65 to 5dcb74a Compare June 17, 2026 01:30

varunj-msft changed the title ~~[BREAKING] FIX: Psychosocial harm-category filtering and baseline default~~ [BREAKING] FEAT: Restructure Psychosocial scenario for per-subharm scoring Jun 17, 2026

rlundeen2 self-assigned this Jun 22, 2026

rlundeen2 reviewed Jun 22, 2026

varunj-msft wants to merge 1 commit into microsoft:main from varunj-msft:varunj-msft/8380-Standardizing-Scenarios-Psychosocial-bugfix
