1 change: 1 addition & 0 deletions doc/code/datasets/1_loading_datasets.ipynb
@@ -85,6 +85,7 @@
 " 'airt_illegal',\n",
 " 'airt_imminent_crisis',\n",
 " 'airt_leakage',\n",
+" 'airt_licensed_therapist',\n",
 " 'airt_malware',\n",
 " 'airt_misinformation',\n",
 " 'airt_scams',\n",
49 changes: 21 additions & 28 deletions doc/code/scenarios/0_scenarios.ipynb
@@ -104,9 +104,8 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
-"Loaded environment file: ./.pyrit/.env\n",
-"Loaded environment file: ./.pyrit/.env.local\n"
+"Found default environment files: ['./.pyrit/.env']\n",
+"Loaded environment file: ./.pyrit/.env\n"
 ]
 },
 {
@@ -269,31 +268,22 @@
 " airt.psychosocial\u001b[0m\n",
 " Class: Psychosocial\n",
 " Description:\n",
-" Psychosocial Harms Scenario implementation for PyRIT. This scenario\n",
-" contains various psychosocial harm-based checks that evaluate model\n",
-" behavior for responses that may harm the well-being or mental health of\n",
-" users. Examples include mistreating users in a mental health crisis or\n",
-" aiding them in self-harm. All datasets in this scenario share the\n",
-" unified 'psychosocial' harm category, but each dataset (e.g.,\n",
-" imminent_crisis, dependency) can have unique: - Scoring rubrics - Seed\n",
-" objectives The scenario tests whether models: - Recognize mental health\n",
-" crises and respond appropriately - Avoid facilitating self-harm or\n",
-" harmful behaviors - Provide crisis resources and encourage professional\n",
-" help Example usage with custom configurations: # Define custom\n",
-" configurations per subharm category custom_configs = {\n",
-" \"airt_imminent_crisis\": SubharmConfig(\n",
-" crescendo_system_prompt_path=\"path/to/custom_escalation.yaml\",\n",
-" scoring_rubric_path=\"path/to/custom_rubric.yaml\", ), } scenario =\n",
-" Psychosocial(subharm_configs=custom_configs) await\n",
-" scenario.initialize_async( objective_target=target_llm,\n",
-" scenario_strategies=[PsychosocialStrategy.ImminentCrisis], )\n",
+" Single psychosocial scenario covering imminent-crisis and\n",
+" licensed-therapist subharms. Each ``(technique × subharm)`` pair becomes\n",
+" one ``AtomicAttack`` with the subharm's own scorer (and, for crescendo,\n",
+" its own escalation prompt). A separate baseline ``AtomicAttack`` is\n",
+" prepended **per subharm**, each using that subharm's matching scorer —\n",
+" so baseline scoring is never mismatched with the seed's actual rubric.\n",
+" Subharm selection happens via ``--dataset-names``: pass one or both of\n",
+" ``airt_imminent_crisis`` / ``airt_licensed_therapist``. ``--strategies``\n",
+" selects techniques (``prompt_sending``, ``role_play``, ``crescendo``).\n",
 " Aggregate Strategies:\n",
-" - all\n",
-" Available Strategies (2):\n",
-" imminent_crisis, licensed_therapist\n",
-" Default Strategy: all\n",
-" Default Datasets (1, max 4 per dataset):\n",
-" airt_imminent_crisis\n",
+" - all, default\n",
+" Available Strategies (3):\n",
+" prompt_sending, role_play, crescendo\n",
+" Default Strategy: default\n",
+" Default Datasets (2, max 4 per dataset):\n",
+" airt_imminent_crisis, airt_licensed_therapist\n",
 "\u001b[1m\u001b[36m\n",
 " airt.rapid_response\u001b[0m\n",
 " Class: RapidResponse\n",
@@ -478,6 +468,9 @@
 }
 ],
 "metadata": {
+"jupytext": {
+"main_language": "python"
+},
 "language_info": {
 "codemirror_mode": {
 "name": "ipython",
@@ -488,7 +481,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.13.13"
+"version": "3.11.15"
 }
 },
 "nbformat": 4,
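Note on the new `Psychosocial` description in `0_scenarios.ipynb`: it says each (technique × subharm) pair becomes one attack, with one baseline prepended per subharm and scored by that subharm's own scorer. The sketch below illustrates only that enumeration in plain Python. It does not use PyRIT's classes, and `plan_attacks` and the `<subharm>_scorer` labels are hypothetical names invented for illustration.

```python
from itertools import product

# Technique and dataset names as they appear in the updated notebook output.
TECHNIQUES = ["prompt_sending", "role_play", "crescendo"]
SUBHARMS = ["airt_imminent_crisis", "airt_licensed_therapist"]


def plan_attacks(techniques, subharms):
    """Return (kind, technique, subharm, scorer_label) tuples, baselines first.

    Hypothetical helper: the scorer label stands in for "the subharm's own
    scorer"; it is not a real PyRIT object.
    """
    plan = []
    # One baseline per subharm, each paired with that subharm's scorer.
    for subharm in subharms:
        plan.append(("baseline", None, subharm, f"{subharm}_scorer"))
    # One entry per (technique, subharm) pair.
    for technique, subharm in product(techniques, subharms):
        plan.append(("attack", technique, subharm, f"{subharm}_scorer"))
    return plan


plan = plan_attacks(TECHNIQUES, SUBHARMS)
for entry in plan:
    print(entry)
```

With the three techniques and two default datasets listed in the output, this gives two baselines followed by six technique runs, and every entry carries the scorer label of its own subharm.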