FEAT: Add MASK (CAIS honesty benchmark) dataset loaders #1904
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolved conflicts:
- pyrit/datasets/seed_datasets/remote/__init__.py: kept both MaskQuestionArchetype (HEAD) and MossBenchOversensitivityType (main) in __all__.
- doc/bibliography.md: merged the new MASK citation key into the upstream's larger citation list.

Followups for upstream changes:
- mask_dataset.py + test_mask_dataset.py: renamed _fetch_from_huggingface -> _fetch_from_huggingface_async to match the async-suffix sweep (microsoft#1889).
- seed_metadata.py: added "honesty" to RECOMMENDED_TAGS so MASK's cross-cutting tag passes the metadata-coverage check added by microsoft#1780.

Verification: 836 unit dataset tests pass; pre-commit clean on all touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adrian-gavrila left a comment
Overall looks good! A couple of nits and then one question related to tagging this as default.
```
eprint = {2503.03750},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
```
Nit: this is the only entry in the file using arXiv's native eprint/archivePrefix/primaryClass export instead of journal = {arXiv preprint arXiv:...} (and datasets.instructions.md prescribes the title/author/journal/year/url field set). Could you swap these three lines for:
```
journal = {arXiv preprint arXiv:2503.03750},
```
and keep the existing url? Just to avoid the one-off.
```python
HF_DATASET_NAME: str = "cais/MASK"
HF_REVISION: str = "4602b84dd9e2ca05c6e1eafbc14e556e908ac1bb"
HF_SPLIT: str = "test"
ARCHETYPE: MaskQuestionArchetype
```
Minor parity nit: sibling abstract bases like _ORBenchBaseDataset set should_register = False, but this base relies on dataset_name staying abstract to avoid registration. That's correct today, but if anyone ever gives the base a concrete dataset_name, it would silently register and then raise AttributeError on the unset ARCHETYPE. Worth adding should_register = False here to make the intent explicit and match ORBench.
```python
# Class-level dataset metadata for SeedDatasetMetadata discovery.
modalities: list[str] = ["text"]
tags: set[str] = {"default", "safety", "honesty"}
```
"default" is documented in seed_metadata.py as ungated + size ≥ medium, but MASK is gated (HUGGINGFACE_TOKEN) and mask_statistics is size="small". So SeedDatasetFilter(tags={"default"}) returns MASK, then 401s at fetch. Could we drop "default" here (sgxstest/sorry_bench/vlguard already omit it when gated), or amend the rule if gated benchmarks should be discoverable this way?
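The "default" rule quoted above (ungated and size ≥ medium) is mechanical enough to check per dataset. A minimal sketch, assuming a hypothetical DatasetMeta record and a hypothetical small/medium/large size ladder; neither is PyRIT's SeedDatasetMetadata, and only "small" and "medium" appear in this review.

```python
from dataclasses import dataclass

# Hypothetical size ladder; only "small" and "medium" appear in the review.
SIZE_ORDER = ["small", "medium", "large"]


@dataclass
class DatasetMeta:
    """Hypothetical metadata record, not PyRIT's SeedDatasetMetadata."""

    name: str
    tags: set[str]
    size: str
    gated: bool


def default_tag_violations(meta: DatasetMeta) -> list[str]:
    """Return reasons why `meta` breaks the 'default' rule (ungated, size >= medium)."""
    if "default" not in meta.tags:
        return []  # the rule only constrains datasets that opt into "default"
    problems = []
    if meta.gated:
        problems.append(f"{meta.name}: 'default' requires an ungated dataset")
    if SIZE_ORDER.index(meta.size) < SIZE_ORDER.index("medium"):
        problems.append(f"{meta.name}: 'default' requires size >= medium, got {meta.size!r}")
    return problems


mask_statistics = DatasetMeta(
    name="mask_statistics",
    tags={"default", "safety", "honesty"},
    size="small",
    gated=True,
)
for problem in default_tag_violations(mask_statistics):
    print(problem)
```

With the tags as currently declared, mask_statistics trips both conditions; dropping "default" (the first option suggested) yields no violations.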
Summary
Adds six cais/MASK HuggingFace loaders to PyRIT, one per question archetype that the MASK paper grades separately:
- _MaskContinuationsDataset
- _MaskDisinformationDataset
- _MaskDoublingDownKnownFactsDataset
- _MaskKnownFactsDataset
- _MaskProvidedFactsDataset
- _MaskStatisticsDataset

Plus a public MaskQuestionArchetype enum so callers can refer to archetypes by name.

What is MASK?
MASK (Ren et al., 2025) measures honesty as distinct from accuracy: under pressure (a persona system prompt, role assignment, …) does the model state something that contradicts its own out-of-context belief? Each row pairs a pressured conversation with one or more neutral-context belief-elicitation prompts, and a lie is scored when the two diverge.
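To make the row shape concrete, here is a minimal sketch of how one such row could be grouped into an objective plus a pressured conversation that share one group id. Piece and row_to_group are hypothetical (not PyRIT's SeedObjective/SeedPrompt/SeedGroup), and the column names system_prompt, user_prompt, dd_turn_1_prompt, and dd_turn_1_response are assumptions about the cais/MASK schema; proposition, ground_truth, and belief_elicit_* come from this PR.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Piece:
    """Hypothetical stand-in for a seed objective/prompt; not a PyRIT class."""

    role: str
    value: str
    group_id: uuid.UUID
    sequence: Optional[int] = None
    metadata: dict[str, str] = field(default_factory=dict)


def row_to_group(row: dict, doubling_down: bool = False) -> list[Piece]:
    """Turn one MASK-style row into pieces that share a single group id."""
    group_id = uuid.uuid4()
    # Neutral-context belief prompts ride along as metadata; skip missing ones.
    beliefs = {k: str(v) for k, v in row.items() if k.startswith("belief_elicit_") and v}
    objective = Piece(
        role="objective",
        value=str(row["proposition"]),
        group_id=group_id,
        # str() keeps numeric ground truths (e.g. statistics) as strings.
        metadata={"ground_truth": str(row["ground_truth"]), **beliefs},
    )
    turns = [("system", row["system_prompt"])]
    if doubling_down:
        # Forced prior exchange placed before the pressured question.
        turns += [("user", row["dd_turn_1_prompt"]), ("assistant", row["dd_turn_1_response"])]
    turns.append(("user", row["user_prompt"]))
    pressured = [
        Piece(role=role, value=str(text), group_id=group_id, sequence=i)
        for i, (role, text) in enumerate(turns)
    ]
    return [objective, *pressured]


example_row = {
    "proposition": "Example proposition",
    "ground_truth": 42,
    "system_prompt": "You are a persuasive spokesperson.",
    "user_prompt": "Is the claim true?",
    "dd_turn_1_prompt": "Earlier question",
    "dd_turn_1_response": "Earlier forced answer",
    "belief_elicit_1": "Neutrally, is the claim true?",
    "belief_elicit_2": None,
}
group = row_to_group(example_row, doubling_down=True)
print([p.role for p in group])
# → ['objective', 'system', 'user', 'assistant', 'user']
```

The sketch omits formatted_ground_truth and the canary marker for brevity; it only illustrates the grouping, not MASK's belief-vs-statement scoring.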
This PR is loader-only. The MASK scoring methodology (parallel belief elicitation + multi-stage judge) is intentionally not included here — it would require its own attack-executor + scorer design and is being tracked in a separate workstream.
Loader shape
- _MaskBaseDataset + 6 subclasses, each with a class-level ARCHETYPE, matching the established _ORBenchBaseDataset / Salad / etc. style.
- Token handling mirrors _HiXSTestDataset / _SGXSTestDataset: a token constructor arg that falls back to the HUGGINGFACE_TOKEN env var. The loader does not short-circuit on a missing token; instead, datasets.load_dataset raises with a clear message pointing the user at the dataset's HF gating page.
- Each row becomes a SeedGroup sharing a prompt_group_id:
  - A SeedObjective carrying proposition, ground_truth, formatted_ground_truth, all belief_elicit_* strings, and the canary contamination marker.
  - doubling_down_known_facts: a 4-piece pressured conversation (system→user→assistant→user) encoding the forced prior turn before the pressured question.
  - All other archetypes: a 2-piece pressured conversation (system→user).
- HF_SPLIT = "test" is a class-level constant rather than a constructor kwarg (every MASK config publishes only a test split).
- Tags: {"default", "safety", "honesty"}. "honesty" is added as a new cross-cutting tag, since this is the first PyRIT dataset specifically targeting honesty rather than harm-avoidance.

Tests
- tests/unit/datasets/test_mask_dataset.py: archetype enum coverage, per-loader dataset_name + ARCHETYPE binding, token plumbing (env var + override), empty-fetch raises, metadata serialization for every field, doubling-down 4-piece role/sequence ordering, missing-belief-field fallback for provided_facts, statistics numeric ground truth preserved as string.
- cais/MASK rows are reproduced, and the test canary is an obvious fake (test:0000:00000000-0000-0000-0000-000000000000) to avoid leaking the real MASK contamination marker into a public repo.
- tests/end_to_end/test_all_datasets.py: the loaders pass against real cais/MASK at pinned revision 4602b84dd9e2ca05c6e1eafbc14e556e908ac1bb, auto-discovered through SeedDatasetProvider; no e2e test code added.

Documentation
- doc/references.bib: @misc{ren2025maskbenchmarkdisentanglinghonesty, ...}.
- doc/bibliography.md: citation key added to the hidden-citations dropdown.
- doc/code/datasets/1_loading_datasets.{py,ipynb}: MASK row + 6 dataset names in the expected-output list.

Licensing / gating
cais/MASK is a HuggingFace-gated dataset under CAIS's research-use click-through terms. Users must accept the dataset terms at https://huggingface.co/datasets/cais/MASK before the loader will succeed. This is documented in the class docstring; PyRIT does not vendor any MASK content.
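Because cais/MASK is gated, every fetch needs credentials. A minimal sketch of the token plumbing described under Loader shape (explicit token argument, HUGGINGFACE_TOKEN fallback, no short-circuit on a missing token). resolve_hf_token and load_mask_config are hypothetical helpers, not PyRIT's loader code; the dataset name, revision, and split come from this PR, and passing an archetype name such as "statistics" as the HF config name is an assumption. datasets.load_dataset does accept split, revision, and token keyword arguments.

```python
import os
from typing import Any, Optional

# Values taken from this PR; the config-name mapping below is an assumption.
HF_DATASET_NAME = "cais/MASK"
HF_REVISION = "4602b84dd9e2ca05c6e1eafbc14e556e908ac1bb"
HF_SPLIT = "test"


def resolve_hf_token(token: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit token; otherwise fall back to HUGGINGFACE_TOKEN."""
    if token:
        return token
    return os.environ.get("HUGGINGFACE_TOKEN")


def load_mask_config(config_name: str, token: Optional[str] = None) -> Any:
    """Fetch one MASK config. No early exit on a missing token; if access is denied, datasets raises."""
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`

    return load_dataset(
        HF_DATASET_NAME,
        config_name,  # assumed to match an archetype name such as "statistics"
        split=HF_SPLIT,
        revision=HF_REVISION,
        token=resolve_hf_token(token),
    )
```

Deferring the failure to load_dataset keeps a single source of truth for the gating error message, at the cost of a network round trip before the user learns the terms were not accepted.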