-
Notifications
You must be signed in to change notification settings - Fork 796
FEAT Allow dataset parameters on dataset configuration when loading from memory #2120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,6 +24,7 @@ | |
| _parse_initializer_arg, | ||
| build_parameters_from_api, | ||
| non_negative_int, | ||
| parse_dataset_parameter, | ||
| positive_int, | ||
| validate_log_level_argparse, | ||
| ) | ||
|
|
@@ -267,6 +268,13 @@ def _build_base_parser(*, add_help: bool = True) -> ArgumentParser: | |
| type=positive_int, | ||
| help=ARG_HELP["max_dataset_size"], | ||
| ) | ||
| run_group.add_argument( | ||
| "--dataset-parameters", | ||
| type=parse_dataset_parameter, | ||
| nargs="+", | ||
| metavar="KEY=VALUE", | ||
| help=ARG_HELP["dataset_parameters"], | ||
| ) | ||
|
|
||
| return parser | ||
|
|
||
|
|
@@ -644,6 +652,8 @@ def _build_run_request(*, parsed_args: Namespace, scenario_name: str) -> RunScen | |
| kwargs["dataset_names"] = parsed_args.dataset_names | ||
| if parsed_args.max_dataset_size is not None: | ||
| kwargs["max_dataset_size"] = parsed_args.max_dataset_size | ||
| if parsed_args.dataset_parameters: | ||
| kwargs["dataset_parameters"] = dict(parsed_args.dataset_parameters) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rich agrees with this comment but it is copilot generated:
|
||
| if parsed_args.memory_labels: | ||
| kwargs["labels"] = parse_memory_labels(json_string=parsed_args.memory_labels) | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -50,6 +50,12 @@ class RunScenarioRequest(BaseModel): | |
| strategies: list[str] | None = Field(None, description="Strategy names to use (uses scenario default if omitted)") | ||
| dataset_names: list[str] | None = Field(None, description="Dataset names to use (uses scenario default if omitted)") | ||
| max_dataset_size: int | None = Field(None, ge=1, description="Maximum items per dataset") | ||
| dataset_parameters: dict[str, Any] | None = Field( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rich agrees with this comment but it is copilot generated: Three names for one concept makes this hard to follow: it's |
||
| None, | ||
| description=( | ||
| "Dataset seed filters keyed by field, applied before sampling. Accepted keys: harm_categories, data_types." | ||
| ), | ||
| ) | ||
| max_concurrency: int = Field(10, ge=1, le=100, description="Maximum concurrent operations") | ||
| max_retries: int = Field(0, ge=0, le=20, description="Maximum retry attempts on failure") | ||
| labels: dict[str, str] | None = Field(None, description="Labels to attach to memory entries") | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rich agrees with this comment but it is copilot generated:
The comma-list syntax presents one uniform surface, but the two exposed filters resolve with different semantics in
get_seeds:harm_categories→_add_list_conditionsANDs a.contains()per value, soharm_categories=cyber,violencemeans "tagged with both cyber AND violence," and it's a substring match.data_types→.in_(values), sodata_types=text,image_pathmeans "either type," exact match.A user writing
harm_categories=cyber,violencewill almost certainly expect a union and instead silently get the (likely empty) intersection. At minimum the help text and thefiltersdocstring should spell out that comma-list semantics are per-field (AND/substring for harm categories, OR/exact for data types). Worth a test that pins each field's behavior against real seeds, since this is the surface most likely to confuse people.