Hi, I noticed two sensor-selection scenarios where the prompt asks the agent to list sensors, but `characteristic_form` describes the expected answer as one or more failure modes.
Because `characteristic_form` is used as the expected behavior during evaluation, this could make a correct sensor-list answer look wrong to the judge.
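Source checked:
- `data/task/failure_mapping_senarios.jsonl`
- `docs/guideline/case_study_industrial_asset_management.md`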
Affected rows:
| id | type | Prompt asks for | `characteristic_form` currently expects |
| --- | --- | --- | --- |
| 111 | FMSA | sensors of Chiller 6 potentially relevant to Compressor Overheating | one or more failure modes of Chiller 6 |
| 609 | multiagent | sensors of Chiller 6 at MAIN site potentially relevant to Compressor Overheating | one or more failure modes of Chiller 6 |
Nearby rows suggest these should use sensor-oriented expected behavior:
- id 112 asks which sensor should be prioritized for compressor overheating and expects sensors from the Chiller 6 sensor list.
- id 610 is the multiagent version of that same sensor-prioritization task.
- The project guideline also lists "List all sensors of Chiller 6 that are potentially relevant to Compressor Overheating." under Sensor-Failure Mode Mapping examples.
Possible fix:
- Update ids 111 and 609 so `characteristic_form` expects Chiller 6 sensor names, not failure modes.
- A replacement could use the same installed Chiller 6 sensor list already used by ids 112 and 610.
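To look for other rows with the same mismatch, a rough scan along these lines might help. This is only a sketch: the prompt field name `text` is an assumption on my side (only `id`, `type`, and `characteristic_form` are confirmed above), the keyword heuristic is deliberately crude, and the two sample rows are abbreviated illustrations rather than copies of the real records.

```python
import json


def find_form_mismatches(lines, prompt_key="text", form_key="characteristic_form"):
    """Return ids of rows whose prompt asks for sensors while the expected
    form only talks about failure modes.

    prompt_key="text" is an assumed field name; adjust it to the real schema.
    """
    mismatched = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        row = json.loads(line)
        prompt = str(row.get(prompt_key) or "").lower()
        form = str(row.get(form_key) or "").lower()
        # Crude heuristic: the prompt mentions sensors but not failure modes,
        # and the expected form mentions failure modes but not sensors.
        asks_for_sensors = "sensor" in prompt and "failure mode" not in prompt
        expects_failure_modes = "failure mode" in form and "sensor" not in form
        if asks_for_sensors and expects_failure_modes:
            mismatched.append(row.get("id"))
    return mismatched


# Abbreviated, illustrative rows (not the actual file contents).
sample_rows = [
    json.dumps({
        "id": 111,
        "type": "FMSA",
        "text": "List all sensors of Chiller 6 that are potentially relevant to Compressor Overheating.",
        "characteristic_form": "one or more failure modes of Chiller 6",
    }),
    json.dumps({
        "id": 112,
        "type": "FMSA",
        "text": "Which sensor should be prioritized for compressor overheating?",
        "characteristic_form": "one of more sensors from the Chiller 6 sensor list",
    }),
]

print(find_form_mismatches(sample_rows))  # → [111]
```

Against the real file, the same function can be given the open file handle, since it only iterates over lines. Any hits would still need a manual look, because the keyword check cannot tell a legitimate mixed prompt from a wrong expected form.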
Minor adjacent cleanup: ids 112, 113, 610, and 611 already have sensor-oriented expected forms, but use the phrase `one of more sensors`; I think that should be `one or more sensors`.
While checking related task configs, I also noticed possible candidate-list cleanup items in `data/task/failure_mapping_senarios.jsonl`: ids 6 and 12 ask for failure modes, but some entries in their candidate failure-mode lists look like sensor/measurement terms (`speed`, `power`, `pressure or vacuum`, `oil debris`, etc.). That may be worth auditing separately.