
Possible date/time-range mismatches in scenario expected behavior #310

@lentil32


Hi, thanks for releasing AssetOpsBench. While looking through the Hugging Face scenarios data, I noticed a few rows where the prompt date/time range and the characteristic_form date/time range appear to disagree.

Since characteristic_form is used as the expected behavior by the LLM judge, these mismatches may affect evaluation: a response that uses the date from the user prompt could be judged against a different date in the expected behavior.

Source checked:

Examples:

| id | Prompt asks for | characteristic_form says | Possible fix |
|---|---|---|---|
| 10 | first week of June 2020 | last week | Use "first week of June 2020", or clarify the intended reference date for "last week". |
| 11 | last week of April '20 | past week | Use "last week of April 2020", or clarify the intended reference date for "past week". |
| 42 | September 19, 2020 at quarter to midnight | September 19, 2015 at 11:45pm | Change 2015 to 2020 if the prompt is correct. |
| 43 | 6/14/20 | June 14, 2016 | Change 2016 to 2020 if 6/14/20 means June 14, 2020. |
| 45 | Mar 13 '20 | January 13, 2023 | Change to March 13, 2020 if the prompt is correct. |
| 48 | September 19, 2020 at 7pm | September 19, 2015 at 7pm | Change 2015 to 2020 if the prompt is correct. |
| 410 | first week of June 2020 | first week of June 2020, but later says "time range - First week of May 2020" | Change the final May reference to June. I checked src/couchdb/sample_data/work_order/event.csv, and the "6 alert records" count appears to match the first week of June. |
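A rough way to look for more rows like 42, 43, 45 and 48 is to compare the years mentioned on each side. The following is only a minimal Python sketch, assuming each row is a dict with the hypothetical keys prompt and characteristic_form (the actual column names in the Hugging Face dataset may differ). The sample rows contain only the date phrases quoted in the examples above, not the full prompts.

```python
import re

# Hypothetical rows: the keys "prompt" and "characteristic_form" are illustrative,
# and the text is reduced to the date phrases quoted in the issue.
ROWS = [
    {"id": 42, "prompt": "September 19, 2020 at quarter to midnight",
     "characteristic_form": "September 19, 2015 at 11:45pm"},
    {"id": 43, "prompt": "6/14/20", "characteristic_form": "June 14, 2016"},
    {"id": 45, "prompt": "Mar 13 '20", "characteristic_form": "January 13, 2023"},
    {"id": 48, "prompt": "September 19, 2020 at 7pm",
     "characteristic_form": "September 19, 2015 at 7pm"},
]

# Four-digit years 2000-2099, two-digit years after an apostrophe ('20),
# and the two-digit year at the end of an M/D/YY date (6/14/20).
FULL_YEAR = re.compile(r"\b(20\d{2})\b")
APOSTROPHE_YEAR = re.compile(r"'(\d{2})\b")
SLASH_YEAR = re.compile(r"\b\d{1,2}/\d{1,2}/(\d{2})\b")


def years_in(text: str) -> set[int]:
    """Return every year mentioned in text, expanding two-digit years to 20xx."""
    years = {int(y) for y in FULL_YEAR.findall(text)}
    for pattern in (APOSTROPHE_YEAR, SLASH_YEAR):
        years |= {2000 + int(y) for y in pattern.findall(text)}
    return years


def year_mismatches(rows):
    """Yield (id, prompt_years, form_years) when both sides name years that differ."""
    for row in rows:
        prompt_years = years_in(row["prompt"])
        form_years = years_in(row["characteristic_form"])
        if prompt_years and form_years and prompt_years != form_years:
            yield row["id"], prompt_years, form_years


for row_id, prompt_years, form_years in year_mismatches(ROWS):
    print(row_id, sorted(prompt_years), sorted(form_years))
```

This only compares years, so relative phrases such as "last week" (rows 10 and 11) and month-only mismatches such as the May/June reference in row 410 would still need a manual look. Row 45 also differs in the month, but its year difference is enough to flag it.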

One related wording issue I noticed:

| id | Current wording | Why it looks ambiguous |
|---|---|---|
| 430 | "two month's period from 2020-05-01T12:30:00 to 2022-06-30T19:30:00" | The explicit timestamps cover more than two years, not two months. If the timestamps are intended, "two month's period" could be replaced with "the date range". |
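For reference, a quick check with Python's standard datetime module of how long the id 430 timestamps actually span. The whole_months helper is a hypothetical illustration, not part of AssetOpsBench.

```python
from datetime import datetime

# Timestamps quoted in the id 430 wording.
START = datetime.fromisoformat("2020-05-01T12:30:00")
END = datetime.fromisoformat("2022-06-30T19:30:00")


def whole_months(start: datetime, end: datetime) -> int:
    """Count calendar months fully elapsed between start and end (hypothetical helper)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # If the end falls earlier in its month than the start did, the last month is incomplete.
    if (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months


span = END - START
print(span)                      # 790 days, 7:00:00
print(whole_months(START, END))  # 25
```

That is 25 full months plus 29 days, roughly two years and two months, which is why "two month's period" does not match the timestamps.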

I am not assuming the prompt is always the source of truth here; the main issue is that the prompt and expected behavior currently point to different time ranges.
