2 parents 7d76cdc + 438ea6a commit bec69a4
1 file changed
examples/FinanceBench/ground-truths.yml
@@ -4288,6 +4288,8 @@ financebench_id_00603:
   correctness: >-
     the answer mentions new stores
 
+  evaluator-unreliable: true
+
 
 financebench_id_00605:
   sector: Consumer Discretionary
@@ -4316,6 +4318,8 @@ financebench_id_00605:
     the answer contains a calculated percentage value that is in the range from 30% to 40%
     (if the answer is a single number, assume that it is that calculated percentage value)
 
 financebench_id_00606: # tricky: highly implicit wordings
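
The commit adds an `evaluator-unreliable: true` key to a ground-truth entry. As a rough illustration of how such a flag might be consumed, the sketch below splits example IDs by that key. It assumes the YAML file has already been loaded into a plain mapping; the function name `split_by_evaluator_reliability`, the treatment of a missing key as "reliable", and the sample data are all hypothetical and not taken from the repository.

```python
from typing import Any, Mapping

UNRELIABLE_KEY = "evaluator-unreliable"  # key name as it appears in the diff


def split_by_evaluator_reliability(
    ground_truths: Mapping[str, Mapping[str, Any]],
) -> tuple[list[str], list[str]]:
    """Return (reliable_ids, unreliable_ids) for a loaded ground-truths mapping."""
    reliable: list[str] = []
    unreliable: list[str] = []
    for example_id, entry in ground_truths.items():
        # Assumption: an entry without the key counts as reliable.
        if entry.get(UNRELIABLE_KEY, False) is True:
            unreliable.append(example_id)
        else:
            reliable.append(example_id)
    return reliable, unreliable


# Illustrative data only; most fields of the real entries are omitted.
sample = {
    "financebench_id_00603": {
        "correctness": "the answer mentions new stores",
        "evaluator-unreliable": True,
    },
    "financebench_id_00606": {},  # hypothetical unflagged entry
}

reliable_ids, unreliable_ids = split_by_evaluator_reliability(sample)
print("reliable:", reliable_ids)
print("unreliable:", unreliable_ids)
```

A split like this would let an evaluation run report scores for flagged examples separately, or route them to manual review, rather than dropping them silently.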