fix: preserve HME100k prediction case in OCRBench scoring #1278

Merged
kcz358 merged 1 commit into EvolvingLMMs-Lab:main from akawincent:fix/1220_HME100k_score
Apr 10, 2026

Conversation

@akawincent
Contributor

Summary

  • fix OCRBench scoring for the HME100k subset by preserving prediction case
  • keep the existing lowercase normalization for the other OCRBench subsets unchanged

Why

Issue #1220 points out that ocrbench_process_results lowercases pred before branching on dataset_name, while the HME100k branch intentionally compares answers without lowercasing them. That makes correct HME100k predictions score as 0 when the only difference is letter case.
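
The failure mode described above can be illustrated with a minimal sketch. It is not the repository's code: the helper names (`_squash`, `score_before_fix`, `score_after_fix`), the whitespace normalization, and the substring-match scoring are simplifying assumptions made for illustration. Only the ordering problem (lowercasing pred before the dataset_name branch) comes from the issue.

```python
def _squash(text: str) -> str:
    # Assumed HME100k-style normalization: trim and drop all whitespace.
    return text.strip().replace("\n", " ").replace(" ", "")


def score_before_fix(pred: str, gt_ans: str, dataset_name: str) -> int:
    # Hypothetical reconstruction of the buggy ordering:
    # pred is lowercased BEFORE we know which branch will run.
    pred = pred.lower().strip()
    if dataset_name == "HME100k":
        # Case-sensitive comparison, but pred has already lost its case.
        return int(_squash(gt_ans) in _squash(pred))
    # Other subsets: case-insensitive comparison.
    return int(gt_ans.lower().strip() in pred)


def score_after_fix(pred: str, gt_ans: str, dataset_name: str) -> int:
    # Hypothetical fixed ordering: keep the original case of pred and
    # lowercase only inside the case-insensitive branch.
    pred = pred.strip()
    if dataset_name == "HME100k":
        return int(_squash(gt_ans) in _squash(pred))
    return int(gt_ans.lower().strip() in pred.lower())


if __name__ == "__main__":
    # A correct HME100k prediction that differs from nothing but spacing.
    print(score_before_fix("V = I R", "V=IR", "HME100k"))
    print(score_after_fix("V = I R", "V=IR", "HME100k"))
```

Under these assumptions, the first call scores the correct prediction as 0 because the uppercase ground truth can no longer be found in the lowercased prediction, while the second call scores it as 1. A non-HME100k subset gets the same score from both functions.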

Testing

  • not run (per request)

Closes #1220

Collaborator

@kcz358 kcz358 left a comment

Hi, so instead of just removing the lower(), maybe we should actually lowercase gt_ans as well?

@akawincent
Contributor Author

akawincent commented Apr 9, 2026

> Hi, so instead of just removing the lower(), maybe we should actually lowercase gt_ans as well?

@kcz358 I don't think we should lowercase gt_ans for HME100k.

HME100k should be case-sensitive, since it is handwritten mathematical expression recognition.

So, we should not lowercase pred, and we should not lowercase gt_ans either. The original problem was that pred was lowercased too early, which broke the intended HME100k matching behavior.

Since these answers are math-expression / LaTeX-like strings, lowercasing gt_ans could also create false positives for characters like V, F, I, A, etc.
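
To make the false-positive concern concrete, here is a small sketch. The two matcher functions and the sample strings are hypothetical and not taken from OCRBench. The only fact relied on is that LaTeX commands such as `\Delta` and `\delta` differ only in case yet denote different symbols.

```python
def matches_case_sensitive(pred: str, gt_ans: str) -> bool:
    # Compare the strings exactly as written.
    return gt_ans in pred


def matches_case_folded(pred: str, gt_ans: str) -> bool:
    # Lowercase both sides before comparing (the suggested alternative).
    return gt_ans.lower() in pred.lower()


# Hypothetical ground truth and a prediction that gets the symbols wrong.
gt_ans = r"\Delta V = I R"
wrong_pred = r"\delta v = i r"

if __name__ == "__main__":
    print(matches_case_sensitive(wrong_pred, gt_ans))
    print(matches_case_folded(wrong_pred, gt_ans))
```

With these inputs the case-sensitive matcher rejects the wrong prediction, while the case-folded matcher accepts it, which is the kind of false positive that lowercasing both sides would introduce.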

@kcz358
Collaborator

kcz358 commented Apr 9, 2026

Got it, that makes sense if HME100k requires the answer to be case-sensitive. Will this change cause false negatives on the other branches? If not, I will merge this PR. Thanks

@akawincent
Contributor Author

@kcz358

No, this should not affect the other OCRBench branches.
I have confirmed that the other branches call pred.lower() and gt_ans.lower() in the else branch, because they are case-insensitive.

@kcz358 kcz358 merged commit f54dd28 into EvolvingLMMs-Lab:main Apr 10, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

OCRBench eval bug

2 participants