Proper scoring rules reduce LLM overconfidence in multiple-choice QA.
calibration hallucination confidence-estimation abstention selective-prediction llm mmlu proper-scoring-rules mmlu-pro simpleqa
Updated Jun 13, 2026 - Python
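
For readers unfamiliar with the term in the description, here is a minimal sketch of two standard proper scoring rules (the multiclass Brier score and the log score) applied to a four-option multiple-choice question. It is not taken from this repository: the function names `brier_score` and `log_score` and the example probabilities are hypothetical, chosen only to illustrate why such rules penalize overconfident wrong answers.

```python
import math


def brier_score(probs, correct_index):
    """Multiclass Brier score: squared error against the one-hot answer.

    Lower is better. `probs` is assumed to be a probability distribution
    over the answer options (hypothetical helper, not the repo's API).
    """
    return sum(
        (p - (1.0 if i == correct_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def log_score(probs, correct_index):
    """Negative log-likelihood of the correct option. Lower is better."""
    return -math.log(probs[correct_index])


# Two illustrative predictions over options A-D (made-up numbers).
overconfident = [0.97, 0.01, 0.01, 0.01]
hedged = [0.40, 0.30, 0.20, 0.10]

# Suppose the correct answer is option B (index 1), so both top choices are wrong.
for name, probs in [("overconfident", overconfident), ("hedged", hedged)]:
    print(name, brier_score(probs, 1), log_score(probs, 1))
```

Under both rules the overconfident prediction receives the worse (higher) score when its top choice is wrong, which is the incentive a proper scoring rule provides: expected score is optimized by reporting one's true probabilities rather than inflated ones.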