Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
Large Language Models are increasingly applied to chemistry, tackling tasks such as molecular name conversion, captioning, text-guided generation, and property or reaction prediction. A molecule's properties are fundamentally determined by its composition and structure, encoded in its molecular graph; thus, reasoning about molecular properties requires understanding and reasoning over the molecular structure.
Yet, most existing benchmarks emphasize general chemical knowledge, rely on literature or surrogate labels that risk leakage or bias, or reduce evaluation to multiple-choice questions.
MolecularIQ is a molecular structure reasoning benchmark focused exclusively on symbolically verifiable tasks. It enables fine-grained evaluation of reasoning over molecular graphs and produces capability fingerprints that localize model failures to specific tasks and molecular regimes.
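To make the notion of a capability fingerprint concrete, here is a minimal sketch that groups per-question verifier outcomes by task and molecular regime and reports accuracy per cell. It assumes the inputs and is not the aggregation MolecularIQ actually uses. The function names `capability_fingerprint` and `weakest_cells` and the task and regime labels are hypothetical.

```python
from collections import defaultdict


def capability_fingerprint(records):
    """Aggregate verifier outcomes into accuracy per (task, regime) cell.

    `records` holds hypothetical (task, regime, correct) tuples, where `correct`
    is the boolean result of a symbolic verifier on one question.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for task, regime, correct in records:
        totals[(task, regime)] += 1
        hits[(task, regime)] += int(correct)
    return {cell: hits[cell] / totals[cell] for cell in totals}


def weakest_cells(fingerprint, k=3):
    """Return the k lowest-accuracy cells, i.e. where failures concentrate."""
    return sorted(fingerprint.items(), key=lambda item: item[1])[:k]


# Hypothetical labels for illustration only.
records = [
    ("ring_count", "low_complexity", True),
    ("ring_count", "low_complexity", True),
    ("ring_count", "high_complexity", False),
    ("ring_count", "high_complexity", True),
    ("atom_index", "high_complexity", False),
]
fp = capability_fingerprint(records)
print(fp)
print(weakest_cells(fp, k=1))
```

Keying the fingerprint on (task, regime) pairs, rather than reporting one overall score, lets you attribute a low score to a specific task on a specific slice of molecules.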
Reasoning tasks. MolecularIQ covers three categories of reasoning:
- Counting: Feature and substructure counting on molecular graphs
- Indexing: Index-based attribution of atoms, bonds, or substructures
- Constrained generation: Generation of valid molecules under structural constraints
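Every task has a computable ground truth, so a program can check each answer without a reference label. The sketch below shows one verifier per category on a hypothetical toy graph representation (`MolGraph`: heavy atoms only, integer bond orders) with a simplified valence check. It is not the moleculariq-core API. A real implementation would use a cheminformatics toolkit for parsing and validity.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical toy representation (not the moleculariq-core API):
# heavy atoms only, bonds as (i, j, bond_order) with integer orders.
@dataclass
class MolGraph:
    elements: list
    bonds: list


# Simplified maximum valences for a few elements; an assumption for illustration.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


def n_components(g):
    """Count connected components with a union-find over the bonds."""
    parent = list(range(len(g.elements)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in g.bonds:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(len(g.elements))})


def ring_count(g):
    # Cyclomatic number E - V + C, i.e. the size of the smallest set of smallest rings.
    return len(g.bonds) - len(g.elements) + n_components(g)


def atom_indices(g, element):
    return [i for i, el in enumerate(g.elements) if el == element]


def is_valid(g):
    # Valence check only; charges, hydrogens and aromaticity are ignored.
    used = [0] * len(g.elements)
    for i, j, order in g.bonds:
        used[i] += order
        used[j] += order
    return all(used[i] <= MAX_VALENCE.get(el, 0) for i, el in enumerate(g.elements))


# Counting: the model's integer answer is compared with a computed ground truth.
def verify_count(g, answer):
    return answer == ring_count(g)


# Indexing: the answer is a collection of atom indices; order does not matter.
def verify_indices(g, element, answer):
    return sorted(answer) == atom_indices(g, element)


# Constrained generation: any valid, connected molecule meeting the constraints passes.
def verify_generation(candidate, n_rings, element_counts):
    counts = Counter(candidate.elements)
    return (
        is_valid(candidate)
        and n_components(candidate) == 1
        and ring_count(candidate) == n_rings
        and all(counts[el] == n for el, n in element_counts.items())
    )


# Phenol in Kekulé form: ring carbons 0-5, oxygen 6 attached to carbon 5.
phenol = MolGraph(["C"] * 6 + ["O"],
                  [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1), (5, 6, 1)])
print(verify_count(phenol, 1))             # True
print(verify_indices(phenol, "O", [6]))    # True
print(verify_generation(phenol, 1, {"O": 1}))  # True
```

The generation verifier accepts any molecule that satisfies the constraints. No single reference answer is required, and that is what makes open-ended generation symbolically checkable.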
Complexity axes. MolecularIQ spans three orthogonal axes:
- Molecular complexity: Tests models across molecules of varying structural complexity.
- Multitask load: Evaluates performance as the number of reasoning requirements combined in a single question varies.
- SMILES representation: Tests robustness across different SMILES representations of the same molecule.
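The SMILES axis matters because one molecule has many valid SMILES strings, and each string fixes a different atom numbering. The sketch below uses a hypothetical toy graph of phenol and assumes atoms are numbered in order of appearance in the SMILES string. It shows that a count answer is invariant under renumbering while an index answer is not. Index ground truth therefore has to be computed for the exact string shown to the model.

```python
# Toy heavy-atom graph of phenol written as "c1ccccc1O", assuming atoms are
# numbered in order of appearance: ring carbons 0-5, oxygen 6 bonded to carbon 5.
ELEMENTS = ["C"] * 6 + ["O"]
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6)]


def renumber(elements, bonds, order):
    """Return the same graph with new atom k = old atom order[k]."""
    new_of_old = {old: new for new, old in enumerate(order)}
    new_elements = [elements[old] for old in order]
    new_bonds = [(new_of_old[i], new_of_old[j]) for i, j in bonds]
    return new_elements, new_bonds


def ring_count(elements, bonds):
    # Assumes a single connected molecule, so the cyclomatic number is E - V + 1.
    return len(bonds) - len(elements) + 1


def oxygen_indices(elements):
    return [i for i, el in enumerate(elements) if el == "O"]


# Reversing the atom order mimics writing the same molecule as "Oc1ccccc1".
els2, bonds2 = renumber(ELEMENTS, BONDS, list(range(6, -1, -1)))

print(ring_count(ELEMENTS, BONDS), ring_count(els2, bonds2))  # 1 1
print(oxygen_indices(ELEMENTS), oxygen_indices(els2))          # [6] [0]
```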
This repository serves as an entry point to the MolecularIQ ecosystem, covering benchmark dataset creation, the leaderboard, and the evaluation procedure with lm-eval-harness. The dynamic version, MolecularIQD, is part of the core package (moleculariq-core).
| Repository | Purpose |
|---|---|
| 📍 moleculariq | This repository: overview of the MolecularIQ code bases |
| moleculariq-leaderboard | Leaderboard: Hugging Face Space that displays results and handles submissions |
| moleculariq-core | MolecularIQD and a shared library providing core functionality, e.g., symbolic verifiers and question formatting |
| moleculariq-benchmark | Dataset creation: task definitions, symbolic verifier implementations, and question generator |
| moleculariq-eval | Evaluation code: integration with lm-eval-harness, model configs, reward functions, extraction functions, and system prompts |
If you use MolecularIQ in your research, please cite:
@inproceedings{
bartmann2026moleculariq,
title={Molecular{IQ}: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs},
author={Christoph Bartmann and Johannes Schimunek and Mykyta Ielanskyi and Philipp Seidl and G{\"u}nter Klambauer and Sohvi Luukkonen},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=RqwEzZqMFv}
}