Skip to content

feat: skip implicit H addition for metal atoms#228

Draft
fbaensch-beilstein wants to merge 2 commits into
devfrom
default_valences
Draft

feat: skip implicit H addition for metal atoms#228
fbaensch-beilstein wants to merge 2 commits into
devfrom
default_valences

Conversation

@fbaensch-beilstein

@fbaensch-beilstein fbaensch-beilstein commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

What

InChI no longer adds implicit hydrogens to metal atoms. Previously a bare
main-group metal was saturated to its normal valence with hydrogen, so a lone
sodium became InChI=1S/Na.H; it now yields just InChI=1S/Na, consistent with
how transition metals (e.g. InChI=1S/Fe) have always been handled.

Closes #99

Why

Adding hydrogen to a bare metal atom to fill an assumed "normal valence" is not
chemically meaningful for metals. Transition metals were already flagged to skip
H addition; this extends the same treatment to the main-group metals, leaving
only non-metals, metalloids and noble gases to receive implicit H.

How

The behaviour is governed by the per-element bSkipAddingH flag in the default
valence tables. The flag is now set for the main-group metals (Li, Na, K, Rb,
Cs, Fr, Be, Mg, Ca, Sr, Ba, Ra, Al, Ga, In, Tl, Sn, Pb, Bi, Po) in both
tables that drive H addition:

  • ElData in INCHI_BASE/src/util.c (standard InChI path)
  • MolecularInorganicsArray in INCHI_BASE/src/strutil.c (-MolecularInorganics path)

Only non-metals, metalloids and noble gases keep bSkipAddingH == 0. Noble
gases keep it but have zero normal valence, so they never gain H regardless.

Commits

  • style: reformat MolecularInorganicsArray to the aligned one-entry-per-line
    ElData layout — purely cosmetic, all element data unchanged.
  • feat: set bSkipAddingH for the main-group metals in both tables and add
    test_implicitHydrogen.cpp.

Tests

  • New INCHI-1-TEST/tests/test_unit/test_implicitHydrogen.cpp pins the behaviour
    end-to-end across main-group metals, transition metals, non-metals/metalloids
    and noble gases (48 elements).
  • Full ctest unit suite: 16/16 pass.
  • Python CLI tests (test_executable): pass (no regressions).

Collapse the MolecularInorganicsArray periodic-table entries from the
multi-line valence layout to one aligned entry per line, mirroring the
ElData table in util.c. Purely cosmetic -- all element data is unchanged.
Set bSkipAddingH for the main-group metals (Li, Na, K, Rb, Cs, Fr, Be,
Mg, Ca, Sr, Ba, Ra, Al, Ga, In, Tl, Sn, Pb, Bi, Po) in both default
valence tables (ElData in util.c and the mirrored MolecularInorganicsArray
in strutil.c). A bare metal atom now yields just the element
(e.g. Na -> InChI=1S/Na) instead of being saturated to its normal valence
with hydrogen (previously Na -> InChI=1S/Na.H). Only non-metals, metalloids
and noble gases still receive implicit H.

Add test_implicitHydrogen.cpp covering main-group metals, transition
metals, non-metals/metalloids and noble gases.
@fbaensch-beilstein

Copy link
Copy Markdown
Collaborator Author

⚠️ The regression suite will fail — this is expected, not a regression

Running the regression suite (config_ci.py, both inchi.sdf.gz and
mcule.sdf.gz) reports 85 differences against the stored
*.regression_reference.sqlite baselines. All 85 are caused by this change
— main-group metals no longer gain implicit H — and none are genuine
regressions. Verified exhaustively:

  • Direction is consistent in all 85: old reference = new output + extra H
    (or a now-unnecessary "Metal was disconnected" warning). Never the reverse.
  • Single-element cases affect exactly the changed set:
    Li Na K Rb Cs Fr Be Mg Ca Sr Ba Ra Al Ga In Tl Sn Pb Bi Po.
  • Real compounds are structurally untouched — e.g. NSC-35615a:
    …C58H50N10O23S6.Na.H…C58H50N10O23S6.Na. The 58-carbon backbone is
    byte-identical; only the sodium counterion lost its spurious hydrogen.
  • 0 unexplained / suspicious diffs.

The reference databases were generated before this change, so they encode
the old Na.H / Li.H / Ca.2H behaviour; the suite is correctly flagging the
intended output change.

Action required to land: regenerate the regression baseline and commit the
updated .sqlite files as part of this PR:

python INCHI-1-TEST/tests/test_library/inchi_tests/run_tests.py \
    --test=regression-reference \
    --lib-path=<path/to/libinchi.so> \
    --data-config=INCHI-1-TEST/tests/test_library/config/config_ci.py

The 4 pre-existing expected_failures PUBCHEM compounds in config_ci.py are
unrelated to this change.

@github-actions

Copy link
Copy Markdown

Unit Test Coverage Report

Coverage Regression Summary

Metric PR (%) Base (%) Difference (pp)
Lines 18.3 18.1 +0.20
Branches 11.9 11.8 +0.10

Find details on the base coverage at https://iupac-inchi.github.io/InChI/coverage/index.html

Find details on this PR's coverage by downloading coverage-reports-27815793724 and opening html/index.html

@nbehrnd

nbehrnd commented Jun 19, 2026

Copy link
Copy Markdown

@fbaensch-beilstein The landing page lists Openeye in InChI's Board of Trustees - is this is a close collaboration, i.e. OpenEye both implements this in their products / services quickly?

The reason for this question is a related question about why Na drawn as such on the canvas of BioviaDraw 1) yields Na as Hill formula and 100% Na as composition, however 2) both creates an InChI 1S/Na.H as well assign as chemical name sodium hydride. When I asked there (March 13, 2025) -- because .mol/.sdf syntax once maintained by MDL / IsisDraw, now by Dassault / BioviaDraw -- the reply by Martin Hornig was (quote):

this is a known limitation of the IUPAC naming tool. We are licensing the IUPAC tool from our partner OpenEye and bundling it with many BIOVIA apps. The above issue was discussed with OpenEye and they consider it a design decision. They will not change it.

Elemental metals can be assigned an explicit valence of (0) or be drawn as radicals. Both results in a proper IUPAC name like e.g. "sodium".

We are not happy with this but have to accept OpenEye's decision. thanks you for your understanding.

Not convinced the original url to the user forum about "Biovia Draw, analysis and name of a compound" is accessible without a login, I add a print-to-pdf here and ping Mr Hornig about your pull request here.

Biovia Draw, analysis and name of a compound.pdf

@fbaensch-beilstein

Copy link
Copy Markdown
Collaborator Author

@nbehrnd I can't say anything about the generation of the incorrect chemical name.

For clarification: Until now, the standard InChI itself also generated InChI=1S/Na.H for a lone sodium. It filled the atom's assumed "default valence" with a hydrogen and disconnected it afterwards. After this PR a lone sodiume produces InChI=1S/Na, matching how other metals (e.g. Fe -> InChI=1S/Fe) have been handled. This is independent of any vendor. So far it is a test run and I will prepare a version for this in the InChI WebDemo. So I can not say anything about how soon it will be part of the standard InChI and nothing about when, or indeed whether, BIOVIA/OpenEye will adopt this behaviour and the corresponding InchI version.

@fbaensch-beilstein fbaensch-beilstein self-assigned this Jun 19, 2026
@fbaensch-beilstein fbaensch-beilstein added the enhancement New feature or request label Jun 19, 2026
@fbaensch-beilstein fbaensch-beilstein marked this pull request as draft June 19, 2026 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect handling of single metal atoms also leads to "hallucinating hydrogens"

2 participants