fix(mail): boot postfix self-heal used short hostname → self-relay loop could return#27
Merged
Merged
Conversation
…op could return The boot-time postfix self-heal decided smart-host vs direct-MX by calling host_is_local(smtp_host, agent_hostname) with the SHORT hostname (`hostname`), while the runtime mta_reconfigure path correctly uses `hostname -f`. So a relay set to the node's OWN fqdn (e.g. `s4.example.com` on host `s4`) slipped past host_is_local at boot into relayhost=[s4.example.com] — the exact "mail for localhost loops back to myself" self-relay loop #23 fixed. And since EmailConfigSet self-restarts the agent, that buggy boot path re-ran after every mail save, silently undoing the correct decision mta_reconfigure had just made. Fix: - Boot resolves the FQDN once (via `hostname -f`, short-name fallback) BEFORE the smart-host decision and passes it to host_is_local — matching the runtime path. Also removes the now-duplicate `hostname -f` block in the direct-MX arm. - Defense in depth: host_is_local also catches a relay typed as our full fqdn when we only know our SHORT name (degraded `hostname -f`), via a short-label compare scoped to that case only — a legit external relay sharing our short label (our mail.acme.com vs relay mail.sendgrid.net) is still a smart-host once we know our real fqdn. clippy -D warnings clean; host_is_local self/loopback + degraded-short-fqdn unit tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Regression found by the Fable-5 audit (mail area)
The boot-time postfix self-heal picked smart-host vs direct-MX with:
while the runtime
mta_reconfigurepath useshostname -f(full FQDN). Trace, hosts4, relay set to the node's own fqdns4.digitalka.cz:h = s4.digitalka.cz,my_fqdn = s4h == fqdn? no."s4".split('.').next()=Some("s4")==Some("s4.digitalka.cz")? no → not local →relayhost=[s4.digitalka.cz]→ the "mail for localhost loops back to myself" self-relay loop that #24/fix(mail): localhost/self SMTP relay uses direct-MX (no relay-to-itself loop) #23 fixed.Worse:
EmailConfigSetself-restarts the agent, so this buggy boot path re-ran after every mail save, silently undoing the correct decisionmta_reconfigurehad just made.Fix
hostname -f, short-name fallback) before the smart-host decision and passes it tohost_is_local— same yardstick as the runtime path. Removes the now-duplicatehostname -fblock from the direct-MX arm.host_is_local: also catch a relay typed as our full fqdn when we only know our short name (degradedhostname -f), via a short-label compare scoped to that case only — so a legit external relay that merely shares our short label (ourmail.acme.comvsmail.sendgrid.net) is still a smart-host once we know our real fqdn.Test
clippy -D warningsclean;host_is_localself/loopback + new degraded-short-fqdn unit tests green. This was flagged by the audit but its verify stage hit a session limit, so I traced + confirmed it against the code by hand before fixing.🤖 Generated with Claude Code