fix(ssrf): check every resolved DNS record, not just the first#269
Merged
Jiaaqiliu merged 1 commit intoMay 29, 2026
Merged
Conversation
`check_url_ssrf` resolves the hostname via `getaddrinfo` and then checks only `info[0]` against the private-IP ranges. If a domain returns multiple A/AAAA records — for example one public and one private — the first record passes the check while the underlying HTTP client remains free to connect to any returned address. Iterate every record and block the URL if any address is in a private / loopback / link-local / reserved / unspecified / multicast range. Also drop the unused `_SUSPICIOUS_URL_RE` regex. Adds tests for the multi-record case, 0.0.0.0, IPv6 loopback, IPv4-mapped IPv6 loopback, backslash bypass, and userinfo bypass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Was poking around
researchclaw/web/_ssrf.pyafter the recent bypass fixes in #254 and noticed the resolution path only looks atinfo[0]:If a hostname returns multiple A/AAAA records, the check only sees the first one. So a domain that resolves to e.g.
[8.8.8.8, 127.0.0.1]will pass — but urllib / Crawl4AI can connect to either address. Pretty easy bypass for anyone who controls a DNS zone.Changed the resolution path to loop over every record and block if any of them lands in a private/loopback/link-local/reserved range. Also threw in
is_unspecifiedandis_multicastsince they were the obvious gaps once I started listing things out (and added the resolved IP to the error message so it's easier to tell what got blocked from the logs). Pulled out the predicate into a small_is_blocked_addrhelper so the literal-IP branch and the resolved branch share the same rules.One unrelated cleanup: removed the
_SUSPICIOUS_URL_REregex at the top of the file — it's defined but nothing imports it, looks like leftover from an earlier draft of the backslash/userinfo check.New tests in
tests/test_web_crawler.py:0.0.0.0,[::1],[::ffff:127.0.0.1]check_url_ssrf#254 (didn't have direct tests oncheck_url_ssrffor those)Not trying to fix DNS rebinding here — that needs resolve-once-and-pin-the-IP plumbing through the actual HTTP client, which is a bigger change. This just stops the trivial multi-record case.
Test plan
pytest tests/test_web_crawler.py— 35 passed (29 existing + 6 new)pytest tests/— 2790 passed, 56 skipped