fix(ssrf): check every resolved DNS record, not just the first by atulya-singh · Pull Request #269 · aiming-lab/AutoResearchClaw

atulya-singh · 2026-05-22T18:45:14Z

Was poking around researchclaw/web/_ssrf.py after the recent bypass fixes in #254 and noticed the resolution path only looks at info[0]:

info = socket.getaddrinfo(hostname, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
addr = ipaddress.ip_address(info[0][4][0])

If a hostname returns multiple A/AAAA records, the check only sees the first one. So a domain that resolves to e.g. [8.8.8.8, 127.0.0.1] will pass — but urllib / Crawl4AI can connect to either address. Pretty easy bypass for anyone who controls a DNS zone.

Changed the resolution path to loop over every record and block if any of them lands in a private/loopback/link-local/reserved range. Also threw in is_unspecified and is_multicast since they were the obvious gaps once I started listing things out (and added the resolved IP to the error message so it's easier to tell what got blocked from the logs). Pulled out the predicate into a small _is_blocked_addr helper so the literal-IP branch and the resolved branch share the same rules.

One unrelated cleanup: removed the _SUSPICIOUS_URL_RE regex at the top of the file — it's defined but nothing imports it, looks like leftover from an earlier draft of the backslash/userinfo check.

New tests in tests/test_web_crawler.py:

multi-record DNS where any record is private → blocked, with the bad IP in the message
multi-record DNS where all records are public → allowed
0.0.0.0, [::1], [::ffff:127.0.0.1]
the backslash and userinfo cases from #254 (SSRF bypass in check_url_ssrf), which previously had no direct tests on check_url_ssrf
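One way to exercise the multi-record cases without real DNS is to patch `socket.getaddrinfo`. The sketch below is illustrative only: `fake_getaddrinfo` and `resolved_ips` are hypothetical names, and `resolved_ips` is a stand-in for the code under test rather than the project's `check_url_ssrf`.

```python
import ipaddress
import socket
from unittest import mock


def fake_getaddrinfo(*ips):
    """Build a getaddrinfo replacement that returns one record per given IP."""
    def _resolver(*args, **kwargs):
        records = []
        for ip in ips:
            is_v6 = ":" in ip
            family = socket.AF_INET6 if is_v6 else socket.AF_INET
            # IPv6 sockaddrs are 4-tuples, IPv4 sockaddrs are 2-tuples.
            sockaddr = (ip, 0, 0, 0) if is_v6 else (ip, 0)
            records.append((family, socket.SOCK_STREAM, 6, "", sockaddr))
        return records
    return _resolver


def resolved_ips(hostname):
    """Stand-in for the code under test: list every resolved address."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def test_multi_record_includes_private():
    # One public and one loopback record, as in the bypass described above.
    with mock.patch("socket.getaddrinfo", fake_getaddrinfo("8.8.8.8", "127.0.0.1")):
        ips = resolved_ips("attacker.example")
    assert any(ipaddress.ip_address(ip).is_loopback for ip in ips)


def test_multi_record_all_public():
    with mock.patch("socket.getaddrinfo", fake_getaddrinfo("8.8.8.8", "1.1.1.1")):
        ips = resolved_ips("public.example")
    assert not any(ipaddress.ip_address(ip).is_private for ip in ips)
```

Because the stand-in looks up `socket.getaddrinfo` at call time, patching the attribute on the `socket` module is enough; code that did `from socket import getaddrinfo` would need to be patched at its own import site instead.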

Not trying to fix DNS rebinding here — that needs resolve-once-and-pin-the-IP plumbing through the actual HTTP client, which is a bigger change. This just stops the trivial multi-record case.
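To make the out-of-scope part concrete, here is a rough sketch of the "resolve once and pin" idea. It is not part of this PR; `resolve_and_pin`, its `resolver` parameter, and the return shape are all hypothetical, and wiring the pinned IP into urllib or Crawl4AI is the larger change mentioned above.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_and_pin(url, resolver=socket.getaddrinfo):
    """Resolve the URL's host once and return (ip, port, host) for the caller.

    Raises ValueError if the host is missing, unresolvable, or if any
    resolved address falls in a blocked range.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError("URL has no hostname")
    port = parts.port or (443 if parts.scheme == "https" else 80)

    infos = resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    if not infos:
        raise ValueError(f"{host} did not resolve")

    ips = [info[4][0] for info in infos]
    for ip in ips:
        addr = ipaddress.ip_address(ip.split("%", 1)[0])
        if (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_reserved or addr.is_unspecified or addr.is_multicast):
            raise ValueError(f"{host} resolves to blocked address {ip}")

    # The caller connects to this exact IP instead of re-resolving the name,
    # so a second DNS answer cannot swap in a different address.
    return ips[0], port, host
```

The caller would then open the connection to the returned IP and send the original hostname in the `Host` header. For HTTPS the hostname also has to be supplied for SNI and certificate verification, which is part of why this needs plumbing through the HTTP client.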

Test plan

pytest tests/test_web_crawler.py — 35 passed (29 existing + 6 new)
pytest tests/ — 2790 passed, 56 skipped

`check_url_ssrf` resolves the hostname via `getaddrinfo` and then checks only `info[0]` against the private-IP ranges. If a domain returns multiple A/AAAA records — for example one public and one private — the first record passes the check while the underlying HTTP client remains free to connect to any returned address. Iterate every record and block the URL if any address is in a private / loopback / link-local / reserved / unspecified / multicast range. Also drop the unused `_SUSPICIOUS_URL_RE` regex. Adds tests for the multi-record case, 0.0.0.0, IPv6 loopback, IPv4-mapped IPv6 loopback, backslash bypass, and userinfo bypass.

Jiaaqiliu merged commit d66ef84 into aiming-lab:main May 29, 2026

Jiaaqiliu merged 1 commit into aiming-lab:main from atulya-singh:fix/ssrf-validate-all-dns-records
