fix(browser): exclude DNS errors from consecutive failure limit #1156
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Excludes Firefox dnsNotFound (NXDOMAIN) neterrors from contributing to the consecutive failure limit that stops crawling, and adds tests to ensure crawls continue through large numbers of nonexistent domains.
Changes:
- Added `_is_dns_error()` predicate and used it to skip incrementing `failure_count` on `dnsNotFound`.
- Added an integration test that navigates to 110 `.invalid` domains to verify no failure-limit abort occurs.
- Added a unit test that imports and validates the real `_is_dns_error()` predicate.
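
The behavior described in the changes above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from `openwpm/browser_manager.py`: the names `is_dns_error_sketch`, `would_abort`, and `DNS_NOT_FOUND_MARKER` are hypothetical, and the sketch assumes that the error is available as a string containing Firefox's `about:neterror?e=dnsNotFound` URL, that a success resets the counter, and that the crawl aborts once the counter exceeds the limit.

```python
from typing import Iterable, Optional

# Assumption: an NXDOMAIN surfaces as a Firefox neterror page whose URL
# carries e=dnsNotFound. The real predicate may inspect a different field.
DNS_NOT_FOUND_MARKER = "about:neterror?e=dnsNotFound"


def is_dns_error_sketch(error_text: Optional[str]) -> bool:
    """Hypothetical stand-in for _is_dns_error(): True only for dnsNotFound."""
    return error_text is not None and DNS_NOT_FOUND_MARKER in error_text


def would_abort(results: Iterable[Optional[str]], failure_limit: int) -> bool:
    """Return True if the consecutive failure limit would stop the crawl.

    Each item is None for a successful command or an error string for a
    failed one.
    """
    failure_count = 0
    for error_text in results:
        if error_text is None:
            # Assumption: a successful command resets the counter.
            failure_count = 0
        elif is_dns_error_sketch(error_text):
            # NXDOMAIN: skip the increment, leaving the counter unchanged.
            continue
        else:
            failure_count += 1
            # Assumption: abort once the counter exceeds the limit.
            if failure_count > failure_limit:
                return True
    return False
```

Under these assumptions, `would_abort(["about:neterror?e=dnsNotFound"] * 110, failure_limit=5)` returns `False`, while six consecutive `netTimeout` errors with the same limit return `True`, since timeouts still count.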
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `openwpm/browser_manager.py` | Adds `_is_dns_error()` and uses it to exclude `dnsNotFound` from the failure counter. |
| `test/test_webdriver_utils.py` | Adds integration + unit tests validating the new DNS-exclusion behavior and predicate correctness. |
| `.gitignore` | Adds ignore rules for Crosslink/Claude-generated local state. |
    DNS resolution errors are expected when crawling large domain lists
    (e.g. Tranco top-100k) and don't indicate a browser or instrumentation
    failure. # Only NXDOMAIN; DNS timeouts/SERVFAIL intentionally still count
The docstring includes a # ... fragment that reads like an inline comment but is actually part of the public docstring text. Please rewrite that line as a normal sentence (e.g., “Only NXDOMAIN is excluded; DNS timeouts/SERVFAIL still count…”) to avoid confusion and keep generated docs clean.
Suggested change:

    - failure. # Only NXDOMAIN; DNS timeouts/SERVFAIL intentionally still count
    + failure. Only NXDOMAIN is excluded; DNS timeouts and SERVFAIL
    + intentionally still count.
    import pytest

    from openwpm.browser_manager import _is_dns_error
Tests importing a leading-underscore symbol can be confusing because _is_dns_error reads as “private/internal”. If this predicate is intended to be a stable, importable test surface, consider making it a public name (e.g., is_dns_error / is_dns_resolution_error) or relocating it to a dedicated utils module to clarify intended usage.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

    @@            Coverage Diff             @@
    ##           master    #1156      +/-   ##
    ==========================================
    - Coverage   62.21%   62.16%   -0.05%
    ==========================================
      Files          40       40
      Lines        3898     3901       +3
    ==========================================
      Hits         2425     2425
    - Misses       1473     1476       +3
Extract the inline is_dns_error predicate from execute_command_sequence into a module-level _is_dns_error() function so the test imports and exercises the real code instead of a local copy. Remove unused CommandExecutionError import from the test file.
e6d86c9 to 72ea044
Summary
- DNS errors (`dnsNotFound` / NXDOMAIN) no longer count toward the consecutive failure limit that stops crawling
- Extracts `_is_dns_error()` as a named, importable function for testability
- `failure_limit=5`

Only `dnsNotFound` is excluded; DNS timeouts/SERVFAIL intentionally still count as they may indicate real network issues.

Supersedes #1139. Incorporates fixes from adversarial review (VDD methodology).
VDD Review History
Test plan
- `test_dns_error_does_not_count_against_failure_limit` passes
- `test_is_dns_error_predicate` tests the real `_is_dns_error` function
- `pre-commit run --all-files` passes