Commit e543f83

fix: don't count DNS resolution errors against consecutive failure limit
When crawling large domain lists (e.g. Tranco top 100k), many domains cannot be resolved via DNS. These dnsNotFound neterrors are expected and do not indicate a browser or instrumentation failure. Previously they were counted against MAX_CONSECUTIVE_FAILURES, eventually crashing the crawl.

Skip the failure counter increment and browser restart for dnsNotFound neterrors so that DNS resolution failures no longer crash large crawls.

Fixes #1116
1 parent 40e044f commit e543f83

1 file changed

Lines changed: 12 additions & 1 deletion

openwpm/browser_manager.py

@@ -461,7 +461,18 @@ def execute_command_sequence(
                 )
                 return
 
-            if command_status != "ok":
+            # DNS resolution errors are expected when crawling large domain
+            # lists (e.g. Tranco top 100k) and don't indicate a browser or
+            # instrumentation failure, so we skip the failure counter and
+            # browser restart for them. See:
+            # https://github.com/openwpm/OpenWPM/issues/1116
+            is_dns_error = (
+                command_status == "neterror"
+                and error_text is not None
+                and error_text == "dnsNotFound"
+            )
+
+            if command_status != "ok" and not is_dns_error:
                 with task_manager.threadlock:
                     task_manager.failure_count += 1
                 if task_manager.failure_count > task_manager.failure_limit:
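
The effect of the new condition can be illustrated in isolation. The following is a minimal sketch, not code from the repository: `counts_as_failure` and `simulate` are hypothetical helpers, the `"error"` status and the `"connectionFailure"` error text are illustrative values only, and resetting the counter on a successful command is an assumption inferred from the phrase "consecutive failure limit" rather than something shown in this diff. The sketch also assumes a DNS error leaves the counter unchanged.

```python
from typing import Iterable, Optional, Tuple


def counts_as_failure(command_status: str, error_text: Optional[str]) -> bool:
    """Mirror the condition in the diff: every non-"ok" status counts,
    except a neterror whose error text is "dnsNotFound"."""
    is_dns_error = command_status == "neterror" and error_text == "dnsNotFound"
    return command_status != "ok" and not is_dns_error


def simulate(
    results: Iterable[Tuple[str, Optional[str]]], failure_limit: int
) -> bool:
    """Hypothetical stand-in for the crawl loop.

    Returns False if the failure count exceeds the limit (the crawl would
    abort), True otherwise.
    """
    failure_count = 0
    for command_status, error_text in results:
        if counts_as_failure(command_status, error_text):
            failure_count += 1
            if failure_count > failure_limit:
                return False
        elif command_status == "ok":
            # Assumption: a successful command resets the consecutive count.
            failure_count = 0
        # A DNS error falls through and leaves the counter untouched.
    return True
```

Under these assumptions, a long run of unresolvable domains such as `simulate([("neterror", "dnsNotFound")] * 10, failure_limit=2)` completes, while three consecutive non-DNS failures with the same limit abort the simulated crawl.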
