replaces standard requests with curl_cffi to avoid 403 errors in BLS_CPI_Category #1954

Open
balit-raibot wants to merge 5 commits into datacommonsorg:master from balit-raibot:impersonate-browser-tls-fix-403

Conversation

Contributor

@balit-raibot balit-raibot commented Apr 15, 2026

This PR fixes a blocker caused by the BLS website, detailed below:

The BLS (Bureau of Labor Statistics) recently updated its security posture to include TLS fingerprinting. Standard HTTP libraries (like Python's requests) are now identified as automated scripts and blocked with a 403 Forbidden error, even when a valid User-Agent header is provided.

By using curl_cffi with the impersonate parameter, we can mimic a real browser's TLS handshake (fingerprint), ensuring reliable access to the supplemental Excel reports required for our downstream tasks.
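As a rough illustration of the approach described above (not the code in this PR), a fetch through curl_cffi might look like the sketch below. The function name `fetch_report`, the `http_get` injection hook, the timeout value, and the `"chrome"` impersonation target are all assumptions made for the example; the actual script may choose a different browser target.

```python
def fetch_report(url, impersonate="chrome", http_get=None, timeout=60):
    """Download `url` and return the response body as bytes.

    `http_get` is a hypothetical hook so the function can be exercised
    without network access; by default it uses curl_cffi's requests-style
    `get`, which accepts an `impersonate` argument.
    """
    if http_get is None:
        # Imported lazily so the module still loads where curl_cffi is absent.
        from curl_cffi import requests as curl_requests
        http_get = curl_requests.get

    # `impersonate` asks curl_cffi to mimic the named browser's TLS handshake.
    response = http_get(url, impersonate=impersonate, timeout=timeout)
    # Raise on 4xx/5xx so a 403 is surfaced instead of saved as a "report".
    response.raise_for_status()
    return response.content
```

Calling `fetch_report(url)` with a real BLS report URL would then return the Excel file's bytes, assuming the impersonated fingerprint is accepted by the server.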

REPORT_JSON=Link
REPORT_SUMMARY=Link
DIFFER=Link
DIFFER_VALIDATION=No deletions, expected additions for new data.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the curl_cffi library to the cpi_category_download.py script, replacing the standard requests library to bypass 403 errors by impersonating a browser's TLS fingerprint. While this addresses the access issue, the implementation also removes the streaming download logic, so entire files are now loaded into memory. The review feedback recommends restoring chunked writing to prevent potential out-of-memory errors when handling large ZIP archives.
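The chunked writing requested in the review could be sketched roughly as follows. This is an illustration, not the code under review: `write_chunks` and `download_streamed` are hypothetical names, and the sketch assumes the installed curl_cffi version supports `stream=True` with `iter_content()` on the response, which should be checked against the version pinned for the import.

```python
def write_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to `dest_path`; return bytes written."""
    total = 0
    with open(dest_path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                out.write(chunk)
                total += len(chunk)
    return total


def download_streamed(url, dest_path, impersonate="chrome"):
    """Stream `url` to disk so the whole body is never held in memory.

    Assumes curl_cffi's requests-style API with streaming support.
    """
    from curl_cffi import requests as curl_requests

    response = curl_requests.get(url, impersonate=impersonate, stream=True)
    try:
        response.raise_for_status()
        return write_chunks(response.iter_content(), dest_path)
    finally:
        # Release the underlying connection once streaming ends or fails.
        response.close()
```

Keeping the file-writing loop separate from the network call means the memory-sensitive part can be tested with any iterable of bytes.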

Comment thread statvar_imports/us_bls/cpi_category/cpi_category_download.py Outdated
@balit-raibot balit-raibot changed the title from "replaces standard requests with curl_cffi to avoid 403 errors from BLS source" to "replaces standard requests with curl_cffi to avoid 403 errors in BLS_CPI_Category" Apr 15, 2026