replaces standard requests with curl_cffi to avoid 403 errors in BLS_CPI_Category #1954
Open
balit-raibot wants to merge 5 commits into datacommonsorg:master from
Contributor
Code Review
This pull request introduces the curl_cffi library to the cpi_category_download.py script. It replaces the standard requests library so that the script presents a browser's TLS fingerprint and is no longer rejected with 403 errors. While this addresses the access issue, the change also removes the streaming download logic, so entire files are now loaded into memory before being written. Feedback was provided to restore chunked writing and avoid potential out-of-memory errors when handling large ZIP archives.
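The review's suggestion to restore chunked writing could look roughly like the sketch below. It is not the code in this PR. The helper names `write_chunks` and `download_zip_streaming` are hypothetical, and `impersonate="chrome"` is an assumed target. The sketch also assumes a curl_cffi release that supports `stream=True` and `Response.iter_content()`. Only one chunk is held in memory at a time, and the curl_cffi import is deferred so that the file-writing half can be used without the library installed.

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], dest_path: str) -> int:
    """Write byte chunks to dest_path one at a time; return total bytes written."""
    total = 0
    with open(dest_path, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                total += len(chunk)
    return total


def download_zip_streaming(url: str, dest_path: str, impersonate: str = "chrome") -> int:
    """Stream url to dest_path with a browser-like TLS fingerprint (hypothetical helper).

    Assumes a curl_cffi version that supports stream=True and iter_content().
    """
    # Lazy import: write_chunks() stays usable even if curl_cffi is not installed.
    from curl_cffi import requests as cffi_requests

    with cffi_requests.Session(impersonate=impersonate) as session:
        response = session.get(url, stream=True, timeout=120)
        try:
            response.raise_for_status()
            # Chunks are written as they arrive instead of buffering the whole ZIP.
            return write_chunks(response.iter_content(), dest_path)
        finally:
            response.close()
```

Keeping the file-writing loop separate from the HTTP call makes the memory behaviour easy to check in isolation. It also leaves the transport (curl_cffi or otherwise) swappable.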
saanikaaa approved these changes on Apr 15, 2026.
This PR fixes a blocker introduced by the BLS website, described below:
Recently, the Bureau of Labor Statistics (BLS) updated its security posture to include TLS fingerprinting. Standard HTTP libraries (such as Python's requests) are now identified as automated scripts and blocked with a 403 Forbidden error, even when a valid User-Agent header is provided.
By using curl_cffi with the impersonate parameter, we can mimic a real browser's TLS handshake (its fingerprint) and restore reliable access to the supplemental Excel reports required by our downstream tasks.
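Below is a minimal sketch of the swap described above, assuming the Excel reports are small enough to read into memory. It is not the PR's actual code. `fetch_report` and `BlockedByServerError` are hypothetical names, and `impersonate="chrome"` is an assumed browser target. The optional `get` parameter defaults to `curl_cffi.requests.get` and exists only so that the function can be exercised offline with a stub.

```python
from typing import Any, Callable, Optional


class BlockedByServerError(RuntimeError):
    """Raised when the server still rejects the request with HTTP 403."""


def fetch_report(
    url: str,
    impersonate: str = "chrome",
    get: Optional[Callable[..., Any]] = None,
) -> bytes:
    """Fetch url with a browser-like TLS fingerprint and return the body bytes."""
    if get is None:
        # Lazy import so the function can be tested without curl_cffi installed.
        from curl_cffi import requests as cffi_requests
        get = cffi_requests.get

    # impersonate selects the browser whose TLS handshake curl_cffi mimics.
    response = get(url, impersonate=impersonate, timeout=60)
    if response.status_code == 403:
        # A persistent 403 may mean the chosen impersonation target is outdated.
        raise BlockedByServerError(f"403 Forbidden for {url}")
    response.raise_for_status()
    return response.content
```

Raising a dedicated error on 403 keeps this failure mode distinct from other HTTP errors. That distinction is useful if the BLS tightens its checks again.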
REPORT_JSON=Link
REPORT_SUMMARY=Link
DIFFER=Link
DIFFER_VALIDATION=No deletions, expected additions for new data.