fix: skip non-added mirror files with missing base content in token scoring #1193
Open
MkDev11 wants to merge 3 commits into
Summary
calculate_token_score_from_file_changes had no guard for non-added files whose base-side blob is unavailable. When a modified or renamed file arrives with old_content=None and valid new_content, score_tree_diff receives None as the old tree, treats old_signatures as an empty Counter, and counts every node in the head file as a net addition, inflating the miner's token score and downstream reward.
The das-github-mirror fetcher (extractBlobText) returns null for any base blob that is binary, over 1 MB, or unreachable from git. The legacy PAT fetcher in github_api_tools.py applies the same three conditions. Both paths feed into the shared calculate_token_score_from_file_changes, so both OSS and mirror PR scoring are affected.
Fix: one elif guard after skipped-unsupported that skips non-added files with old_content=None (scoring_method='skipped-missing-base', score=0.0). Added files with old_content=None are legitimate new-file contributions and continue to reach tree-diff unchanged.
Adjacent reference: PR #977 added the skipped-large guard for oversized old_content but left the None-for-non-added case unguarded.
Related Issues
Closes #1192
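For reviewers who want to see the failure mode from the Summary in isolation, here is a minimal sketch, not the repository's code. FileChange, toy_signatures, toy_tree_diff_score, and score_file are hypothetical stand-ins: whitespace-separated tokens substitute for AST node signatures, and the earlier guards (skipped-unsupported, skipped-large) are omitted. It only illustrates why an empty old-side Counter inflates the score and how a missing-base guard changes the outcome.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FileChange:
    # Hypothetical stand-in for the project's file-change record.
    status: str                  # "added", "modified", "renamed", ...
    old_content: Optional[str]   # None when the base blob could not be fetched
    new_content: Optional[str]


def toy_signatures(source: Optional[str]) -> Counter:
    # Stand-in for node signatures: count whitespace-separated tokens.
    # A missing source yields an empty Counter, mirroring the described bug.
    return Counter(source.split()) if source else Counter()


def toy_tree_diff_score(old: Optional[str], new: Optional[str]) -> float:
    # Counter subtraction keeps only positive counts, i.e. net additions.
    added = toy_signatures(new) - toy_signatures(old)
    return float(sum(added.values()))


def score_file(change: FileChange) -> Tuple[str, float]:
    # Guard in the spirit of the fix: a non-added file without base content
    # cannot be diffed meaningfully, so it is skipped with a zero score.
    if change.status != "added" and change.old_content is None:
        return ("skipped-missing-base", 0.0)
    # Added files legitimately have no base side and are scored as new files.
    return ("tree-diff", toy_tree_diff_score(change.old_content, change.new_content))


# Without a guard, a missing base makes every head token count as an addition.
unguarded = toy_tree_diff_score(None, "a b c")
guarded = score_file(FileChange("modified", None, "a b c"))
```

In this toy model the unguarded call scores all three head tokens as additions, while score_file returns the skipped-missing-base result for the same modified file and still scores an added file through tree-diff.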
Type of Change
Testing
test_modified_file_missing_base_content_is_skipped: regression. Asserts scoring_method='skipped-missing-base' and score=0.0 for a modified file with old_content=None; the method was tree-diff before the fix.
test_renamed_file_missing_base_content_is_skipped: regression. Same guard for renamed files.
test_added_file_null_old_content_scores_as_new_file: control. Added file with old_content=None still reaches tree-diff with score>0.
Full suite: uv run pytest tests/ -v → 1491 passed.
Checklist