fix: skip non-added mirror files with missing base content in token scoring #1193

Open
MkDev11 wants to merge 3 commits into entrius:test from MkDev11:fix/issue-1192-missing-base-content-scoring

Conversation

MkDev11 (Contributor) commented on May 12, 2026

Summary

calculate_token_score_from_file_changes had no guard for non-added files whose base-side blob is unavailable. When a modified or renamed file arrives with old_content=None and valid new_content, score_tree_diff receives None as the old tree, treats old_signatures as an empty Counter, and counts every node in the head file as a net addition — inflating the miner's token score and downstream reward.
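The inflation described above can be reproduced in miniature. The following is a minimal sketch that uses Python's standard `ast` module as a stand-in for the project's tree parser; `node_signatures` and `net_added_nodes` are hypothetical names chosen for illustration and are not the actual `score_tree_diff` implementation, which may weigh nodes differently.

```python
import ast
from collections import Counter
from typing import Optional


def node_signatures(source: Optional[str]) -> Counter:
    """Count AST node types; a missing source yields an empty Counter."""
    if source is None:
        return Counter()
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def net_added_nodes(old_source: Optional[str], new_source: str) -> int:
    """Number of nodes present in the new tree beyond those in the old tree."""
    # Counter subtraction keeps only positive counts, i.e. net additions.
    added = node_signatures(new_source) - node_signatures(old_source)
    return sum(added.values())


old = "def f(x):\n    return x\n"
new = "def f(x):\n    return x + 1\n"

real_delta = net_added_nodes(old, new)   # base available: only the edit counts
inflated = net_added_nodes(None, new)    # base missing: every node counts
```

With the base available, only the nodes introduced by the edit are counted. With `old_source=None`, the whole head file is counted as new, which is the over-scoring this PR guards against.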

The das-github-mirror fetcher (extractBlobText) returns null for any base blob that is binary, over 1 MB, or unreachable from git. The legacy PAT fetcher in github_api_tools.py applies the same three conditions. Both paths feed into the shared calculate_token_score_from_file_changes, so both OSS and mirror PR scoring are affected.
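The three conditions under which a base blob comes back empty can be sketched as follows. This is an illustrative Python approximation, not the actual `extractBlobText` or `github_api_tools.py` code: the function name, the exact byte threshold, and the null-byte heuristic for detecting binary content are all assumptions.

```python
from typing import Optional

# Assumption: "1 MB" is treated as 1,000,000 bytes; the real limit may differ.
MAX_BLOB_BYTES = 1_000_000


def extract_blob_text(raw: Optional[bytes]) -> Optional[str]:
    """Return decoded text, or None when the blob cannot be used for scoring."""
    if raw is None:
        # Blob is unreachable from git.
        return None
    if len(raw) > MAX_BLOB_BYTES:
        # Blob is over the size limit.
        return None
    if b"\x00" in raw:
        # Assumed heuristic: a null byte marks the blob as binary.
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Undecodable content is treated as binary as well.
        return None
```

In all three cases the caller sees the same `None`, so the scorer cannot tell why the base is missing and has to handle the absence itself.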

Fix: one elif guard after skipped-unsupported that skips non-added files with old_content=None (scoring_method='skipped-missing-base', score=0.0). Added files with old_content=None are legitimate new-file contributions and continue to reach tree-diff unchanged.
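The ordering of the guard can be sketched as below. This is a simplified illustration under stated assumptions: the `FileChange` dataclass, the `choose_scoring_method` function, and the `supported` flag are hypothetical and do not reflect the real signature of `calculate_token_score_from_file_changes`; only the `scoring_method` strings come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileChange:
    """Hypothetical stand-in for one changed file in a PR."""
    filename: str
    status: str  # "added", "modified", or "renamed"
    old_content: Optional[str]
    new_content: Optional[str]


def choose_scoring_method(change: FileChange, supported: bool = True) -> str:
    """Decide how a file is scored; skipped methods imply score=0.0."""
    if not supported:
        return "skipped-unsupported"
    elif change.status != "added" and change.old_content is None:
        # New guard: without the base blob, a diff cannot be computed.
        return "skipped-missing-base"
    # Added files legitimately have no base and still reach tree-diff.
    return "tree-diff"


modified = FileChange("a.py", "modified", None, "x = 1\n")
renamed = FileChange("b.py", "renamed", None, "x = 1\n")
added = FileChange("c.py", "added", None, "x = 1\n")
```

Keying the guard on the file status rather than on `old_content` alone is what keeps new-file contributions scoring as before.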

Adjacent reference: PR #977 added the skipped-large guard for oversized old_content but left the None-for-non-added case unguarded.

Related Issues

Closes #1192

Type of Change

  • Bug fix

Testing

  • Tests added/updated

test_modified_file_missing_base_content_is_skipped — regression: asserts scoring_method='skipped-missing-base' and score=0.0 for a modified file with old_content=None; was tree-diff before the fix.

test_renamed_file_missing_base_content_is_skipped — regression: same guard for renamed files.

test_added_file_null_old_content_scores_as_new_file — control: added file with old_content=None still reaches tree-diff with score>0.

Full suite: uv run pytest tests/ -v → 1491 passed.

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Changes are documented (if applicable)

xiao-xiao-mao (bot) added the bug label on May 12, 2026
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug] Non-added files scored as full new files when base content is unavailable in token scoring

2 participants