fix: rework agent readiness scoring for realistic scores #18
Merged
Reward things devs already do (stack, CI, tests, descriptions, commit quality) instead of gating 50% on files nobody has. devcard.json/llms.txt are now 5pt bonuses, not requirements.
Pull request overview
This pull request updates the agent readiness scoring rubric to produce more realistic 0–100 scores by shifting weight from rarely-present “agent files” to common, machine-parseable repository signals (docs/stack/metadata/CI/tests/licenses/commit hygiene).
Changes:
- Reworked `compute_agent_readiness_score` to use a new 100-point rubric with graduated stack depth, repo description coverage, test/license adoption, and commit-quality contributions, while making `devcard.json`/`llms.txt` small bonuses.
- Updated and expanded scoring tests to match the new rubric and validate new signals (stack graduation, commit quality, repo descriptions).
- Adjusted expected “high score” thresholds and clamping tests for the new distribution.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/devcard/analyzers/scoring.py | Replaces the old agent-readiness rubric with a new weighted scoring model based on common repo signals + small bonuses for agent-specific files. |
| tests/test_scoring_dual.py | Updates existing agent-readiness tests and adds new cases to cover the new scoring signals and expected score ranges. |
Comment on lines 103 to +105

    Rubric:
    - devcard.json exists: 20 pts
    - llms.txt exists: 10 pts
    - Structured READMEs (docs_adoption): 15 pts
    - Dependency files/stack: 10 pts (>=5 items=10, >=1=5)
    - Topics/metadata coverage: 10 pts (proportional)
    - Classification coverage: 10 pts (proportional)
    - CI adoption: 5 pts
    - AGENTS.md in active repos: 20 pts (reserved, returns 0)
    - Dependency files/stack depth: 15 pts (graduated)
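As a quick check of the motivation for this change, the old rubric quoted above can be tallied to see how many points were reachable without any agent-specific files. This is only a sketch: the dictionary keys are illustrative labels rather than identifiers from `scoring.py`, and it assumes the final "stack depth: 15 pts (graduated)" line belongs to the new rubric and is therefore excluded.

```python
# Point values copied from the old rubric lines in the diff above.
# Keys are illustrative labels, not names taken from the codebase.
OLD_RUBRIC = {
    "devcard.json exists": 20,
    "llms.txt exists": 10,
    "docs_adoption": 15,
    "dependency files/stack": 10,
    "topics/metadata coverage": 10,
    "classification coverage": 10,
    "CI adoption": 5,
    "AGENTS.md in active repos": 20,  # reserved; always returned 0
}

# Signals that depend on files most developers do not have.
AGENT_FILES = {
    "devcard.json exists",
    "llms.txt exists",
    "AGENTS.md in active repos",
}

total = sum(OLD_RUBRIC.values())
gated = sum(pts for name, pts in OLD_RUBRIC.items() if name in AGENT_FILES)
ceiling_without_agent_files = total - gated

print(total, gated, ceiling_without_agent_files)
```

Under these assumptions the old weights sum to 100, with half of them sitting behind the three agent-specific files.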
Comment on lines +161 to +164

    if devcard.commit_quality is not None and devcard.commit_quality.commits_analyzed > 0:
        conv_score = min(devcard.commit_quality.conventional_commits_pct / 100, 1.0)
        msg_score = min(devcard.commit_quality.avg_message_length / 72, 1.0)
        score += (conv_score * 0.6 + msg_score * 0.4) * 8
…magic number

1. Rubric comment: "Structured READMEs" → "Documentation adoption (docs signal)" to accurately reflect that docs_adoption measures the FILE_PATTERNS docs signal, not specifically structured READMEs.
2. Magic number 72 → _IDEAL_COMMIT_MSG_LENGTH constant. 72 is the git convention for max commit subject line length.
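For reference, the commit-quality term from the reviewed snippet, with 72 lifted into `_IDEAL_COMMIT_MSG_LENGTH`, could look roughly like the standalone sketch below. The `CommitQuality` dataclass, the `commit_quality_points` helper, and the `_COMMIT_QUALITY_MAX_POINTS` name are hypothetical stand-ins for illustration; only the field names, the 0.6/0.4 weights, the 8-point cap, and the constant name come from the PR.

```python
from dataclasses import dataclass
from typing import Optional

_IDEAL_COMMIT_MSG_LENGTH = 72    # constant name from the follow-up commit
_COMMIT_QUALITY_MAX_POINTS = 8   # hypothetical name for the literal 8 in the diff


@dataclass
class CommitQuality:
    """Hypothetical stand-in for the object behind devcard.commit_quality."""
    commits_analyzed: int
    conventional_commits_pct: float  # 0-100
    avg_message_length: float        # characters


def commit_quality_points(cq: Optional[CommitQuality]) -> float:
    """Return the commit-quality contribution, between 0 and 8 points."""
    if cq is None or cq.commits_analyzed <= 0:
        return 0.0
    # Share of conventional commits, capped at 1.0.
    conv_score = min(cq.conventional_commits_pct / 100, 1.0)
    # Average message length relative to the ideal, capped at 1.0.
    msg_score = min(cq.avg_message_length / _IDEAL_COMMIT_MSG_LENGTH, 1.0)
    # 60% weight on conventional commits, 40% on message length.
    return (conv_score * 0.6 + msg_score * 0.4) * _COMMIT_QUALITY_MAX_POINTS
```

With these weights, a profile with no analyzed commits contributes nothing, and one whose commits are all conventional with an average length of at least 72 characters earns the full 8 points.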
Summary
The old agent readiness score gated 50 points (out of 100) behind files nobody has: `devcard.json` (20 pts), `llms.txt` (10 pts), and `AGENTS.md` (20 pts, not even tracked). A developer with perfect repos maxed out at ~50.

New scoring rewards things developers already do that make profiles machine-parseable:
Real-world scores now make sense:
Test plan