TestIQ provides a comprehensive quality score (0-100) with letter grade (A+ to F) based on three components:
-
Duplication Score (50%) - Penalizes exact duplicate tests
- 100 = No exact duplicates
- Decreases by 2 points per 1% of duplicate tests
-
Coverage Efficiency Score (30%) - Penalizes subset duplicates
- 100 = No subset tests (every test covers unique lines)
- Decreases by 1 point per 1% of subset tests
- Subset test: A test whose coverage is completely contained in another test
-
Uniqueness Score (20%) - Penalizes similar tests
- 100 = All tests are unique
- Decreases based on similarity threshold matches
Overall Score: 68.8/100 (Grade: D)
├─ Duplication Score: 97.7/100 ← Few exact duplicates (good!)
├─ Coverage Efficiency: 0.0/100 ← Many subset tests (needs review)
└─ Uniqueness Score: 99.7/100 ← Tests are mostly unique (good!)
This score indicates:
- ✅ Very few exact duplicate tests (3 groups)
⚠️ 750 subset duplicates - many tests are subsets of others- ✅ High uniqueness - tests have different coverage patterns
A subset duplicate is a test whose coverage is completely contained within another test's coverage.
{
"test_short": {
"auth.py": [10, 11, 12]
},
"test_comprehensive": {
"auth.py": [10, 11, 12, 15, 20, 25],
"user.py": [5, 6, 7]
}
}test_short is a subset of test_comprehensive - every line it covers is also covered by the comprehensive test.
Not always! Consider:
✅ Keep the subset test if:
- It tests different behavior/edge cases
- It tests different assertions/validations
- It's faster and provides quick feedback
- It has different inputs that happen to execute same code
❌ Remove the subset test if:
- It's truly redundant (same inputs, same assertions)
- It was created by copy-paste without adding value
- It adds execution time with no benefit
TestIQ identifies tests with identical coverage (they execute the exact same lines of code). This can mean:
Tests that:
- Were copy-pasted with minor changes
- Test the same scenario with same inputs
- Add no unique value to test suite
- Can be consolidated or removed
Tests that:
- Have same coverage but different assertions
- Test different input values (same code path)
- Focus on behavior vs. code coverage
- Exercise the same code with different expected outcomes
def test_score_initialization():
"""Test creating a quality score."""
score = TestQualityScore(
overall_score=85.0,
duplication_score=90.0,
coverage_efficiency_score=80.0,
uniqueness_score=85.0,
grade="B+",
)
assert score.overall_score == 85.0
def test_score_perfect():
"""Test perfect quality score."""
score = TestQualityScore(
overall_score=100.0,
duplication_score=100.0,
coverage_efficiency_score=100.0,
uniqueness_score=100.0,
grade="A+",
)
assert score.overall_score == 100.0Coverage: Both execute the same import and dataclass creation code
Value: Different - one tests general initialization, another tests perfect scores
Action: Keep both - they test different scenarios
TestIQ provides prioritized recommendations:
- Exact duplicate test groups
- Critical coverage inefficiencies
- Action: Review immediately
- Subset tests that may be redundant
- Similar test pairs
- Action: Review when you have time
- Minor optimizations
- Refactoring suggestions
- Action: Consider during regular refactoring
- Start with exact duplicates - These are most likely to be true duplicates
- Check test intent, not just coverage - Different assertions = different value
- Review subset tests carefully - Many are intentional and valuable
- Consider test execution time - Slow duplicates are higher priority
- Use the HTML report - Visual inspection helps identify patterns
- Look for patterns - Multiple related tests with same coverage may indicate structural issue
For comprehensive results, run coverage and TestIQ separately:
# Recommended: Use make target
make test-complete
# Or run manually
pytest --cov=testiq --cov-report=term --cov-report=html
pytest --testiq-output=testiq_coverage.json -q
testiq analyze testiq_coverage.json --format html --output reports/duplicates.html
testiq quality-score testiq_coverage.jsonPython's sys.settrace() allows only ONE active tracer at a time:
- Running both together: 19% coverage (both corrupted)
- Running separately: 81% coverage (both complete)
Each tracer needs exclusive access for accurate data.
- Grade F (0-60): Significant duplication issues - review immediately
- Grade D (60-70): Many subset duplicates - review when possible
- Exact duplicates > 10: Likely copy-paste issues - consolidate
- Subset duplicates > 50%: Review test organization
- Track quality score trend
- Set CI/CD quality gates
- Use baselines to prevent regression
- Run analysis:
make test-dup - Review score: Check overall grade and components
- Open HTML report:
open reports/duplicates.html - Check exact duplicates: Review each group for true duplicates
- Review subset duplicates: Check if tests add unique value
- Take action: Remove/consolidate redundant tests
- Re-run analysis: Verify improvements
- Set baseline:
testiq baseline save current
A: Grade D (60-70) typically indicates many subset duplicates. This doesn't mean your tests are bad - it means many tests' coverage is contained within other tests. Review if this is intentional.
A: If tests execute the same code paths, TestIQ will flag them. Check if they test different behaviors - if so, they're false positives and should be kept.
A: No! Many subset tests are valuable - they may test edge cases, have different assertions, or provide faster feedback. Review each case individually.
A: Review subset duplicates and either:
- Remove truly redundant ones
- Ensure each test covers unique code paths
- Refactor tests to reduce overlap
A: This often happens when:
- Tests share common setup/teardown code
- Multiple tests exercise the same imports
- Tests have hierarchical coverage (unit → integration → e2e)
Most are likely intentional and valuable.
- Quality score is a guide, not an absolute metric
- False positives are expected - coverage ≠ behavior
- Focus on high-priority items - exact duplicates first
- Consider test intent - same coverage, different value is OK
- Use comprehensive analysis - run coverage and TestIQ separately
- Monitor trends - track improvements over time
TestIQ helps identify potential issues - your judgment determines what's truly redundant.