Skip to content

Latest commit

 

History

History
255 lines (182 loc) · 7.64 KB

File metadata and controls

255 lines (182 loc) · 7.64 KB

Interpreting TestIQ Results

Understanding Your Quality Score

TestIQ provides a comprehensive quality score (0-100) with letter grade (A+ to F) based on three components:

Score Components

  1. Duplication Score (50%) - Penalizes exact duplicate tests

    • 100 = No exact duplicates
    • Decreases by 2 points per 1% of duplicate tests
  2. Coverage Efficiency Score (30%) - Penalizes subset duplicates

    • 100 = No subset tests (every test covers unique lines)
    • Decreases by 1 point per 1% of subset tests
    • Subset test: A test whose coverage is completely contained in another test
  3. Uniqueness Score (20%) - Penalizes similar tests

    • 100 = All tests are unique
    • Decreases based on similarity threshold matches

Example Score Breakdown

Overall Score: 68.8/100 (Grade: D)
├─ Duplication Score: 97.7/100     ← Few exact duplicates (good!)
├─ Coverage Efficiency: 0.0/100    ← Many subset tests (needs review)
└─ Uniqueness Score: 99.7/100      ← Tests are mostly unique (good!)

This score indicates:

  • ✅ Very few exact duplicate tests (3 groups)
  • ⚠️ 750 subset duplicates - many tests are subsets of others
  • ✅ High uniqueness - tests have different coverage patterns

What Are Subset Duplicates?

A subset duplicate is a test whose coverage is completely contained within another test's coverage.

Example

{
  "test_short": {
    "auth.py": [10, 11, 12]
  },
  "test_comprehensive": {
    "auth.py": [10, 11, 12, 15, 20, 25],
    "user.py": [5, 6, 7]
  }
}

test_short is a subset of test_comprehensive - every line it covers is also covered by the comprehensive test.

Should You Remove Subset Tests?

Not always! Consider:

Keep the subset test if:

  • It tests different behavior/edge cases
  • It tests different assertions/validations
  • It's faster and provides quick feedback
  • It has different inputs that happen to execute same code

Remove the subset test if:

  • It's truly redundant (same inputs, same assertions)
  • It was created by copy-paste without adding value
  • It adds execution time with no benefit

Understanding Duplicate Groups

TestIQ identifies tests with identical coverage (they execute the exact same lines of code). This can mean:

True Duplicates ✅ (Should Review)

Tests that:

  • Were copy-pasted with minor changes
  • Test the same scenario with same inputs
  • Add no unique value to test suite
  • Can be consolidated or removed

False Positives ⚠️ (Expected)

Tests that:

  • Have same coverage but different assertions
  • Test different input values (same code path)
  • Focus on behavior vs. code coverage
  • Exercise the same code with different expected outcomes

Example: False Positive

def test_score_initialization():
    """Test creating a quality score."""
    score = TestQualityScore(
        overall_score=85.0,
        duplication_score=90.0,
        coverage_efficiency_score=80.0,
        uniqueness_score=85.0,
        grade="B+",
    )
    assert score.overall_score == 85.0

def test_score_perfect():
    """Test perfect quality score."""
    score = TestQualityScore(
        overall_score=100.0,
        duplication_score=100.0,
        coverage_efficiency_score=100.0,
        uniqueness_score=100.0,
        grade="A+",
    )
    assert score.overall_score == 100.0

Coverage: Both execute the same import and dataclass creation code
Value: Different - one tests general initialization, another tests perfect scores
Action: Keep both - they test different scenarios


Interpreting Recommendations

TestIQ provides prioritized recommendations:

High Priority (🔴)

  • Exact duplicate test groups
  • Critical coverage inefficiencies
  • Action: Review immediately

Medium Priority (🟡)

  • Subset tests that may be redundant
  • Similar test pairs
  • Action: Review when you have time

Low Priority (🟢)

  • Minor optimizations
  • Refactoring suggestions
  • Action: Consider during regular refactoring

Best Practices for Review

  1. Start with exact duplicates - These are most likely to be true duplicates
  2. Check test intent, not just coverage - Different assertions = different value
  3. Review subset tests carefully - Many are intentional and valuable
  4. Consider test execution time - Slow duplicates are higher priority
  5. Use the HTML report - Visual inspection helps identify patterns
  6. Look for patterns - Multiple related tests with same coverage may indicate structural issue

Running Complete Analysis

For comprehensive results, run coverage and TestIQ separately:

# Recommended: Use make target
make test-complete

# Or run manually
pytest --cov=testiq --cov-report=term --cov-report=html
pytest --testiq-output=testiq_coverage.json -q
testiq analyze testiq_coverage.json --format html --output reports/duplicates.html
testiq quality-score testiq_coverage.json

Why Separate Runs?

Python's sys.settrace() allows only ONE active tracer at a time:

  • Running both together: 19% coverage (both corrupted)
  • Running separately: 81% coverage (both complete)

Each tracer needs exclusive access for accurate data.


When to Act on Results

High Priority Actions

  • Grade F (0-60): Significant duplication issues - review immediately
  • Grade D (60-70): Many subset duplicates - review when possible
  • Exact duplicates > 10: Likely copy-paste issues - consolidate
  • Subset duplicates > 50%: Review test organization

Monitor Over Time

  • Track quality score trend
  • Set CI/CD quality gates
  • Use baselines to prevent regression

Example Workflow

  1. Run analysis: make test-dup
  2. Review score: Check overall grade and components
  3. Open HTML report: open reports/duplicates.html
  4. Check exact duplicates: Review each group for true duplicates
  5. Review subset duplicates: Check if tests add unique value
  6. Take action: Remove/consolidate redundant tests
  7. Re-run analysis: Verify improvements
  8. Set baseline: testiq baseline save current

FAQ

Q: Why does my test suite have a D grade?

A: Grade D (60-70) typically indicates many subset duplicates. This doesn't mean your tests are bad - it means many tests' coverage is contained within other tests. Review if this is intentional.

Q: Why are my identical-looking tests flagged as duplicates?

A: If tests execute the same code paths, TestIQ will flag them. Check if they test different behaviors - if so, they're false positives and should be kept.

Q: Should I remove all subset duplicates?

A: No! Many subset tests are valuable - they may test edge cases, have different assertions, or provide faster feedback. Review each case individually.

Q: How do I improve my efficiency score?

A: Review subset duplicates and either:

  • Remove truly redundant ones
  • Ensure each test covers unique code paths
  • Refactor tests to reduce overlap

Q: Why do I see 750 subset duplicates?

A: This often happens when:

  • Tests share common setup/teardown code
  • Multiple tests exercise the same imports
  • Tests have hierarchical coverage (unit → integration → e2e)

Most are likely intentional and valuable.


Summary

  • Quality score is a guide, not an absolute metric
  • False positives are expected - coverage ≠ behavior
  • Focus on high-priority items - exact duplicates first
  • Consider test intent - same coverage, different value is OK
  • Use comprehensive analysis - run coverage and TestIQ separately
  • Monitor trends - track improvements over time

TestIQ helps identify potential issues - your judgment determines what's truly redundant.