Interpreting TestIQ Results

Understanding Your Quality Score

TestIQ provides a comprehensive quality score (0-100) with letter grade (A+ to F) based on three components:

Score Components

Duplication Score (50%) - Penalizes exact duplicate tests
- 100 = No exact duplicates
- Decreases by 2 points per 1% of duplicate tests
Coverage Efficiency Score (30%) - Penalizes subset duplicates
- 100 = No subset tests (every test covers unique lines)
- Decreases by 1 point per 1% of subset tests
- Subset test: A test whose coverage is completely contained in another test
Uniqueness Score (20%) - Penalizes similar tests
- 100 = All tests are unique
- Decreases based on similarity threshold matches

Example Score Breakdown

Overall Score: 68.8/100 (Grade: D)
├─ Duplication Score: 97.7/100     ← Few exact duplicates (good!)
├─ Coverage Efficiency: 0.0/100    ← Many subset tests (needs review)
└─ Uniqueness Score: 99.7/100      ← Tests are mostly unique (good!)

This score indicates:

✅ Very few exact duplicate tests (3 groups)
⚠️ 750 subset duplicates - many tests are subsets of others
✅ High uniqueness - tests have different coverage patterns

What Are Subset Duplicates?

A subset duplicate is a test whose coverage is completely contained within another test's coverage.

Example

{
  "test_short": {
    "auth.py": [10, 11, 12]
  },
  "test_comprehensive": {
    "auth.py": [10, 11, 12, 15, 20, 25],
    "user.py": [5, 6, 7]
  }
}

test_short is a subset of test_comprehensive - every line it covers is also covered by the comprehensive test.

Should You Remove Subset Tests?

Not always! Consider:

✅ Keep the subset test if:

It tests different behavior/edge cases
It tests different assertions/validations
It's faster and provides quick feedback
It has different inputs that happen to execute same code

❌ Remove the subset test if:

It's truly redundant (same inputs, same assertions)
It was created by copy-paste without adding value
It adds execution time with no benefit

Understanding Duplicate Groups

TestIQ identifies tests with identical coverage (they execute the exact same lines of code). This can mean:

True Duplicates ✅ (Should Review)

Tests that:

Were copy-pasted with minor changes
Test the same scenario with same inputs
Add no unique value to test suite
Can be consolidated or removed

False Positives ⚠️ (Expected)

Tests that:

Have same coverage but different assertions
Test different input values (same code path)
Focus on behavior vs. code coverage
Exercise the same code with different expected outcomes

Example: False Positive

def test_score_initialization():
    """Test creating a quality score."""
    score = TestQualityScore(
        overall_score=85.0,
        duplication_score=90.0,
        coverage_efficiency_score=80.0,
        uniqueness_score=85.0,
        grade="B+",
    )
    assert score.overall_score == 85.0

def test_score_perfect():
    """Test perfect quality score."""
    score = TestQualityScore(
        overall_score=100.0,
        duplication_score=100.0,
        coverage_efficiency_score=100.0,
        uniqueness_score=100.0,
        grade="A+",
    )
    assert score.overall_score == 100.0

Coverage: Both execute the same import and dataclass creation code
Value: Different - one tests general initialization, another tests perfect scores
Action: Keep both - they test different scenarios

Interpreting Recommendations

TestIQ provides prioritized recommendations:

High Priority (🔴)

Exact duplicate test groups
Critical coverage inefficiencies
Action: Review immediately

Medium Priority (🟡)

Subset tests that may be redundant
Similar test pairs
Action: Review when you have time

Low Priority (🟢)

Minor optimizations
Refactoring suggestions
Action: Consider during regular refactoring

Best Practices for Review

Start with exact duplicates - These are most likely to be true duplicates
Check test intent, not just coverage - Different assertions = different value
Review subset tests carefully - Many are intentional and valuable
Consider test execution time - Slow duplicates are higher priority
Use the HTML report - Visual inspection helps identify patterns
Look for patterns - Multiple related tests with same coverage may indicate structural issue

Running Complete Analysis

For comprehensive results, run coverage and TestIQ separately:

# Recommended: Use make target
make test-complete

# Or run manually
pytest --cov=testiq --cov-report=term --cov-report=html
pytest --testiq-output=testiq_coverage.json -q
testiq analyze testiq_coverage.json --format html --output reports/duplicates.html
testiq quality-score testiq_coverage.json

Why Separate Runs?

Python's sys.settrace() allows only ONE active tracer at a time:

Running both together: 19% coverage (both corrupted)
Running separately: 81% coverage (both complete)

Each tracer needs exclusive access for accurate data.

When to Act on Results

High Priority Actions

Grade F (0-60): Significant duplication issues - review immediately
Grade D (60-70): Many subset duplicates - review when possible
Exact duplicates > 10: Likely copy-paste issues - consolidate
Subset duplicates > 50%: Review test organization

Monitor Over Time

Track quality score trend
Set CI/CD quality gates
Use baselines to prevent regression

Example Workflow

Run analysis: make test-dup
Review score: Check overall grade and components
Open HTML report: open reports/duplicates.html
Check exact duplicates: Review each group for true duplicates
Review subset duplicates: Check if tests add unique value
Take action: Remove/consolidate redundant tests
Re-run analysis: Verify improvements
Set baseline: testiq baseline save current

FAQ

Q: Why does my test suite have a D grade?

A: Grade D (60-70) typically indicates many subset duplicates. This doesn't mean your tests are bad - it means many tests' coverage is contained within other tests. Review if this is intentional.

Q: Why are my identical-looking tests flagged as duplicates?

A: If tests execute the same code paths, TestIQ will flag them. Check if they test different behaviors - if so, they're false positives and should be kept.

Q: Should I remove all subset duplicates?

A: No! Many subset tests are valuable - they may test edge cases, have different assertions, or provide faster feedback. Review each case individually.

Q: How do I improve my efficiency score?

A: Review subset duplicates and either:

Remove truly redundant ones
Ensure each test covers unique code paths
Refactor tests to reduce overlap

Q: Why do I see 750 subset duplicates?

A: This often happens when:

Tests share common setup/teardown code
Multiple tests exercise the same imports
Tests have hierarchical coverage (unit → integration → e2e)

Most are likely intentional and valuable.

Summary

Quality score is a guide, not an absolute metric
False positives are expected - coverage ≠ behavior
Focus on high-priority items - exact duplicates first
Consider test intent - same coverage, different value is OK
Use comprehensive analysis - run coverage and TestIQ separately
Monitor trends - track improvements over time

TestIQ helps identify potential issues - your judgment determines what's truly redundant.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interpreting TestIQ Results

Understanding Your Quality Score

Score Components

Example Score Breakdown

What Are Subset Duplicates?

Example

Should You Remove Subset Tests?

Understanding Duplicate Groups

True Duplicates ✅ (Should Review)

False Positives ⚠️ (Expected)

Example: False Positive

Interpreting Recommendations

High Priority (🔴)

Medium Priority (🟡)

Low Priority (🟢)

Best Practices for Review

Running Complete Analysis

Why Separate Runs?

When to Act on Results

High Priority Actions

Monitor Over Time

Example Workflow

FAQ

Q: Why does my test suite have a D grade?

Q: Why are my identical-looking tests flagged as duplicates?

Q: Should I remove all subset duplicates?

Q: How do I improve my efficiency score?

Q: Why do I see 750 subset duplicates?

Summary

FilesExpand file tree

interpreting-results.md

Latest commit

History

interpreting-results.md

File metadata and controls

Interpreting TestIQ Results

Understanding Your Quality Score

Score Components

Example Score Breakdown

What Are Subset Duplicates?

Example

Should You Remove Subset Tests?

Understanding Duplicate Groups

True Duplicates ✅ (Should Review)

False Positives ⚠️ (Expected)

Example: False Positive

Interpreting Recommendations

High Priority (🔴)

Medium Priority (🟡)

Low Priority (🟢)

Best Practices for Review

Running Complete Analysis

Why Separate Runs?

When to Act on Results

High Priority Actions

Monitor Over Time

Example Workflow

FAQ

Q: Why does my test suite have a D grade?

Q: Why are my identical-looking tests flagged as duplicates?

Q: Should I remove all subset duplicates?

Q: How do I improve my efficiency score?

Q: Why do I see 750 subset duplicates?

Summary