Improve test skills #428

Merged
Evangelink merged 4 commits into main from dev/amauryleve/test-skills-improve
Mar 30, 2026

@Evangelink
Member

Use the skill validation tools to identify what can be improved in the test skills.

Copilot AI review requested due to automatic review settings March 23, 2026 18:18
@Evangelink
Member Author

/evaluate

@github-actions
Contributor

Skill Validation Results

Skill Scenario Quality (Isolated) Quality (Plugin) Skills Loaded Overfit Verdict
migrate-mstest-v3-to-v4 Migrate custom TestMethodAttribute from Execute to ExecuteAsync 2.0/5 → 3.7/5 🟢 2.0/5 → 3.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill, read_bash / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v3-to-v4 Replace ExpectedExceptionAttribute with Assert.ThrowsExactly 3.0/5 → 3.3/5 🟢 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09 [1]
migrate-mstest-v3-to-v4 Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout 3.0/5 ⏰ → 5.0/5 🟢 3.0/5 ⏰ → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v3-to-v4 Handle net6.0 target framework dropped in MSTest v4 3.3/5 → 4.7/5 🟢 3.3/5 → 4.3/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v3-to-v4 Fix TestMethodAttribute CallerInfo constructor breaking change 4.0/5 → 4.7/5 🟢 4.0/5 → 4.7/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09 [2]
migrate-mstest-v3-to-v4 Understand behavioral changes after MSTest v4 upgrade 3.0/5 → 5.0/5 🟢 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v3-to-v4 Handle MSTest.Sdk and MTP changes in v4 2.3/5 → 3.0/5 🟢 2.3/5 → 3.3/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v3-to-v4 Full MSTest v3 to v4 migration with multiple breaking changes 3.7/5 → 5.0/5 🟢 3.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v3-to-v4 Migrate MSTest.Sdk v3 project using ManagedType and TestTimeout 1.7/5 ⏰ → 4.0/5 🟢 1.7/5 ⏰ → 4.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v3-to-v4 Correctly identify MSTest v3 project and recommend v4 migration 4.3/5 → 5.0/5 🟢 4.3/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.09
migrate-mstest-v1v2-to-v3 Migrate MSTest v1 project with assembly reference 2.3/5 → 4.7/5 🟢 2.3/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate MSTest v2 NuGet project to v3 3.0/5 → 3.0/5 3.0/5 → 3.0/5 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04 [3]
migrate-mstest-v1v2-to-v3 Fix Assert.AreEqual object overload errors after v3 upgrade 2.7/5 → 5.0/5 🟢 2.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit / ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate from .testsettings to .runsettings 3.7/5 → 4.0/5 🟢 3.7/5 → 4.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash ✅ 0.04 [4]
migrate-mstest-v1v2-to-v3 Fix DataRow type mismatch errors after v3 upgrade 3.0/5 → 3.0/5 3.0/5 → 3.3/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04 [5]
migrate-mstest-v1v2-to-v3 Migrate to MSTest.Sdk project style 3.3/5 → 4.3/5 🟢 3.3/5 → 4.3/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Handle dropped target framework during v3 migration 4.3/5 → 4.0/5 🔴 4.3/5 → 4.7/5 🟢 ⚠️ NOT ACTIVATED / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate complex MSTest v2 project with testsettings, DataRow issues, and dropped TFM 3.0/5 → 5.0/5 🟢 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Correctly identify MSTest v1 vs v2 and recommend different migration paths 3.7/5 → 4.7/5 🟢 3.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, task, glob, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
resolve-project-references Explain misleading ResolveProjectReferences time 3.3/5 → 5.0/5 🟢 3.3/5 → 5.0/5 🟢 ✅ resolve-project-references; tools: skill / ✅ resolve-project-references; tools: skill ✅ 0.14
writing-mstest-tests Write unit tests for a service class 4.7/5 → 5.0/5 🟢 4.7/5 → 4.7/5 ⏰ ✅ writing-mstest-tests; tools: skill, glob / ✅ writing-mstest-tests; tools: skill, task, glob, grep 🟡 0.29 [6]
writing-mstest-tests Write data-driven tests for a calculator 4.7/5 → 4.7/5 4.7/5 → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill, glob / ✅ writing-mstest-tests; tools: skill 🟡 0.29 [7]
writing-mstest-tests Write async tests with cancellation 2.3/5 → 5.0/5 🟢 2.3/5 → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.29
writing-mstest-tests Fix swapped Assert.AreEqual arguments 5.0/5 → 5.0/5 5.0/5 → 5.0/5 ✅ writing-mstest-tests; tools: report_intent, skill / ✅ writing-mstest-tests; tools: skill 🟡 0.29 [8]
writing-mstest-tests Modernize legacy test patterns 4.0/5 ⏰ → 4.3/5 ⏰ 🟢 4.0/5 ⏰ → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.29 [9]
writing-mstest-tests Replace ExpectedException with Assert.Throws 3.0/5 → 4.3/5 🟢 3.0/5 → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.29
writing-mstest-tests Use proper collection assertions 3.3/5 → 3.3/5 3.3/5 → 2.7/5 🔴 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.29
writing-mstest-tests Use proper type assertions instead of casts 3.0/5 → 3.0/5 3.0/5 → 4.7/5 🟢 ⚠️ NOT ACTIVATED / ✅ writing-mstest-tests; tools: skill 🟡 0.29
writing-mstest-tests Set up test lifecycle correctly 2.3/5 → 4.0/5 🟢 2.3/5 → 4.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.29
writing-mstest-tests Use DynamicData with ValueTuples over object arrays 2.0/5 → 3.7/5 🟢 2.0/5 → 3.7/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.29
exp-test-anti-patterns Detect mixed severity anti-patterns in repository service tests 5.0/5 → 5.0/5 5.0/5 → 5.0/5 ✅ exp-test-anti-patterns; tools: report_intent, skill / ✅ exp-test-anti-patterns; tools: skill, report_intent ✅ 0.08 [10]
exp-test-anti-patterns Detect flakiness indicators and test coupling 3.0/5 → 3.3/5 🟢 3.0/5 → 4.7/5 🟢 ✅ exp-test-anti-patterns; tools: report_intent, skill / ✅ exp-test-anti-patterns; tools: report_intent, skill ✅ 0.08
exp-test-anti-patterns Detect duplicated tests and magic values 3.0/5 → 5.0/5 🟢 3.0/5 → 5.0/5 🟢 ✅ exp-test-anti-patterns; tools: report_intent, skill / ✅ exp-test-anti-patterns; tools: skill, report_intent ✅ 0.08
exp-test-anti-patterns Recognize well-written tests without inventing false positives 2.0/5 → 5.0/5 🟢 2.0/5 → 5.0/5 🟢 ✅ exp-test-anti-patterns; tools: report_intent, skill / ✅ exp-test-anti-patterns; tools: report_intent, skill ✅ 0.08
mtp-hot-reload Suggest hot reload for failing test in MTP project (SDK 9) 1.0/5 → 4.3/5 ⏰ 🟢 1.0/5 → 3.0/5 ⏰ 🟢 ✅ mtp-hot-reload; tools: skill / ✅ mtp-hot-reload; tools: skill, stop_bash ✅ 0.11
mtp-hot-reload Suggest hot reload for failing test in MTP project (SDK 10) 1.0/5 → 3.7/5 🟢 1.0/5 → 4.3/5 🟢 ✅ mtp-hot-reload; tools: skill, glob, bash, create / ✅ mtp-hot-reload; tools: skill, bash, create ✅ 0.11
mtp-hot-reload Enable hot reload when package already installed 2.0/5 → 5.0/5 🟢 2.0/5 → 5.0/5 🟢 ✅ mtp-hot-reload; tools: skill, bash, glob / ✅ mtp-hot-reload; tools: skill ✅ 0.11
mtp-hot-reload Suggest launchSettings.json configuration for hot reload 1.0/5 → 4.3/5 🟢 1.0/5 → 5.0/5 🟢 ✅ mtp-hot-reload; tools: skill, bash, create / ✅ mtp-hot-reload; tools: skill, glob, bash, create ✅ 0.11
mtp-hot-reload Use dotnet run not dotnet test for hot reload 2.3/5 → 3.0/5 🟢 2.3/5 → 3.3/5 🟢 ✅ mtp-hot-reload; tools: skill / ✅ mtp-hot-reload; tools: report_intent, skill ✅ 0.11
mtp-hot-reload Negative: VSTest project cannot use MTP hot reload 1.0/5 → 3.0/5 ⏰ 🟢 1.0/5 → 3.3/5 🟢 ✅ mtp-hot-reload; tools: skill, create, edit, glob / ✅ mtp-hot-reload; tools: skill, edit, create ✅ 0.11
mtp-hot-reload Run specific failing test with hot reload filter 1.0/5 → 3.3/5 🟢 1.0/5 → 3.7/5 🟢 ✅ mtp-hot-reload; tools: report_intent, skill, view / ✅ mtp-hot-reload; tools: report_intent, skill, view ✅ 0.11
migrate-vstest-to-mtp Migrate MSTest project from VSTest to Microsoft.Testing.Platform 4.3/5 → 5.0/5 🟢 4.3/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: report_intent, skill, bash, view ✅ 0.10
migrate-vstest-to-mtp Migrate NUnit project from VSTest to Microsoft.Testing.Platform 1.7/5 → 5.0/5 🟢 1.7/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: report_intent, skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.10
migrate-vstest-to-mtp Migrate xUnit.net v2 project from VSTest to Microsoft.Testing.Platform 1.0/5 → 5.0/5 🟢 1.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill, report_intent, view, bash / ✅ migrate-vstest-to-mtp; tools: skill, report_intent, view, bash ✅ 0.10
migrate-vstest-to-mtp Update Azure DevOps pipeline from VSTest task to MTP 2.7/5 → 5.0/5 🟢 2.7/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.10
migrate-vstest-to-mtp Migrate MSTest.Sdk project that explicitly uses VSTest 2.7/5 → 5.0/5 🟢 2.7/5 → 4.7/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.10
migrate-vstest-to-mtp Translate dotnet test VSTest arguments to MTP equivalents 4.3/5 → 5.0/5 🟢 4.3/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: report_intent, skill / ✅ migrate-vstest-to-mtp; tools: report_intent, skill ✅ 0.10
migrate-vstest-to-mtp Handle exit code 8 when migrating from VSTest to MTP 2.7/5 → 4.7/5 🟢 2.7/5 → 4.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.10 [11]
migrate-vstest-to-mtp Configure dotnet test MTP mode on .NET 10 SDK 2.0/5 → 5.0/5 🟢 2.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.10
migrate-vstest-to-mtp Migrate xUnit.net VSTest filter syntax to MTP 2.0/5 → 3.7/5 🟢 2.0/5 → 4.3/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.10
migrate-vstest-to-mtp Full VSTest to MTP migration plan for MSTest solution 3.0/5 → 5.0/5 🟢 3.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill, create ✅ 0.10
exp-test-tagging Tag an untagged MSTest test suite 3.3/5 → 5.0/5 🟢 3.3/5 → 5.0/5 🟢 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill, glob 🟡 0.31
exp-test-tagging Tag an untagged xUnit test suite 4.0/5 → 5.0/5 🟢 4.0/5 → 5.0/5 🟢 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill 🟡 0.31
exp-test-tagging Tag an untagged NUnit test suite 3.7/5 → 4.7/5 🟢 3.7/5 → 4.3/5 🟢 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill, task, glob 🟡 0.31
exp-test-tagging Audit test distribution without modifying files 3.7/5 ⏰ → 4.7/5 🟢 3.7/5 ⏰ → 5.0/5 🟢 ✅ exp-test-tagging; tools: skill / ✅ exp-test-tagging; tools: skill 🟡 0.31
exp-test-tagging Decline request to write new tests 4.0/5 → 4.0/5 4.0/5 → 4.0/5 ℹ️ not activated (expected) / ℹ️ not activated (expected) 🟡 0.31 [12]
msbuild-server Recommend MSBuild Server for slow CLI incremental builds 3.0/5 → 4.3/5 🟢 3.0/5 → 4.3/5 🟢 ✅ msbuild-server; tools: skill / ✅ msbuild-server; tools: skill, bash ✅ 0.17
exp-crap-score Calculate CRAP score for a single method with partial coverage 3.7/5 → 3.7/5 3.7/5 → 5.0/5 🟢 ✅ exp-crap-score; tools: skill, bash / ✅ exp-crap-score; tools: skill, bash ✅ 0.10
exp-crap-score Identify riskiest methods across a file 4.3/5 → 5.0/5 🟢 4.3/5 → 5.0/5 🟢 ✅ exp-crap-score; tools: skill, glob / ✅ exp-crap-score; tools: skill, glob ✅ 0.10
exp-crap-score Generate coverage then compute CRAP score 4.0/5 → 3.7/5 🔴 4.0/5 → 4.0/5 ✅ exp-crap-score; tools: skill / ✅ exp-crap-score; tools: skill ✅ 0.10
run-tests Run tests in a VSTest MSTest project 4.0/5 → 5.0/5 🟢 4.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.18
run-tests Run tests with trx reporting on MTP project (SDK 9) 4.0/5 → 3.3/5 ⏰ 🔴 4.0/5 → 2.7/5 ⏰ 🔴 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.18
run-tests Run tests with blame-hang on MTP project (SDK 10) 2.0/5 ⏰ → 2.0/5 ⏰ 2.0/5 ⏰ → 2.3/5 ⏰ 🟢 ✅ run-tests; tools: skill, bash, edit / ✅ run-tests; tools: skill, bash, edit ✅ 0.18 [13]
run-tests Run tests in a multi-TFM project targeting a specific framework 2.0/5 → 4.3/5 🟢 2.0/5 → 4.0/5 🟢 ✅ run-tests; tools: skill, bash, glob / ✅ run-tests; tools: skill, bash ✅ 0.18
run-tests Filter MSTest tests by category on VSTest 5.0/5 → 5.0/5 5.0/5 → 5.0/5 ✅ run-tests; tools: skill, bash, glob / ✅ run-tests; tools: skill, bash ✅ 0.18
run-tests Filter NUnit tests by class name on VSTest 3.7/5 → 4.7/5 🟢 3.7/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, bash, view / ✅ run-tests; tools: skill, view, bash ✅ 0.18
run-tests Filter xUnit v3 tests by class on MTP 1.0/5 → 5.0/5 🟢 1.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash ✅ 0.18
run-tests Filter xUnit v3 tests by trait on MTP 1.0/5 → 5.0/5 🟢 1.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, view / ✅ run-tests; tools: skill, view ✅ 0.18
run-tests Filter TUnit tests by class using treenode-filter 1.7/5 → 4.7/5 🟢 1.7/5 → 4.7/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash ✅ 0.18
run-tests Combine multiple filter criteria on VSTest MSTest 3.7/5 → 4.0/5 🟢 3.7/5 → 3.7/5 ✅ run-tests; tools: skill, bash, glob / ✅ run-tests; tools: skill, bash, glob ✅ 0.18
run-tests MTP project on SDK 9 must use -- separator for args 1.0/5 → 5.0/5 🟢 1.0/5 → 3.7/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill, edit ✅ 0.18
run-tests MTP project on SDK 10 passes args directly 2.7/5 → 3.3/5 🟢 2.7/5 → 4.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.18
run-tests Detect test platform from Directory.Build.props 1.0/5 → 5.0/5 🟢 1.0/5 → 3.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.18
run-tests Negative test: do not use MTP syntax for a VSTest project 4.0/5 → 5.0/5 🟢 4.0/5 → 4.3/5 🟢 ✅ run-tests; tools: skill, view / ✅ run-tests; tools: skill, view ✅ 0.18

[1] (Isolated) Quality improved but weighted score is -4.5% due to: judgment
[2] (Plugin) Quality improved but weighted score is -14.5% due to: judgment, quality
[3] (Plugin) Quality unchanged but weighted score is -14.2% due to: completion (✓ → ✗), quality
[4] (Isolated) Quality improved but weighted score is -8.8% due to: tokens (37336 → 67775), quality, time (28.9s → 37.1s)
[5] (Isolated) Quality unchanged but weighted score is -2.1% due to: tokens (90972 → 127085)
[6] (Plugin) Quality unchanged but weighted score is -0.3% due to: tokens (215221 → 278711), tool calls (19 → 28)
[7] (Isolated) Quality unchanged but weighted score is -29.2% due to: judgment
[8] (Plugin) Quality unchanged but weighted score is -6.8% due to: tokens (11655 → 31141), tool calls (0 → 1), time (14.3s → 19.3s)
[9] (Isolated) Quality improved but weighted score is -27.7% due to: judgment, quality, errors (0 → 1), tokens (203610 → 261764), time (127.4s → 168.2s), tool calls (15 → 18)
[10] (Plugin) Quality unchanged but weighted score is -8.3% due to: tokens (13412 → 37130), tool calls (0 → 2), time (25.7s → 50.7s)
[11] (Plugin) Quality improved but weighted score is -10.2% due to: judgment, quality
[12] (Isolated) Quality unchanged but weighted score is -37.4% due to: quality, judgment, tokens (38961 → 107977), tool calls (4 → 8), time (47.6s → 84.4s)
[13] (Plugin) Quality improved but weighted score is -5.8% due to: tokens (166085 → 494516), errors (0 → 1), tool calls (13 → 26), time (121.4s → 260.9s)

timeout — run hit the scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output
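
The footnotes above quote raw before/after values for tokens and time. As a rough aid for reading them, here is a minimal Python sketch that turns those pairs into relative changes. The regular expression and the `metric_changes` helper are hypothetical and written only against the footnote wording shown in this comment. The validator's actual weighting formula is not stated here, so the sketch does not try to reproduce the weighted score.

```python
import re

# Matches entries such as "tokens (37336 → 67775)" or "time (28.9s → 37.1s)".
# Assumption: footnotes always use this "name (before → after)" shape.
METRIC_RE = re.compile(
    r"(?P<name>[a-z ]+?) \((?P<before>[\d.]+)s? → (?P<after>[\d.]+)s?\)"
)


def metric_changes(footnote: str) -> dict[str, float]:
    """Return the relative change, in percent, for each quantified metric."""
    changes = {}
    for m in METRIC_RE.finditer(footnote):
        before, after = float(m["before"]), float(m["after"])
        if before == 0:
            # e.g. "tool calls (0 → 1)": a relative change is undefined
            continue
        changes[m["name"].strip()] = (after - before) / before * 100
    return changes


# Footnote [4] from the table above, copied verbatim.
footnote_4 = (
    "(Isolated) Quality improved but weighted score is -8.8% due to: "
    "tokens (37336 → 67775), quality, time (28.9s → 37.1s)"
)
for name, pct in metric_changes(footnote_4).items():
    print(f"{name}: {pct:+.1f}%")
```

Entries without numbers (such as "judgment" or "quality") are skipped, since the footnote gives no values for them.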

Model: claude-opus-4.6 | Judge: claude-opus-4.6

Full results

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Evangelink force-pushed the dev/amauryleve/test-skills-improve branch from 3ffe052 to c04a728 on March 30, 2026 07:37
@Evangelink
Member Author

/evaluate

@github-actions
Contributor

Skill Validation Results

Skill Scenario Quality Skills Loaded Overfit Verdict
test-anti-patterns Detect mixed severity anti-patterns in repository service tests 5.0/5 → 5.0/5 ✅ test-anti-patterns; tools: report_intent, skill / ✅ test-anti-patterns; tools: skill, report_intent ✅ 0.07 [1]
test-anti-patterns Detect flakiness indicators and test coupling 3.0/5 → 4.3/5 🟢 ✅ test-anti-patterns; tools: report_intent, skill / ✅ test-anti-patterns; tools: report_intent, skill ✅ 0.07
test-anti-patterns Detect duplicated tests and magic values 3.0/5 → 5.0/5 🟢 ✅ test-anti-patterns; tools: skill, report_intent / ✅ writing-mstest-tests; test-anti-patterns; tools: skill, report_intent ✅ 0.07
test-anti-patterns Recognize well-written tests without inventing false positives 2.0/5 → 5.0/5 🟢 ✅ test-anti-patterns; tools: report_intent, skill / ✅ test-anti-patterns; tools: report_intent, skill ✅ 0.07
migrate-mstest-v3-to-v4 Migrate custom TestMethodAttribute from Execute to ExecuteAsync 1.7/5 → 3.7/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06
migrate-mstest-v3-to-v4 Replace ExpectedExceptionAttribute with Assert.ThrowsExactly 3.3/5 → 3.0/5 🔴 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06 [2]
migrate-mstest-v3-to-v4 Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout 3.0/5 ⏰ → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06
migrate-mstest-v3-to-v4 Handle net6.0 target framework dropped in MSTest v4 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ⚠️ NOT ACTIVATED ✅ 0.06
migrate-mstest-v3-to-v4 Fix TestMethodAttribute CallerInfo constructor breaking change 4.0/5 → 4.7/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06 [3]
migrate-mstest-v3-to-v4 Understand behavioral changes after MSTest v4 upgrade 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06
migrate-mstest-v3-to-v4 Handle MSTest.Sdk and MTP changes in v4 2.0/5 → 3.7/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill, report_intent ✅ 0.06
migrate-mstest-v3-to-v4 Full MSTest v3 to v4 migration with multiple breaking changes 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06
migrate-mstest-v3-to-v4 Migrate MSTest.Sdk v3 project using ManagedType and TestTimeout 3.0/5 ⏰ → 4.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06
migrate-mstest-v3-to-v4 Correctly identify MSTest v3 project and recommend v4 migration 4.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.06
writing-mstest-tests Write unit tests for a service class 4.3/5 → 4.3/5 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27 [4]
writing-mstest-tests Write data-driven tests for a calculator 4.7/5 → 4.7/5 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Write async tests with cancellation 2.0/5 → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Fix swapped Assert.AreEqual arguments 5.0/5 → 5.0/5 ✅ writing-mstest-tests; tools: skill, report_intent / ✅ writing-mstest-tests; tools: skill 🟡 0.27 [5]
writing-mstest-tests Modernize legacy test patterns 3.3/5 ⏰ → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Replace ExpectedException with Assert.Throws 3.0/5 → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: report_intent, skill 🟡 0.27
writing-mstest-tests Use proper collection assertions 3.0/5 → 3.7/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Use proper type assertions instead of casts 3.3/5 → 3.7/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Set up test lifecycle correctly 1.7/5 → 4.3/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Use DynamicData with ValueTuples over object arrays 2.3/5 → 3.7/5 🟢 ✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED 🟡 0.27
msbuild-server Recommend MSBuild Server for slow CLI incremental builds 3.0/5 → 4.7/5 🟢 ✅ msbuild-server; tools: skill / ✅ msbuild-server; tools: skill, bash 🟡 0.29
convert-to-cpm Decline CPM conversion for packages.config project 1.0/5 → 2.0/5 🟢 ℹ️ not activated (expected) / ℹ️ not activated (expected) ✅ 0.16 [6]
convert-to-cpm Recommend CPM when updating packages with version conflicts 2.0/5 → 2.7/5 🟢 ✅ convert-to-cpm; tools: skill, glob, bash, read_bash, create, task, read_agent / ✅ convert-to-cpm; tools: skill, glob, bash, read_bash, create ✅ 0.16
convert-to-cpm Recommend CPM when updating packages in a complex repository 1.0/5 → 2.7/5 🟢 ✅ convert-to-cpm; tools: skill, read_bash, read_agent / ✅ convert-to-cpm; tools: skill, read_agent, bash, read_bash ✅ 0.16
convert-to-cpm Convert single project to CPM 2.0/5 → 5.0/5 🟢 ✅ convert-to-cpm; tools: skill, task, bash, read_agent, read_bash, glob / ✅ convert-to-cpm; tools: skill, bash, read_bash, glob ✅ 0.16
convert-to-cpm Convert multi-project solution to CPM 2.7/5 → 5.0/5 🟢 ✅ convert-to-cpm; tools: skill, bash, read_bash / ✅ convert-to-cpm; tools: skill, bash, read_bash ✅ 0.16
convert-to-cpm Convert solution with MSBuild property versions to CPM 2.3/5 → 3.7/5 🟢 ✅ convert-to-cpm; tools: skill, bash, task, grep / ✅ convert-to-cpm; tools: skill, task, glob, bash, read_agent, grep ✅ 0.16
convert-to-cpm Convert solution with version conflicts to CPM 2.3/5 → 4.0/5 🟢 ✅ convert-to-cpm; tools: skill, bash, read_bash, task, read_agent / ✅ convert-to-cpm; tools: skill, bash, read_bash ✅ 0.16
convert-to-cpm Convert complex repository with multiple CPM challenges 2.7/5 → 3.7/5 🟢 ✅ convert-to-cpm; tools: skill, read_bash, grep, task, bash / ✅ convert-to-cpm; tools: skill, read_bash, bash ✅ 0.16
mtp-hot-reload Suggest hot reload for failing test in MTP project (SDK 9) 1.0/5 → 3.3/5 ⏰ 🟢 ✅ mtp-hot-reload; tools: skill / ✅ mtp-hot-reload; tools: skill ✅ 0.08
mtp-hot-reload Suggest hot reload for failing test in MTP project (SDK 10) 1.0/5 → 4.3/5 🟢 ✅ mtp-hot-reload; tools: skill, bash, create / ✅ mtp-hot-reload; tools: skill, bash, create ✅ 0.08
mtp-hot-reload Enable hot reload when package already installed 2.0/5 → 5.0/5 🟢 ✅ mtp-hot-reload; tools: skill, glob / ✅ mtp-hot-reload; tools: skill ✅ 0.08
mtp-hot-reload Suggest launchSettings.json configuration for hot reload 1.0/5 → 4.0/5 🟢 ✅ mtp-hot-reload; tools: skill, bash, create / ✅ mtp-hot-reload; tools: skill, bash, create ✅ 0.08
mtp-hot-reload Use dotnet run not dotnet test for hot reload 2.3/5 → 3.3/5 🟢 ✅ mtp-hot-reload; tools: skill / ✅ mtp-hot-reload; tools: skill, report_intent ✅ 0.08
mtp-hot-reload Negative: VSTest project cannot use MTP hot reload 1.0/5 → 3.3/5 🟢 ✅ mtp-hot-reload; tools: skill, glob, create / ✅ mtp-hot-reload; tools: skill, create, glob ✅ 0.08
mtp-hot-reload Run specific failing test with hot reload filter 1.0/5 → 3.0/5 🟢 ✅ mtp-hot-reload; tools: skill, bash, view / ✅ mtp-hot-reload; tools: skill, view ✅ 0.08
run-tests Run tests in a VSTest MSTest project 4.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill, glob ✅ 0.18
run-tests Run tests with trx reporting on MTP project (SDK 9) 3.0/5 ⏰ → 3.0/5 ⏰ ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.18 [7]
run-tests Run tests with blame-hang on MTP project (SDK 10) 2.0/5 → 2.7/5 ⏰ 🟢 ✅ run-tests; tools: skill, bash, edit / ✅ run-tests; tools: skill, bash, edit, create ✅ 0.18
run-tests Run tests in a multi-TFM project targeting a specific framework 1.7/5 → 4.0/5 🟢 ✅ run-tests; tools: bash, glob, skill / ✅ run-tests; tools: skill, bash, glob ✅ 0.18
run-tests Filter MSTest tests by category on VSTest 5.0/5 → 5.0/5 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, glob, task, bash, grep ✅ 0.18
run-tests Filter NUnit tests by class name on VSTest 4.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, glob, bash ✅ 0.18
run-tests Filter xUnit v3 tests by class on MTP 1.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash, grep ✅ 0.18
run-tests Filter xUnit v3 tests by trait on MTP 1.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, view / ✅ run-tests; tools: skill, view ✅ 0.18
run-tests Filter TUnit tests by class using treenode-filter 2.3/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash ✅ 0.18
run-tests Combine multiple filter criteria on VSTest MSTest 4.7/5 → 4.0/5 🔴 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill ✅ 0.18
run-tests MTP project on SDK 9 must use -- separator for args 1.3/5 → 5.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.18
run-tests MTP project on SDK 10 passes args directly 2.3/5 ⏰ → 4.0/5 🟢 ✅ run-tests; tools: skill, view / ✅ run-tests; tools: skill, view ✅ 0.18
run-tests Detect test platform from Directory.Build.props 1.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.18
run-tests Negative test: do not use MTP syntax for a VSTest project 4.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, view / ✅ run-tests; tools: skill, view ✅ 0.18
migrate-vstest-to-mtp Migrate MSTest project from VSTest to Microsoft.Testing.Platform 4.7/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: report_intent, skill ✅ 0.09 [8]
migrate-vstest-to-mtp Migrate NUnit project from VSTest to Microsoft.Testing.Platform 2.7/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.09
migrate-vstest-to-mtp Migrate xUnit.net v2 project from VSTest to Microsoft.Testing.Platform 1.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill, report_intent, view, bash / ✅ migrate-vstest-to-mtp; tools: skill, report_intent, view ✅ 0.09
migrate-vstest-to-mtp Update Azure DevOps pipeline from VSTest task to MTP 2.3/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.09
migrate-vstest-to-mtp Migrate MSTest.Sdk project that explicitly uses VSTest 3.3/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.09
migrate-vstest-to-mtp Translate dotnet test VSTest arguments to MTP equivalents 4.3/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.09
migrate-vstest-to-mtp Handle exit code 8 when migrating from VSTest to MTP 3.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.09
migrate-vstest-to-mtp Configure dotnet test MTP mode on .NET 10 SDK 2.0/5 → 4.7/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.09
migrate-vstest-to-mtp Migrate xUnit.net VSTest filter syntax to MTP 2.0/5 → 3.3/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill ✅ 0.09
migrate-vstest-to-mtp Full VSTest to MTP migration plan for MSTest solution 4.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill, view ✅ 0.09
resolve-project-references Explain misleading ResolveProjectReferences time 3.3/5 → 5.0/5 🟢 ✅ resolve-project-references; tools: skill / ✅ resolve-project-references; tools: skill ✅ 0.14
migrate-mstest-v1v2-to-v3 Migrate MSTest v1 project with assembly reference 2.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate MSTest v2 NuGet project to v3 3.3/5 → 3.0/5 🔴 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Fix Assert.AreEqual object overload errors after v3 upgrade 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit / ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate from .testsettings to .runsettings 4.0/5 → 3.7/5 🔴 ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash ✅ 0.04
migrate-mstest-v1v2-to-v3 Fix DataRow type mismatch errors after v3 upgrade 2.7/5 → 3.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04 [9]
migrate-mstest-v1v2-to-v3 Migrate to MSTest.Sdk project style 3.3/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash ✅ 0.04
migrate-mstest-v1v2-to-v3 Handle dropped target framework during v3 migration 4.7/5 → 4.7/5 ⚠️ NOT ACTIVATED / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04 [10]
migrate-mstest-v1v2-to-v3 Migrate complex MSTest v2 project with testsettings, DataRow issues, and dropped TFM 3.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Correctly identify MSTest v1 vs v2 and recommend different migration paths 4.0/5 → 4.7/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
crap-score Calculate CRAP score for a single method with partial coverage 3.3/5 → 4.0/5 🟢 ✅ crap-score; tools: skill, glob / ✅ crap-score; tools: skill, bash, glob ✅ 0.11
crap-score Identify riskiest methods across a file 4.0/5 → 5.0/5 🟢 ✅ crap-score; tools: skill, glob / ✅ crap-score; tools: skill, bash ✅ 0.11
crap-score Generate coverage then compute CRAP score 4.0/5 → 4.0/5 ✅ crap-score; tools: skill / ✅ crap-score; tools: skill ✅ 0.11 [11]
exp-test-tagging Tag an untagged MSTest test suite 4.0/5 → 5.0/5 🟢 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill, glob 🟡 0.28
exp-test-tagging Tag an untagged xUnit test suite 4.0/5 → 4.7/5 🟢 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill, glob 🟡 0.28
exp-test-tagging Tag an untagged NUnit test suite 3.3/5 → 4.7/5 🟢 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill, glob 🟡 0.28
exp-test-tagging Audit test distribution without modifying files 4.3/5 → 5.0/5 🟢 ✅ exp-test-tagging; tools: skill / ✅ exp-test-tagging; tools: skill 🟡 0.28
exp-test-tagging Decline request to write new tests 3.7/5 → 4.0/5 🟢 ℹ️ not activated (expected) / ℹ️ not activated (expected) 🟡 0.28 [12]

[1] (Plugin) Quality unchanged but weighted score is -8.4% due to: tokens (13597 → 34864), tool calls (0 → 1), time (22.2s → 55.4s)
[2] (Isolated) Quality dropped but weighted score is +16.2% due to: time (71.8s → 54.9s)
[3] (Isolated) Quality improved but weighted score is -15.3% due to: judgment, quality
[4] (Plugin) Quality unchanged but weighted score is -2.9% due to: tokens (175006 → 266227)
[5] (Plugin) Quality unchanged but weighted score is -7.4% due to: tokens (12013 → 32199), tool calls (0 → 1)
[6] (Plugin) Quality improved but weighted score is -0.8% due to: tokens (74045 → 91175)
[7] (Isolated) Quality unchanged but weighted score is -31.2% due to: judgment, errors (0 → 1)
[8] (Plugin) Quality improved but weighted score is -0.2% due to: tokens (13206 → 34411), tool calls (0 → 2)
[9] (Isolated) Quality improved but weighted score is -17.1% due to: judgment, quality, tokens (105870 → 134231)
[10] (Isolated) Quality unchanged but weighted score is -17.6% due to: judgment, quality
[11] (Isolated) Quality unchanged but weighted score is -24.5% due to: judgment, quality, tokens (283454 → 374226)
[12] (Plugin) Quality unchanged but weighted score is -28.1% due to: judgment, quality, tokens (110365 → 247279), time (91.0s → 153.0s), tool calls (9 → 14)

⏰ timeout: one or more runs hit the scenario timeout limit (180s, 240s, 300s, 360s). Scoring may be affected because model execution was aborted before it could produce its full output (increase the limit via timeout in eval.yaml).

Model: claude-opus-4.6 | Judge: claude-opus-4.6

📖 See InvestigatingResults.md for how to diagnose failures. Additional debugging guidance may be provided by your workflow.

🔍 Full results — includes quality and agent details

To investigate failures, paste this to your AI coding agent:

For PR 428 in dotnet/skills, download eval artifacts with gh run download 23733389575 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/c04a728e9a35e562cf1c478c09aef4a3a1b6bdc8/eng/skill-validator/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.
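
Once the artifacts are downloaded, a first pass over the results.json files might look like the sketch below. It assumes only that the files sit somewhere under ./eval-results, as in the command above. The schema of results.json is not shown in this thread, so the code reports each file's top-level shape and nothing more. `load_results` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_results(root):
    """Return {path: parsed JSON} for every results.json found under root."""
    loaded = {}
    for path in sorted(Path(root).rglob("results.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[str(path)] = json.load(handle)
    return loaded


def summarize(root="./eval-results"):
    """Print each file and its top-level shape, without assuming a schema."""
    for path, data in load_results(root).items():
        if isinstance(data, dict):
            shape = "object with keys: " + ", ".join(sorted(data))
        elif isinstance(data, list):
            shape = f"array of {len(data)} items"
        else:
            shape = type(data).__name__
        print(f"{path}: {shape}")
```

Calling `summarize()` after the download gives a quick inventory of the files before following InvestigatingResults.md for the actual diagnosis.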

Copilot AI review requested due to automatic review settings March 30, 2026 08:47
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (3)

tests/dotnet-test/writing-mstest-tests/eval.yaml:167

  • reject_tools includes "create", but the validator's file creation tool is named create_file (per eng/skill-validator/README.md and EvalSchemaTests). As written, this constraint likely won't enforce the intended restriction.
    reject_tools: ["bash", "edit", "create"]

tests/dotnet-test/writing-mstest-tests/eval.yaml:202

  • reject_tools uses "create" but the validator expects create_file as the tool name for file creation. If the intent is to forbid creating/editing files in this scenario, please use the actual tool names so the constraint is enforced.
    reject_tools: ["bash", "edit", "create"]

tests/dotnet-test/writing-mstest-tests/eval.yaml:233

  • reject_tools includes "create", but the skill-validator uses create_file for file creation constraints (see eng/skill-validator/README.md). Using an unknown tool name means this constraint won't catch file creation tool usage.
    reject_tools: ["bash", "edit", "create"]
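
The three suppressed comments turn on whether each name in reject_tools matches a tool name the validator actually sees. A rough way to sanity-check a list is to compare it against the tool names that appear in the "Skills Loaded" columns of the result tables in this thread. The sketch below does that. The set is an observed sample, not the validator's authoritative list, and `unobserved_tools` is a hypothetical helper.

```python
# Tool names that appear in the "Skills Loaded" columns of the result tables
# in this thread. This is an observed sample, not the validator's official list.
OBSERVED_TOOLS = {
    "skill", "report_intent", "bash", "read_bash", "edit", "create",
    "glob", "grep", "view", "task", "read_agent",
}


def unobserved_tools(reject_tools, known=OBSERVED_TOOLS):
    """Return the reject_tools entries that never show up in `known`."""
    return [name for name in reject_tools if name not in known]


print(unobserved_tools(["bash", "edit", "create"]))       # prints []
print(unobserved_tools(["bash", "edit", "create_file"]))  # prints ['create_file']
```

An empty result only means the name has been seen in agent runs reported here. It does not settle which name the validator matches on, so eng/skill-validator/README.md remains the reference for that.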

Comment thread .agents/skills/create-skill-test/SKILL.md
Comment thread tests/dotnet-test/writing-mstest-tests/eval.yaml
Comment thread tests/dotnet-test/test-anti-patterns/eval.yaml
@Evangelink
Member Author

/evaluate

@github-actions
Contributor

Skill Validation Results

Skill Scenario Quality Skills Loaded Overfit Verdict
test-anti-patterns Detect mixed severity anti-patterns in repository service tests 5.0/5 → 5.0/5 ✅ test-anti-patterns; tools: report_intent, skill / ✅ test-anti-patterns; tools: report_intent, skill ✅ 0.08 [1]
test-anti-patterns Detect flakiness indicators and test coupling 3.0/5 → 5.0/5 🟢 ✅ test-anti-patterns; tools: report_intent, skill / ✅ test-anti-patterns; tools: report_intent, skill ✅ 0.08
test-anti-patterns Detect duplicated tests and magic values 3.0/5 → 5.0/5 🟢 ✅ test-anti-patterns; tools: skill, report_intent / ✅ test-anti-patterns; writing-mstest-tests; tools: report_intent, skill ✅ 0.08
test-anti-patterns Recognize well-written tests without inventing false positives 2.0/5 → 5.0/5 🟢 ✅ test-anti-patterns; tools: report_intent, skill / ✅ test-anti-patterns; tools: report_intent, skill ✅ 0.08
migrate-mstest-v3-to-v4 Migrate custom TestMethodAttribute from Execute to ExecuteAsync 2.0/5 → 3.7/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07
migrate-mstest-v3-to-v4 Replace ExpectedExceptionAttribute with Assert.ThrowsExactly 3.0/5 → 3.7/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07
migrate-mstest-v3-to-v4 Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout 3.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07
migrate-mstest-v3-to-v4 Handle net6.0 target framework dropped in MSTest v4 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07
migrate-mstest-v3-to-v4 Fix TestMethodAttribute CallerInfo constructor breaking change 3.7/5 ⏰ → 4.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill, read_bash / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07 [2]
migrate-mstest-v3-to-v4 Understand behavioral changes after MSTest v4 upgrade 3.3/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07 [3]
migrate-mstest-v3-to-v4 Handle MSTest.Sdk and MTP changes in v4 2.0/5 → 3.3/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07
migrate-mstest-v3-to-v4 Full MSTest v3 to v4 migration with multiple breaking changes 3.7/5 → 5.0/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill, create, bash ✅ 0.07
migrate-mstest-v3-to-v4 Migrate MSTest.Sdk v3 project using ManagedType and TestTimeout 2.3/5 ⏰ → 3.7/5 🟢 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07
migrate-mstest-v3-to-v4 Correctly identify MSTest v3 project and recommend v4 migration 5.0/5 → 5.0/5 ✅ migrate-mstest-v3-to-v4; tools: skill / ✅ migrate-mstest-v3-to-v4; tools: skill ✅ 0.07
writing-mstest-tests Write unit tests for a service class 3.3/5 ⏰ → 4.7/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Write data-driven tests for a calculator 3.7/5 ⏰ → 4.7/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27 [4]
writing-mstest-tests Write async tests with cancellation 2.0/5 ⏰ → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Fix swapped Assert.AreEqual arguments 5.0/5 → 5.0/5 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27 [5]
writing-mstest-tests Modernize legacy test patterns 4.3/5 → 2.3/5 ⏰ 🔴 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Replace ExpectedException with Assert.Throws 3.0/5 → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: report_intent, skill 🟡 0.27
writing-mstest-tests Use proper collection assertions 4.0/5 → 2.0/5 🔴 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Use proper type assertions instead of casts 3.0/5 → 3.0/5 ⚠️ NOT ACTIVATED / ⚠️ NOT ACTIVATED 🟡 0.27
writing-mstest-tests Set up test lifecycle correctly 2.0/5 → 4.7/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: skill 🟡 0.27
writing-mstest-tests Use DynamicData with ValueTuples over object arrays 2.7/5 → 5.0/5 🟢 ✅ writing-mstest-tests; tools: skill / ✅ writing-mstest-tests; tools: report_intent, skill 🟡 0.27
msbuild-server Recommend MSBuild Server for slow CLI incremental builds 3.0/5 → 4.7/5 🟢 ✅ msbuild-server; tools: skill, bash / ✅ msbuild-server; tools: skill, bash ✅ 0.16
convert-to-cpm Decline CPM conversion for packages.config project 1.0/5 → 1.7/5 🟢 ℹ️ not activated (expected) / ℹ️ not activated (expected) ✅ 0.19 [6]
convert-to-cpm Recommend CPM when updating packages with version conflicts 2.0/5 → 2.7/5 🟢 ✅ convert-to-cpm; tools: skill, glob, bash, read_bash, create / ✅ convert-to-cpm; tools: skill, glob, bash, read_bash, create ✅ 0.19
convert-to-cpm Recommend CPM when updating packages in a complex repository 1.3/5 → 2.7/5 🟢 ✅ convert-to-cpm; tools: skill, bash, read_bash, task, read_agent / ✅ convert-to-cpm; tools: skill, task, read_agent, bash, read_bash, grep ✅ 0.19
convert-to-cpm Convert single project to CPM 2.0/5 → 5.0/5 🟢 ✅ convert-to-cpm; tools: skill, bash, read_bash, glob / ✅ convert-to-cpm; tools: skill, bash, read_bash, task, glob ✅ 0.19
convert-to-cpm Convert multi-project solution to CPM 2.7/5 → 4.7/5 🟢 ✅ convert-to-cpm; tools: skill, bash, read_bash / ✅ convert-to-cpm; tools: skill, bash, read_bash, task, read_agent ✅ 0.19
convert-to-cpm Convert solution with MSBuild property versions to CPM 2.7/5 → 4.0/5 🟢 ✅ convert-to-cpm; tools: skill, bash, grep, task, glob, read_agent / ✅ convert-to-cpm; tools: skill, task, bash, read_agent, grep ✅ 0.19
convert-to-cpm Convert solution with version conflicts to CPM 2.7/5 → 4.0/5 🟢 ✅ convert-to-cpm; tools: skill, bash, task / ✅ convert-to-cpm; tools: skill, task, bash, read_agent ✅ 0.19
convert-to-cpm Convert complex repository with multiple CPM challenges 3.0/5 → 3.7/5 🟢 ✅ convert-to-cpm; tools: skill, grep, bash / ✅ convert-to-cpm; tools: skill, grep, bash ✅ 0.19
mtp-hot-reload Suggest hot reload for failing test in MTP project (SDK 9) 1.0/5 → 2.7/5 ⏰ 🟢 ✅ mtp-hot-reload; tools: skill / ✅ mtp-hot-reload; tools: skill ✅ 0.08
mtp-hot-reload Suggest hot reload for failing test in MTP project (SDK 10) 1.0/5 → 4.7/5 🟢 ✅ mtp-hot-reload; tools: skill, bash, create / ✅ mtp-hot-reload; tools: skill, bash, create ✅ 0.08
mtp-hot-reload Enable hot reload when package already installed 1.7/5 ⏰ → 5.0/5 🟢 ✅ mtp-hot-reload; tools: skill, glob / ✅ mtp-hot-reload; tools: skill ✅ 0.08
mtp-hot-reload Suggest launchSettings.json configuration for hot reload 1.0/5 → 4.3/5 🟢 ✅ mtp-hot-reload; tools: skill, bash, create / ✅ mtp-hot-reload; tools: skill, bash, create ✅ 0.08
mtp-hot-reload Use dotnet run not dotnet test for hot reload 2.3/5 → 3.0/5 🟢 ✅ mtp-hot-reload; tools: skill / ✅ mtp-hot-reload; tools: skill ✅ 0.08
mtp-hot-reload Negative: VSTest project cannot use MTP hot reload 1.7/5 → 2.0/5 ⏰ 🟢 ✅ mtp-hot-reload; tools: skill, create, edit / ✅ mtp-hot-reload; tools: skill, create, edit ✅ 0.08
mtp-hot-reload Run specific failing test with hot reload filter 1.0/5 → 3.0/5 🟢 ✅ mtp-hot-reload; tools: skill, view / ✅ mtp-hot-reload; tools: skill, view, grep ✅ 0.08
run-tests Run tests in a VSTest MSTest project 4.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill, read_bash ✅ 0.17
run-tests Run tests with trx reporting on MTP project (SDK 9) 3.0/5 ⏰ → 3.0/5 ⏰ ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.17
run-tests Run tests with blame-hang on MTP project (SDK 10) 2.0/5 → 1.7/5 ⏰ 🔴 ✅ run-tests; tools: skill, bash, edit / ✅ run-tests; tools: skill, bash, edit ✅ 0.17 [7]
run-tests Run tests in a multi-TFM project targeting a specific framework 1.7/5 → 4.7/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash, glob ✅ 0.17
run-tests Filter MSTest tests by category on VSTest 5.0/5 → 5.0/5 ✅ run-tests; tools: skill, bash, glob / ✅ run-tests; tools: skill, glob ✅ 0.17 [8]
run-tests Filter NUnit tests by class name on VSTest 4.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, bash, glob / ✅ run-tests; tools: bash, skill, glob ✅ 0.17
run-tests Filter xUnit v3 tests by class on MTP 1.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash ✅ 0.17
run-tests Filter xUnit v3 tests by trait on MTP 1.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, view / ✅ run-tests; tools: skill, view ✅ 0.17
run-tests Filter TUnit tests by class using treenode-filter 2.0/5 → 4.3/5 🟢 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash ✅ 0.17
run-tests Combine multiple filter criteria on VSTest MSTest 4.0/5 → 4.0/5 ✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill ✅ 0.17
run-tests MTP project on SDK 9 must use -- separator for args 1.7/5 → 4.3/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill, edit ✅ 0.17
run-tests MTP project on SDK 10 passes args directly 3.0/5 → 3.0/5 ⏰ ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.17 [9]
run-tests Detect test platform from Directory.Build.props 1.7/5 → 5.0/5 🟢 ✅ run-tests; tools: skill / ✅ run-tests; tools: skill ✅ 0.17
run-tests Negative test: do not use MTP syntax for a VSTest project 4.0/5 → 5.0/5 🟢 ✅ run-tests; tools: skill, view / ✅ run-tests; tools: skill, view ✅ 0.17
migrate-vstest-to-mtp Migrate MSTest project from VSTest to Microsoft.Testing.Platform 4.3/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill, report_intent / ✅ migrate-vstest-to-mtp; tools: report_intent, skill
migrate-vstest-to-mtp Migrate NUnit project from VSTest to Microsoft.Testing.Platform 1.7/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill
migrate-vstest-to-mtp Migrate xUnit.net v2 project from VSTest to Microsoft.Testing.Platform 2.0/5 → 4.7/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill, report_intent, view, bash / ✅ migrate-vstest-to-mtp; tools: skill, report_intent, view, bash
migrate-vstest-to-mtp Update Azure DevOps pipeline from VSTest task to MTP 2.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill
migrate-vstest-to-mtp Migrate MSTest.Sdk project that explicitly uses VSTest 3.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill
migrate-vstest-to-mtp Translate dotnet test VSTest arguments to MTP equivalents 5.0/5 → 5.0/5 ✅ migrate-vstest-to-mtp; tools: report_intent, skill / ✅ migrate-vstest-to-mtp; tools: skill, report_intent [10]
migrate-vstest-to-mtp Handle exit code 8 when migrating from VSTest to MTP 3.3/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill
migrate-vstest-to-mtp Configure dotnet test MTP mode on .NET 10 SDK 2.0/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill
migrate-vstest-to-mtp Migrate xUnit.net VSTest filter syntax to MTP 1.7/5 → 4.3/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill
migrate-vstest-to-mtp Full VSTest to MTP migration plan for MSTest solution 3.7/5 → 5.0/5 🟢 ✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: skill, create [11]
resolve-project-references Explain misleading ResolveProjectReferences time 3.7/5 → 5.0/5 🟢 ✅ resolve-project-references; tools: skill / ✅ resolve-project-references; tools: skill ✅ 0.12
migrate-mstest-v1v2-to-v3 Migrate MSTest v1 project with assembly reference 3.0/5 → 4.7/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate MSTest v2 NuGet project to v3 3.3/5 → 3.0/5 🔴 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04 [12]
migrate-mstest-v1v2-to-v3 Fix Assert.AreEqual object overload errors after v3 upgrade 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit / ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate from .testsettings to .runsettings 3.3/5 → 3.7/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash ✅ 0.04
migrate-mstest-v1v2-to-v3 Fix DataRow type mismatch errors after v3 upgrade 3.7/5 → 3.0/5 🔴 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate to MSTest.Sdk project style 3.0/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash ✅ 0.04
migrate-mstest-v1v2-to-v3 Handle dropped target framework during v3 migration 4.7/5 → 4.3/5 🔴 ⚠️ NOT ACTIVATED / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Migrate complex MSTest v2 project with testsettings, DataRow issues, and dropped TFM 4.3/5 → 5.0/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
migrate-mstest-v1v2-to-v3 Correctly identify MSTest v1 vs v2 and recommend different migration paths 4.3/5 → 4.7/5 🟢 ✅ migrate-mstest-v1v2-to-v3; tools: skill, task, glob, bash, read_agent / ✅ migrate-mstest-v1v2-to-v3; tools: skill ✅ 0.04
crap-score Calculate CRAP score for a single method with partial coverage 3.3/5 → 3.7/5 🟢 ✅ crap-score; tools: skill, glob, bash / ✅ crap-score; tools: skill, glob ✅ 0.17
crap-score Identify riskiest methods across a file 3.7/5 → 5.0/5 🟢 ✅ crap-score; tools: skill, glob / ✅ crap-score; tools: skill, glob ✅ 0.17
crap-score Generate coverage then compute CRAP score 4.0/5 → 4.0/5 ✅ crap-score; tools: skill / ✅ crap-score; tools: skill ✅ 0.17 [13]
exp-test-tagging Tag an untagged MSTest test suite 3.7/5 → 5.0/5 🟢 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill, glob 🟡 0.28
exp-test-tagging Tag an untagged xUnit test suite 4.0/5 → 4.0/5 ✅ exp-test-tagging; tools: skill, glob / ✅ exp-test-tagging; tools: skill 🟡 0.28
exp-test-tagging Tag an untagged NUnit test suite 3.0/5 → 4.7/5 🟢 ✅ exp-test-tagging; tools: skill, edit, glob / ✅ exp-test-tagging; tools: skill, edit, glob 🟡 0.28
exp-test-tagging Audit test distribution without modifying files 4.7/5 → 5.0/5 🟢 ✅ exp-test-tagging; tools: skill / ✅ exp-test-tagging; tools: skill 🟡 0.28
exp-test-tagging Decline request to write new tests 4.3/5 → 4.0/5 🔴 ℹ️ not activated (expected) / ℹ️ not activated (expected) 🟡 0.28

[1] (Plugin) Quality unchanged but weighted score is -6.2% due to: tokens (13292 → 22613), tool calls (0 → 1), time (19.7s → 45.1s)
[2] (Isolated) Quality improved but weighted score is -19.7% due to: quality, judgment
[3] (Isolated) Quality improved but weighted score is -6.2% due to: judgment
[4] (Isolated) Quality improved but weighted score is -1.0% due to: judgment
[5] (Plugin) Quality unchanged but weighted score is -8.0% due to: tokens (12040 → 32362), tool calls (0 → 1)
[6] (Isolated) Quality improved but weighted score is -2.8% due to: time (61.8s → 116.8s), tokens (86777 → 96343)
[7] (Plugin) Quality unchanged but weighted score is -1.6% due to: tokens (46300 → 365571), tool calls (6 → 20), time (83.6s → 184.2s)
[8] (Plugin) Quality unchanged but weighted score is -2.4% due to: tokens (36416 → 51043), tool calls (4 → 6)
[9] (Isolated) Quality unchanged but weighted score is -26.7% due to: completion (✓ → ✗), judgment, tokens (206268 → 381246), time (146.4s → 278.5s), tool calls (15 → 22)
[10] (Plugin) Quality unchanged but weighted score is -7.4% due to: tokens (13066 → 40752), tool calls (0 → 1)
[11] (Plugin) Quality improved but weighted score is -17.1% due to: quality, judgment
[12] (Plugin) Quality dropped but weighted score is +0.7% due to: tokens (249479 → 51349), time (62.8s → 34.8s), tool calls (9 → 5)
[13] (Isolated) Quality unchanged but weighted score is -1.0% due to: judgment, quality
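
The footnotes above follow a regular pattern, so they can be turned into structured records for sorting or filtering. The sketch below is one possible parser. It assumes only the line format visible in this comment. `parse_footnote` is a hypothetical helper and not part of the skill validator.

```python
import re

# Matches lines such as:
# [5] (Plugin) Quality unchanged but weighted score is -8.0% due to: tokens (12040 → 32362), tool calls (0 → 1)
FOOTNOTE = re.compile(
    r"^\[(?P<id>\d+)\] \((?P<mode>\w+)\) Quality (?P<quality>\w+) "
    r"but weighted score is (?P<delta>[+-]\d+(?:\.\d+)?)% due to: (?P<factors>.+)$"
)
# A factor is a name, optionally followed by "(before → after)".
FACTOR = re.compile(r"^(?P<name>[^()]+?)(?: \((?P<before>.+?) → (?P<after>.+?)\))?$")


def parse_footnote(line):
    """Parse one footnote line into a dict, or return None if it does not match."""
    match = FOOTNOTE.match(line.strip())
    if match is None:
        return None
    factors = []
    for part in match["factors"].split(", "):
        factor = FACTOR.match(part)
        if factor is None:
            factors.append((part, None, None))
        else:
            factors.append((factor["name"], factor["before"], factor["after"]))
    return {
        "id": int(match["id"]),
        "mode": match["mode"],
        "quality": match["quality"],
        "delta_percent": float(match["delta"]),
        "factors": factors,
    }
```

Sorting the parsed records by `delta_percent` is a quick way to see which footnoted scenarios lost the most weighted score, for example footnote [9] at -26.7%.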

⏰ timeout: one or more runs hit the scenario timeout limit (120s, 180s, 240s, 300s, 360s). Scoring may be affected because model execution was aborted before it could produce its full output (increase the limit via timeout in eval.yaml).

Model: claude-opus-4.6 | Judge: claude-opus-4.6

📖 See InvestigatingResults.md for how to diagnose failures. Additional debugging guidance may be provided by your workflow.

🔍 Full results — includes quality and agent details

To investigate failures, paste this to your AI coding agent:

For PR 428 in dotnet/skills, download eval artifacts with gh run download 23736824886 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/2ea48926d66b0d984ecfe77a18955ff49ece4078/eng/skill-validator/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

@Evangelink Evangelink enabled auto-merge (squash) March 30, 2026 12:33
auto-merge was automatically disabled March 30, 2026 12:38

Pull Request is not mergeable

@Evangelink Evangelink merged commit 51d29a0 into main Mar 30, 2026
34 checks passed
@Evangelink Evangelink deleted the dev/amauryleve/test-skills-improve branch March 30, 2026 13:08
3 participants