/evaluate
Skill Validation Results
[1] (Isolated) Quality improved but weighted score is -4.5% due to: judgment
Model: claude-opus-4.6 | Judge: claude-opus-4.6
Branch updated: 3ffe052 → c04a728
/evaluate
Skill Validation Results
[1] (Plugin) Quality unchanged but weighted score is -8.4% due to: tokens (13597 → 34864), tool calls (0 → 1), time (22.2s → 55.4s)
Model: claude-opus-4.6 | Judge: claude-opus-4.6
🔍 Full results — includes quality and agent details
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (3)
tests/dotnet-test/writing-mstest-tests/eval.yaml:167
reject_tools includes "create", but the validator's file creation tool is named create_file (per eng/skill-validator/README.md and EvalSchemaTests). As written, this constraint likely won't enforce the intended restriction.
reject_tools: ["bash", "edit", "create"]
tests/dotnet-test/writing-mstest-tests/eval.yaml:202
reject_tools uses "create", but the validator expects create_file as the tool name for file creation. If the intent is to forbid creating or editing files in this scenario, please use the actual tool names so the constraint is enforced.
reject_tools: ["bash", "edit", "create"]
tests/dotnet-test/writing-mstest-tests/eval.yaml:233
reject_tools includes "create", but the skill-validator uses create_file for file creation constraints (see eng/skill-validator/README.md). Because "create" is not a known tool name, this constraint won't catch use of the file creation tool.
reject_tools: ["bash", "edit", "create"]
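A minimal sketch of the fix the three comments suggest, assuming (as the comments state) that the validator's file creation tool is named create_file. Whether "bash" and "edit" are the validator's exact tool names is not confirmed here and should be checked against eng/skill-validator/README.md before applying this at lines 167, 202, and 233 of eval.yaml:

```yaml
# Hypothetical corrected constraint: only "create" -> "create_file" comes from the review comments.
# "bash" and "edit" are kept unchanged from the original and are assumed to be valid tool names.
reject_tools: ["bash", "edit", "create_file"]
```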
/evaluate
Skill Validation Results
[1] (Plugin) Quality unchanged but weighted score is -6.2% due to: tokens (13292 → 22613), tool calls (0 → 1), time (19.7s → 45.1s)
Model: claude-opus-4.6 | Judge: claude-opus-4.6
🔍 Full results — includes quality and agent details
Pull Request is not mergeable
Use skill validation tools to understand what can be improved on test skills