Enforce strict anti-hallucination policies in Math.skill #4
Co-authored-by: Wholiver <126302682+Wholiver@users.noreply.github.com>
Pull request overview
This PR tightens Math.skill’s “verification-first” contract by removing any allowance for “best-effort” or qualified-but-unverified final answers when verification fails, and by adding explicit anti-hallucination refusal guidance across core policy modules.
Changes:
- Updated SKILL.md failure-handling policy to require explicit failure admission and refusal to provide a final answer after repeated verification failure.
- Expanded the verification engine’s failure protocol with strict “no fake verification / no forced pass” constraints.
- Added explicit anti-hallucination interaction guidance (new Scenario 0) and general anti-fabrication rules in the error-prevention module.
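As a rough illustration of the failure-handling rule in the list above, the following Python sketch shows one way a "verify or refuse" loop could look. It is not taken from the PR: `solve_with_verification`, `Result`, the `solve` and `verify` callables, and the `max_attempts` limit of 3 are all hypothetical names and values chosen for the example (the PR only says "repeated" failure).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    verified: bool
    answer: Optional[str]  # stays None unless verification passed
    message: str


def solve_with_verification(
    solve: Callable[[int], str],
    verify: Callable[[str], bool],
    max_attempts: int = 3,  # assumed limit; the PR does not give a number
) -> Result:
    """Return an answer only if an independent check accepts it."""
    for attempt in range(max_attempts):
        candidate = solve(attempt)
        if verify(candidate):
            return Result(True, candidate, "verified")
    # No best-effort fallback: admit failure and withhold the answer.
    return Result(
        False,
        None,
        f"Verification failed after {max_attempts} attempts; no final answer given.",
    )


# Hypothetical usage: one solver that passes the check, one that never does.
ok = solve_with_verification(lambda i: str(2 + 2), lambda a: int(a) == 4)
bad = solve_with_verification(lambda i: "5", lambda a: int(a) == 4)
print(ok.answer, bad.answer, bad.message)
```

The point of the sketch is that the failure branch carries no candidate answer at all, so a caller cannot present an unverified result as final.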
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| SKILL.md | Tightens failure-handling rules to forbid unverified final answers and mandate refusal when verification cannot be made to pass. |
| modules/verification_engine.md | Strengthens the verification-failure protocol with explicit anti-hallucination constraints. |
| modules/interaction_policy.md | Adds Scenario 0 for “unable to verify / very low confidence” with a refusal-oriented response template. |
| modules/error_prevention.md | Introduces general anti-hallucination / anti-fabrication guardrails to apply before domain-specific rules. |
This PR implements a strict anti-hallucination, zero-tolerance policy against "plausible BS" in Math.skill. It modifies SKILL.md and related modules to remove any instructions that allow "best attempts" when verification fails. The model must now explicitly admit failure, stop the derivation, and refuse to output unverified answers or fabricate mathematical logic when it lacks confidence or its verification methods fail.

PR created automatically by Jules for task 13929397006830989254 started by @Wholiver