
Enforce strict anti-hallucination policies in Math.skill #4

Merged: Wholiver merged 1 commit into main from fix-anti-hallucination-policy-13929397006830989254 on Jun 9, 2026

Wholiver (Owner) commented on Jun 9, 2026

This PR implements a strict anti-hallucination, zero-tolerance policy against "plausible BS" in Math.skill. It modifies SKILL.md and related modules to remove every instruction that allowed "best attempts" when verification fails. When the model lacks confidence or its verification methods fail, it must now explicitly admit failure, stop the derivation, and refuse to output an unverified answer or fabricate mathematical reasoning.


PR created automatically by Jules for task 13929397006830989254 started by @Wholiver

Co-authored-by: Wholiver <126302682+Wholiver@users.noreply.github.com>
google-labs-jules (Contributor) commented:

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review was requested due to automatic review settings on June 9, 2026 at 06:11.

Copilot AI left a comment


Pull request overview

This PR tightens Math.skill’s “verification-first” contract by removing any allowance for “best-effort” or qualified-but-unverified final answers when verification fails, and by adding explicit anti-hallucination refusal guidance across core policy modules.

Changes:

  • Updated SKILL.md failure-handling policy to require explicit failure admission and refusal to provide a final answer after repeated verification failure.
  • Expanded the verification engine’s failure protocol with strict “no fake verification / no forced pass” constraints.
  • Added explicit anti-hallucination interaction guidance (new Scenario 0) and general anti-fabrication rules in the error-prevention module.
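The failure-handling rule summarized above can be sketched as a verification-gated loop. This is a minimal illustration only: Math.skill itself consists of Markdown policy files, and the `derive` and `verify` callbacks, the `Outcome` type, and the retry budget of three attempts are all hypothetical stand-ins, since the PR says only "repeated" verification failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical retry budget; the PR does not specify a number of attempts.
MAX_ATTEMPTS = 3


@dataclass
class Outcome:
    answer: Optional[str]  # None means no final answer is given
    message: str


def answer_with_verification(
    derive: Callable[[int], str],  # hypothetical: produces a candidate per attempt
    verify: Callable[[str], bool],  # hypothetical: independent check of a candidate
    max_attempts: int = MAX_ATTEMPTS,
) -> Outcome:
    """Return a candidate only if it passes verification; otherwise refuse."""
    for attempt in range(1, max_attempts + 1):
        candidate = derive(attempt)
        if verify(candidate):
            return Outcome(candidate, f"Verified on attempt {attempt}.")
    # No "best attempt" fallback: admit failure and withhold the answer.
    return Outcome(
        None,
        f"Verification failed after {max_attempts} attempts; no final answer is given.",
    )


# Usage: find a positive x with x**2 == 49.
result = answer_with_verification(lambda a: str(a + 5), lambda s: int(s) ** 2 == 49)
```

Here the first candidate ("6") fails the check and the second ("7") passes, so a verified answer is returned. A `verify` that never passes yields `answer=None` rather than a qualified best guess, which mirrors the refusal the PR mandates.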

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| SKILL.md | Tightens failure-handling rules to forbid unverified final answers and mandate refusal when verification cannot be made to pass. |
| modules/verification_engine.md | Strengthens the verification-failure protocol with explicit anti-hallucination constraints. |
| modules/interaction_policy.md | Adds Scenario 0 for "unable to verify / very low confidence" with a refusal-oriented response template. |
| modules/error_prevention.md | Introduces general anti-hallucination / anti-fabrication guardrails to apply before domain-specific rules. |


Wholiver merged commit cce75ae into main on Jun 9, 2026
1 check passed

Labels: None yet
Projects: None yet
2 participants