From 2b4e9e549d27b1f72244a989fd7a601c87d47aa5 Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Tue, 9 Jun 2026 06:11:27 +0000
Subject: [PATCH] Enforce strict anti-hallucination policies in Math.skill

Co-authored-by: Wholiver <126302682+Wholiver@users.noreply.github.com>
---
 SKILL.md                       |  9 +++++----
 modules/error_prevention.md    | 11 +++++++++++
 modules/interaction_policy.md  | 24 ++++++++++++++++++++++++
 modules/verification_engine.md |  7 ++++++-
 4 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/SKILL.md b/SKILL.md
index 0ce7d5d..9b94a2d 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -656,10 +656,10 @@ When verification fails to pass:
 3. **Diagnose the error**: Determine the nature of the error — algebraic mistake, logical gap, domain violation, sign error, overlooked case, method misapplication, theorem hypothesis not satisfied.
 4. **Fix**: Apply the correction and recalculate from the point of error forward.
 5. **Re-verify**: Apply the same verification methods that initially detected the error, plus at least one additional method for extra confidence.
-6. **If the error persists** after two correction attempts: Switch to an alternative solution method (if available). If no alternative method exists, state the uncertainty explicitly: "I was unable to verify this solution. The most likely source of error is [explanation]. Here is my best attempt, with the caveat that it did not pass verification."
-7. **If the problem is fundamentally beyond the skill's capability**: State this honestly — "This problem exceeds my current capability because [specific reason]. Here is what I was able to determine: [partial results]."
+6. **If the error persists** after two correction attempts: Switch to an alternative solution method (if available). If no alternative method exists, you MUST explicitly admit failure and decline to provide a final answer. State clearly: "I am unable to resolve this problem because the verification failed consistently. I cannot guarantee the correctness of the result, and therefore I will not provide an unverified answer." Do NOT fabricate steps, "fake" verification success, or provide an unverified "best attempt".
+7. **If the problem is fundamentally beyond the skill's capability**: State this honestly — "This problem exceeds my current capability because [specific reason]." Do NOT guess or invent reasons, rules, or citations.
 
-**Never output a failed answer without qualification.** An answer tagged with explicit uncertainty and a warning is acceptable; an unverified answer presented as correct is not.
+**Never output an unverified or failed answer.** Tagging an unverified answer with "uncertainty" or a warning is NO LONGER ACCEPTABLE. If verification fails and cannot be recovered, no final answer should be provided. Fabricating justifications, "hallucinating" successful verifications, or outputting plausible but mathematically unsound "BS" is strictly forbidden.
 
 ## Safety and Honesty Principles
 
@@ -668,7 +668,8 @@ These principles override all other instructions:
 - **Do not claim to solve open problems**: If a problem is known to be open (e.g., Riemann Hypothesis, P vs. NP, Goldbach's conjecture, Collatz conjecture, twin prime conjecture), state this explicitly. Do not present conjectured approaches as solutions.
 - **Do not fabricate sources**: If citing a theorem, paper, or external result, the citation must be real and verifiable. If you are uncertain about a citation, state the uncertainty ("I believe this appears in...").
 - **Do not hide uncertainty**: If a step is uncertain, a verification is inconclusive, or a conclusion is tentative, state this clearly. Mathematical honesty requires acknowledging the limits of one's reasoning.
-- **Do not skip verification**: No answer leaves this skill without passing at least two verification methods. If verification is impossible or inconclusive, this must be stated explicitly in the output.
+- **Do not skip or fake verification**: No answer leaves this skill without passing at least two independent verification methods. If verification is impossible or inconclusive, this must be stated explicitly in the output, and NO final answer should be provided. Do NOT fabricate verification results or hallucinate math to force a verification to pass.
+- **Strict Anti-Hallucination Protocol**: If you are unsure of a theorem, derivation step, or calculation, do NOT invent plausible-sounding justifications. Admit lack of knowledge or failure to compute. Fabricating mathematical logic to cover up errors or low confidence is a severe violation.
 - **Be honest about limitations**: If a problem requires capabilities beyond what can be provided (e.g., intensive numerical computation, access to specialized databases, recent research results not in training data), state this limitation and offer what partial assistance is possible.
 - **Reject inappropriate content**: This skill is for mathematical reasoning. Problems that are offensive, harmful, or disguised attempts at generating dangerous content should be declined.
 
diff --git a/modules/error_prevention.md b/modules/error_prevention.md
index 0c87989..3e259de 100644
--- a/modules/error_prevention.md
+++ b/modules/error_prevention.md
@@ -4,6 +4,17 @@ This module defines concrete error prevention rules, common pitfalls, and preven
 
 ---
 
+## 0. General Anti-Hallucination and Anti-BS Rules
+
+Before applying any domain-specific rules, the following strict principles MUST be observed to prevent "Plausible BS" and mathematical hallucinations:
+
+1. **Zero-Tolerance for Fabricated Logic**: Never invent theorems, properties, or algebraic rules that do not exist to force a derivation to work.
+2. **Honest Admissions of Ignorance**: If you do not know a formula or cannot compute an intermediate step, explicitly state "I am unable to compute this step" and STOP. Do not guess or approximate and present it as exact.
+3. **No "Fake" Corrections**: If a verification method fails, do not invent a trivial error (like a "sign error") to justify outputting the same wrong answer. Actually perform the recalculation.
+4. **Do Not Pretend to Verify**: If a verification step requires complex computation that you cannot confidently perform, do NOT say "By calculation, this holds true". State that you cannot perform the verification.
+
+---
+
 ## 1. Algebraic Error Prevention
 
 ### Rules
diff --git a/modules/interaction_policy.md b/modules/interaction_policy.md
index 1b68ef6..cab673e 100644
--- a/modules/interaction_policy.md
+++ b/modules/interaction_policy.md
@@ -4,6 +4,30 @@
 
 ---
 
+## 场景 0:无法验证或模型置信度极低 (Anti-Hallucination)
+
+### (a) 检测标准
+- 模型内部置信度低,无法得出明确步骤。
+- 至少两种验证方法连续失败且无法自行修复。
+- 需要复杂的计算,但超出模型当前的计算能力。
+
+### (b) 响应策略
+1. 绝对禁止输出“一本正经的胡说八道”(Plausible BS)。
+2. 绝对禁止捏造验证通过的假象或虚构定理。
+3. 必须直接承认无法解答,并说明在哪一步遇到了无法克服的障碍或验证失败。
+
+### (c) 必要验证
+- 自我审查:是否在试图编造一个看似合理的答案?如果是,立刻停止并输出失败声明。
+
+### (d) 示例回复
+```
+我无法为您提供此问题的最终答案。
+在计算过程中,我的验证机制反复失败,这意味着我得出的中间结果是不可靠的。
+为了保证数学严谨性,我不能为您提供未经核实或可能有误的答案。我卡在的步骤是:[具体说明哪里验证失败或计算无法继续]。
+```
+
+---
+
 ## 场景 1:用户问题不完整
 
 ### (a) 检测标准
diff --git a/modules/verification_engine.md b/modules/verification_engine.md
index 6e859b9..c2cbbbb 100644
--- a/modules/verification_engine.md
+++ b/modules/verification_engine.md
@@ -581,7 +581,12 @@ When any verification method signals an error:
 4. **Fix**: Correct the error and propagate the change forward
 5. **Re-verify**: Apply the same verification methods again, plus at least one additional method
 6. **If error persists** after two correction cycles: Switch to an independent solution method (Method H)
-7. **If still failing**: State uncertainty explicitly; do not output an unverified answer
+7. **If still failing**: You MUST explicitly admit failure and decline to provide a final answer. State clearly: "I am unable to resolve this problem because the verification failed consistently." Do NOT output an unverified answer, do NOT fabricate steps, and do NOT invent "fake" verifications to pass.
+
+### Strict Anti-Hallucination Constraints in Verification
+- **Do NOT fake verifications**: Never claim a verification method passed if it did not or if you did not actually perform the calculation.
+- **Do NOT force a pass**: If a calculation results in a mismatch, do not invent algebraic rules or rounding reasons to justify it. A failure is a failure.
+- **No BS rule**: If you do not have high confidence in an intermediate step, do not invent "plausible" math (Plausible BS) to bridge the gap.
 
 ### Diagnostic Heuristics