===================================================================
BINARY RELEVANCE LABELING PROMPT
===================================================================
Model: Llama-3.3-70B-Instruct
Task: Determine if a snippet supports/contains a given fact
Output: Binary YES/NO judgment
-------------------------------------------------------------------
PROMPT TEMPLATE
-------------------------------------------------------------------
Does this snippet contain or support the following fact?
Fact: {fact}
Snippet: {snippet}
Answer YES if the snippet contains this information, NO otherwise.
Think step by step:
1. What is the fact claiming?
2. Does the snippet mention this information?
3. Is the information in the snippet consistent with the fact?
Format: YES/NO
Answer:
-------------------------------------------------------------------
PARAMETERS
-------------------------------------------------------------------
{fact}:
- Gold nugget answer or query text
- Example: "Machu Picchu was built in the 15th century"
{snippet}:
- Retrieved text chunk (500 characters)
- 100-character overlap between consecutive chunks
- Preserves sentence boundaries where possible
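The chunking scheme described above (500-character windows, 100-character
overlap, sentence boundaries preserved where possible) can be sketched as
follows. This is a minimal illustration, not the source implementation:
the function name and the exact boundary-snapping rule (backtrack to the
last ". " inside the window) are assumptions.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, snapping each chunk's end back
    to the nearest sentence boundary when one falls inside the window.
    NOTE: illustrative sketch; the snapping heuristic is an assumption."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Prefer to end the chunk just after the last sentence terminator.
            boundary = text.rfind(". ", start, end)
            if boundary > start:
                end = boundary + 1
        chunks.append(text[start:end])
        if end == len(text):
            break
        next_start = end - overlap  # 100-character overlap with the next chunk
        if next_start <= start:
            next_start = end        # guard against stalling on short chunks
        start = next_start
    return chunks
```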
-------------------------------------------------------------------
MODEL CONFIGURATION
-------------------------------------------------------------------
Temperature: 0.1
- Low temperature for consistent judgments
- Reduces randomness in labeling
Max Retries: 2
- Retry on API failures
Top P: 0.95
- Nucleus sampling (default)
Max Tokens: 10
- Only need "YES" or "NO"
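Putting the template and configuration together, prompt assembly might
look like the sketch below. The template string mirrors the PROMPT
TEMPLATE section verbatim; the function and variable names are
illustrative, not taken from the source file.

```python
# Labeling prompt; placeholders match the {fact} / {snippet} parameters above.
PROMPT_TEMPLATE = """Does this snippet contain or support the following fact?
Fact: {fact}
Snippet: {snippet}
Answer YES if the snippet contains this information, NO otherwise.
Think step by step:
1. What is the fact claiming?
2. Does the snippet mention this information?
3. Is the information in the snippet consistent with the fact?
Format: YES/NO
Answer:"""

# Generation settings from the MODEL CONFIGURATION section.
GENERATION_CONFIG = {
    "temperature": 0.1,  # low temperature for consistent judgments
    "top_p": 0.95,       # nucleus sampling (default)
    "max_tokens": 10,    # only "YES" or "NO" is needed
}
MAX_RETRIES = 2          # retry on API failures

def build_prompt(fact: str, snippet: str) -> str:
    """Fill the labeling template for one (fact, snippet) pair."""
    return PROMPT_TEMPLATE.format(fact=fact, snippet=snippet)
```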
-------------------------------------------------------------------
OUTPUT PARSING
-------------------------------------------------------------------
Parse model response to extract binary label:
r(q,s) = 1 if "YES" appears in response (case-insensitive)
r(q,s) = 0 otherwise
Common response patterns:
- "Answer: YES"
- "YES, the snippet contains..."
- "ANSWER: YES"
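The parsing rule above (r(q,s) = 1 iff "YES" appears case-insensitively
in the response) reduces to a one-line check; this sketch covers all
three listed response patterns:

```python
def parse_label(response: str) -> int:
    """Map a model response to the binary label r(q, s):
    1 if "YES" appears anywhere in the response (case-insensitive),
    0 otherwise. Note this is a substring match, as specified above."""
    return 1 if "YES" in response.upper() else 0
```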
-------------------------------------------------------------------
VALIDATION
-------------------------------------------------------------------
Quality control:
- 10% of labels manually reviewed
- Agreement rate: >90% with human judgments
- Reported in paper Section 3.2
Use cases:
1. Calibration dataset: Generate positive/negative examples
2. Test dataset: Evaluate empirical coverage
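The labeled (fact, snippet, label) triples feed the two datasets named
above. A minimal, purely illustrative split is sketched below; the 50/50
ratio, seed, and function name are assumptions, not from the source.

```python
import random

def split_labeled_pairs(examples: list, calib_frac: float = 0.5, seed: int = 0):
    """Shuffle labeled (fact, snippet, label) triples and split them into
    a calibration set (positive/negative examples for calibration) and a
    test set (for evaluating empirical coverage). Ratio/seed are assumed."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * calib_frac)
    return shuffled[:cut], shuffled[cut:]
```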
-------------------------------------------------------------------
USAGE EXAMPLE
-------------------------------------------------------------------
Input:
Fact: "The Murchison meteorite fell in Australia in 1969"
Snippet: "On September 28, 1969, a large meteorite fell near
Murchison, Victoria, Australia. The Murchison meteorite
is one of the most studied meteorites due to its large
mass and the presence of organic compounds."
Expected Output: "YES"
Parsed Label: r(q,s) = 1
-------------------------------------------------------------------
Source: src/calibration/entailment_checker.py (lines 43-56)
Paper: Section 3.2 "Split Conformal Prediction for Context Filtering"