===================================================================
BINARY RELEVANCE LABELING PROMPT
===================================================================
Model: Llama-3.3-70B-Instruct
Task: Determine if a snippet supports/contains a given fact
Output: Binary YES/NO judgment
-------------------------------------------------------------------
PROMPT TEMPLATE
-------------------------------------------------------------------
Does this snippet contain or support the following fact?
Fact: {fact}
Snippet: {snippet}
Answer YES if the snippet contains this information, NO otherwise.
Think step by step:
1. What is the fact claiming?
2. Does the snippet mention this information?
3. Is the information in the snippet consistent with the fact?
Format: YES/NO
Answer:
-------------------------------------------------------------------
PARAMETERS
-------------------------------------------------------------------
{fact}:
- Gold nugget answer or query text
- Example: "Machu Picchu was built in the 15th century"
{snippet}:
- Retrieved text chunk (500 characters)
- 100-character overlap between consecutive chunks
- Preserves sentence boundaries where possible
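The chunking scheme described above (500-character windows, 100-character
overlap, sentence boundaries preserved where possible) can be sketched as
follows. This is a minimal illustration, not the source implementation:
the function name and the exact boundary-snapping rule (backtrack to the
last ". " inside the window) are assumptions.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, snapping each chunk's end back
    to the nearest sentence boundary when one falls inside the window.
    NOTE: illustrative sketch; the snapping heuristic is an assumption."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Prefer to end the chunk just after the last sentence terminator.
            boundary = text.rfind(". ", start, end)
            if boundary > start:
                end = boundary + 1
        chunks.append(text[start:end])
        if end == len(text):
            break
        next_start = end - overlap  # 100-character overlap with the next chunk
        if next_start <= start:
            next_start = end        # guard against stalling on short chunks
        start = next_start
    return chunks
```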
-------------------------------------------------------------------
MODEL CONFIGURATION
-------------------------------------------------------------------
Temperature: 0.1
- Low temperature for consistent judgments
- Reduces randomness in labeling
Max Retries: 2
- Retry on API failures
Top P: 0.95
- Nucleus sampling (default)
Max Tokens: 10
- Only need "YES" or "NO"
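Putting the template and configuration together, prompt assembly might
look like the sketch below. The template string mirrors the PROMPT
TEMPLATE section verbatim; the function and variable names are
illustrative, not taken from the source file.

```python
# Labeling prompt; placeholders match the {fact} / {snippet} parameters above.
PROMPT_TEMPLATE = """Does this snippet contain or support the following fact?
Fact: {fact}
Snippet: {snippet}
Answer YES if the snippet contains this information, NO otherwise.
Think step by step:
1. What is the fact claiming?
2. Does the snippet mention this information?
3. Is the information in the snippet consistent with the fact?
Format: YES/NO
Answer:"""

# Generation settings from the MODEL CONFIGURATION section.
GENERATION_CONFIG = {
    "temperature": 0.1,  # low temperature for consistent judgments
    "top_p": 0.95,       # nucleus sampling (default)
    "max_tokens": 10,    # only "YES" or "NO" is needed
}
MAX_RETRIES = 2          # retry on API failures

def build_prompt(fact: str, snippet: str) -> str:
    """Fill the labeling template for one (fact, snippet) pair."""
    return PROMPT_TEMPLATE.format(fact=fact, snippet=snippet)
```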
-------------------------------------------------------------------
OUTPUT PARSING
-------------------------------------------------------------------
Parse model response to extract binary label:
r(q,s) = 1 if "YES" appears in response (case-insensitive)
r(q,s) = 0 otherwise
Common response patterns:
- "Answer: YES"
- "YES, the snippet contains..."
- "ANSWER: YES"
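The parsing rule above (r(q,s) = 1 iff "YES" appears case-insensitively
in the response) reduces to a one-line check; this sketch covers all
three listed response patterns:

```python
def parse_label(response: str) -> int:
    """Map a model response to the binary label r(q, s):
    1 if "YES" appears anywhere in the response (case-insensitive),
    0 otherwise. Note this is a substring match, as specified above."""
    return 1 if "YES" in response.upper() else 0
```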
-------------------------------------------------------------------
VALIDATION
-------------------------------------------------------------------
Quality control:
- 10% of labels manually reviewed
- Agreement rate: >90% with human judgments
- Reported in paper Section 3.2
Use cases:
1. Calibration dataset: Generate positive/negative examples
2. Test dataset: Evaluate empirical coverage
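The labeled (fact, snippet, label) triples feed the two datasets named
above. A minimal, purely illustrative split is sketched below; the 50/50
ratio, seed, and function name are assumptions, not from the source.

```python
import random

def split_labeled_pairs(examples: list, calib_frac: float = 0.5, seed: int = 0):
    """Shuffle labeled (fact, snippet, label) triples and split them into
    a calibration set (positive/negative examples for calibration) and a
    test set (for evaluating empirical coverage). Ratio/seed are assumed."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * calib_frac)
    return shuffled[:cut], shuffled[cut:]
```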
-------------------------------------------------------------------
USAGE EXAMPLE
-------------------------------------------------------------------
Input:
Fact: "The Murchison meteorite fell in Australia in 1969"
Snippet: "On September 28, 1969, a large meteorite fell near
Murchison, Victoria, Australia. The Murchison meteorite
is one of the most studied meteorites due to its large
mass and the presence of organic compounds."
Expected Output: "YES"
Parsed Label: r(q,s) = 1
-------------------------------------------------------------------
Source: src/calibration/entailment_checker.py (lines 43-56)
Paper: Section 3.2 "Split Conformal Prediction for Context Filtering"