This repository was archived by the owner on Mar 27, 2026. It is now read-only.

Commit 1ba1d8a

refactor instructions
1 parent 7e70279 commit 1ba1d8a

1 file changed

Lines changed: 72 additions & 67 deletions

src/utils/diff_parser.rs

@@ -128,6 +128,77 @@ impl DiffParser {
         Ok(files)
     }
 
+    /// Get the instructions for interpreting git diff output
+    fn get_diff_instructions() -> Vec<String> {
+        let instructions = r#"**Instructions for Interpreting Git Diff Output**
+
+This document provides a guide to understanding the diff output generated by RepoDiff.
+
+**Important Note:** The diff output in `repodiff_output.txt` has been sanitized to focus what's relevant for understanding the diffs.
+Real-world Git diff output may contain more details.
+
+**1. Basic Structure:**
+
+A Git diff file describes the *differences* between two versions of a file. It's structured into *hunks*, which represent contiguous regions of change.
+
+* `diff --git a/<path> b/<path>`: Indicates the file being compared. `a/` refers to the "old" version, and `b/` refers to the "new" version. (Note that paths always use forward slashes in Git diff output, even on Windows systems.)
+* `--- a/<path>`: Marks the beginning of the original file content.
+* `+++ b/<path>`: Marks the beginning of the modified file content.
+* `@@ -<start_line_old>,<num_lines_old> +<start_line_new>,<num_lines_new> @@ <section_header>`: This is the *hunk header*. (Optional in simplified output, but common in real diffs).
+ * `-<start_line_old>,<num_lines_old>`: Indicates the starting line number and number of lines in the *old* version of the file that this hunk represents. If only one line is affected, `,<num_lines_old>` will be omitted.
+ * `+<start_line_new>,<num_lines_new>`: Indicates the starting line number and number of lines in the *new* version of the file that this hunk represents. If only one line is affected, `,<num_lines_new>` will be omitted.
+ * `<section_header>`: (Optional) This is often a function or method name, providing context for the change.
+* Hunk Content: Lines within a hunk are marked with a prefix:
+ * ` ` (space): Unchanged line (context).
+ * `-`: Line removed from the old version.
+ * `+`: Line added to the new version.
+
+**2. Simplified Example:**
+
+```
+diff --git a/MyFile.cs b/MyFile.cs
+--- a/MyFile.cs
++++ b/MyFile.cs
+ // Some code
+ string oldValue = "old";
+-// Removed line
++string newValue = "new";
+ // More code
+```
+
+**Explanation of the Example:**
+
+* The file being changed is `MyFile.cs`.
+* `" string oldValue = "old";"`: This line is present in both versions.
+* `-// Removed line`: This line was removed from the old version.
+* `+string newValue = "new";`: This line was added to the new version.
+* `" // More code"`: This line is present in both versions.
+
+**3. Key LLM Considerations:**
+
+* **Focus on Content Lines:** The most important part for understanding changes is the content prefixed with ` `, `-`, or `+`.
+* **Context is Crucial:** Use the surrounding unchanged lines to understand the *purpose* of the change.
+* **File Paths:** Pay attention to the file paths (`a/<path>`, `b/<path>`) to understand which files are being modified.
+
+**4. Application to your File:**
+
+* **".cs" Files:** Changes to C# source code. Focus on the addition (`+`) and removal (`-`) of code lines to understand logic changes.
+* **"Test*.cs" Files:** Changes to unit test files. These are often important for understanding how the functionality is being tested and whether the changes are robust.
+* **".xml" Files:** Changes to configuration or data files. Look for added, removed, or modified XML elements and attributes. Focus is usually on changes to properties.
+
+**5. Special Instructions for File Types based on the given filters:**
+
+* `.cs` code is assumed to not contain test code
+* `*Test*.cs` contain test code, which should be helpful for understanding functionality.
+* `*.xml` contains configuration.
+
+By focusing on these key elements, you can effectively extract meaningful information from Git diff output and summarize the changes made in a software project.
+
+---
+"#;
+        instructions.lines().map(|s| s.to_string()).collect()
+    }
+
     /// Reconstruct a unified diff from the processed patch dictionary
     ///
     /// # Arguments
@@ -138,73 +209,7 @@ impl DiffParser {
 
         // Only add instructions if the patch dictionary is not empty
         if !patch_dict.is_empty() {
-            // Add instructions at the beginning of the output
-            output.push("**Instructions for Interpreting Git Diff Output**".to_string());
-            output.push("".to_string());
-            output.push("This document provides a guide to understanding the diff output generated by RepoDiff.".to_string());
-            output.push("".to_string());
-            output.push("**Important Note:** The diff output in `repodiff_output.txt` has been sanitized to focus what's relevant for understanding the diffs.".to_string());
-            output.push("Real-world Git diff output may contain more details.".to_string());
-            output.push("".to_string());
-            output.push("**1. Basic Structure:**".to_string());
-            output.push("".to_string());
-            output.push("A Git diff file describes the *differences* between two versions of a file. It's structured into *hunks*, which represent contiguous regions of change.".to_string());
-            output.push("".to_string());
-            output.push("* `diff --git a/<path> b/<path>`: Indicates the file being compared. `a/` refers to the \"old\" version, and `b/` refers to the \"new\" version. (Note that paths always use forward slashes in Git diff output, even on Windows systems.)".to_string());
-            output.push("* `--- a/<path>`: Marks the beginning of the original file content.".to_string());
-            output.push("* `+++ b/<path>`: Marks the beginning of the modified file content.".to_string());
-            output.push("* `@@ -<start_line_old>,<num_lines_old> +<start_line_new>,<num_lines_new> @@ <section_header>`: This is the *hunk header*. (Optional in simplified output, but common in real diffs).".to_string());
-            output.push(" * `-<start_line_old>,<num_lines_old>`: Indicates the starting line number and number of lines in the *old* version of the file that this hunk represents. If only one line is affected, `,<num_lines_old>` will be omitted.".to_string());
-            output.push(" * `+<start_line_new>,<num_lines_new>`: Indicates the starting line number and number of lines in the *new* version of the file that this hunk represents. If only one line is affected, `,<num_lines_new>` will be omitted.".to_string());
-            output.push(" * `<section_header>`: (Optional) This is often a function or method name, providing context for the change.".to_string());
-            output.push("* Hunk Content: Lines within a hunk are marked with a prefix:".to_string());
-            output.push(" * ` ` (space): Unchanged line (context).".to_string());
-            output.push(" * `-`: Line removed from the old version.".to_string());
-            output.push(" * `+`: Line added to the new version.".to_string());
-            output.push("".to_string());
-            output.push("**2. Simplified Example:**".to_string());
-            output.push("".to_string());
-            output.push("```".to_string());
-            output.push("diff --git a/MyFile.cs b/MyFile.cs".to_string());
-            output.push("--- a/MyFile.cs".to_string());
-            output.push("+++ b/MyFile.cs ".to_string());
-            output.push(" // Some code".to_string());
-            output.push(" string oldValue = \"old\";".to_string());
-            output.push("-// Removed line".to_string());
-            output.push("+string newValue = \"new\";".to_string());
-            output.push(" // More code".to_string());
-            output.push("```".to_string());
-            output.push("".to_string());
-            output.push("**Explanation of the Example:**".to_string());
-            output.push("".to_string());
-            output.push("* The file being changed is `MyFile.cs`.".to_string());
-            output.push("* `\" string oldValue = \"old\";\"`: This line is present in both versions.".to_string());
-            output.push("* `-// Removed line`: This line was removed from the old version.".to_string());
-            output.push("* `+string newValue = \"new\";`: This line was added to the new version.".to_string());
-            output.push("* `\" // More code\"`: This line is present in both versions.".to_string());
-            output.push("".to_string());
-            output.push("**3. Key LLM Considerations:**".to_string());
-            output.push("".to_string());
-            output.push("* **Focus on Content Lines:** The most important part for understanding changes is the content prefixed with ` `, `-`, or `+`.".to_string());
-            output.push("* **Context is Crucial:** Use the surrounding unchanged lines to understand the *purpose* of the change.".to_string());
-            output.push("* **File Paths:** Pay attention to the file paths (`a/<path>`, `b/<path>`) to understand which files are being modified.".to_string());
-            output.push("".to_string());
-            output.push("**4. Application to your File:**".to_string());
-            output.push("".to_string());
-            output.push("* **\".cs\" Files:** Changes to C# source code. Focus on the addition (`+`) and removal (`-`) of code lines to understand logic changes.".to_string());
-            output.push("* **\"Test*.cs\" Files:** Changes to unit test files. These are often important for understanding how the functionality is being tested and whether the changes are robust.".to_string());
-            output.push("* **\".xml\" Files:** Changes to configuration or data files. Look for added, removed, or modified XML elements and attributes. Focus is usually on changes to properties.".to_string());
-            output.push("".to_string());
-            output.push("**5. Special Instructions for File Types based on the given filters:**".to_string());
-            output.push("".to_string());
-            output.push("* `.cs` code is assumed to not contain test code".to_string());
-            output.push("* `*Test*.cs` contain test code, which should be helpful for understanding functionality.".to_string());
-            output.push("* `*.xml` contains configuration.".to_string());
-            output.push("".to_string());
-            output.push("By focusing on these key elements, you can effectively extract meaningful information from Git diff output and summarize the changes made in a software project.".to_string());
-            output.push("".to_string());
-            output.push("---".to_string());
-            output.push("".to_string());
+            output.extend(Self::get_diff_instructions());
         }
 
         for (filename, hunks) in patch_dict {
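
The change in this commit follows a common pattern: keep a multi-line block of text in one raw string literal and split it into lines on demand, instead of pushing each line separately. A minimal sketch of that pattern follows, assuming a shortened instruction text. `build_output` and its `has_patches` flag are hypothetical stand-ins for the real reconstruction method, which this diff does not show in full.

```rust
/// Shortened, illustrative stand-in for the instruction text in the commit.
fn get_diff_instructions() -> Vec<String> {
    // A raw string (r#"..."#) needs no escaping for quotes or backslashes.
    let instructions = r#"**Instructions for Interpreting Git Diff Output**

This document provides a guide to understanding the diff output generated by RepoDiff.

---
"#;
    // `lines()` splits on newlines and drops the line terminators.
    instructions.lines().map(|s| s.to_string()).collect()
}

/// Hypothetical caller: prepend the instructions only when there is a diff to show.
fn build_output(has_patches: bool) -> Vec<String> {
    let mut output: Vec<String> = Vec::new();
    if has_patches {
        output.extend(get_diff_instructions());
    }
    output
}

fn main() {
    for line in build_output(true) {
        println!("{line}");
    }
}
```

One detail worth checking when applying this pattern: `str::lines()` does not yield a final empty string for a trailing newline, whereas the push-based version ended with an explicit `""` after `---`. Whether the two outputs differ by that trailing blank line depends on how the raw string is terminated.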

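
The hunk header format described in the instruction text (`@@ -<start_line_old>,<num_lines_old> +<start_line_new>,<num_lines_new> @@ <section_header>`) can be parsed with the standard library alone. The sketch below is illustrative and is not taken from `diff_parser.rs`: `HunkHeader`, `parse_range`, and `parse_hunk_header` are hypothetical names, and an omitted line count is treated as 1.

```rust
/// Hypothetical container for the fields of a hunk header.
#[derive(Debug, PartialEq)]
struct HunkHeader {
    old_start: u32,
    old_count: u32,
    new_start: u32,
    new_count: u32,
    section: Option<String>,
}

/// Parse "<start>[,<count>]"; a missing count is treated as 1.
fn parse_range(range: &str) -> Option<(u32, u32)> {
    match range.split_once(',') {
        Some((start, count)) => Some((start.parse().ok()?, count.parse().ok()?)),
        None => Some((range.parse().ok()?, 1)),
    }
}

/// Parse a line such as "@@ -128,6 +128,77 @@ impl DiffParser {".
/// Returns None for anything that is not a hunk header.
fn parse_hunk_header(line: &str) -> Option<HunkHeader> {
    let rest = line.strip_prefix("@@ -")?;
    let (ranges, section) = rest.split_once(" @@")?;
    let (old, new) = ranges.split_once(" +")?;
    let (old_start, old_count) = parse_range(old)?;
    let (new_start, new_count) = parse_range(new)?;
    let section = section.trim();
    Some(HunkHeader {
        old_start,
        old_count,
        new_start,
        new_count,
        // The section header is optional, so an empty remainder maps to None.
        section: if section.is_empty() {
            None
        } else {
            Some(section.to_string())
        },
    })
}

fn main() {
    let header = parse_hunk_header("@@ -128,6 +128,77 @@ impl DiffParser {");
    println!("{header:?}");
}
```

Returning `Option` keeps the caller free to skip lines that are not hunk headers without treating them as errors.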