This repository was archived by the owner on Mar 27, 2026. It is now read-only.

Commit 98b558c

Commit message: spike

Parent: 54d06d6

2 files changed: 114 additions, 2 deletions


README.md

Lines changed: 50 additions & 2 deletions
```diff
@@ -1,2 +1,50 @@
-# GitDiff4Review
-GitDiff4Review is a tool designed to simplify code reviews by generating comprehensive git diffs between two commits, branches, or pull requests. It combines diffs into a single output file that is optimized for consumption by large language models (LLMs) or for manual code review purposes.
+
+# GitDiff4LLM
+
+**GitDiff4LLM** is a tool designed to simplify code reviews by generating comprehensive git diffs between two commits, branches, or pull requests. It combines the diffs into a single output file optimized for consumption by large language models (LLMs) or for manual review.
+
+## Features
+
+- Generates diffs between two commits or branches with customizable options.
+- Uses distinct diff options for non-test and test files: 100 lines of context for non-test files and 20 lines for files whose paths contain `Test`, ignoring whitespace changes in both.
+- Works with pull requests: pass the base and target branch names as the two commits.
+- Combines both diffs into a single output file.
+- Calculates the token count of the combined diff with `tiktoken`, for LLM context budgeting or analytics.
+
+## Usage
+
+From inside the Git repository you want to review, run:
+
+```bash
+python gitdiff4llm.py <commit1> <commit2> /path/to/output_file.txt
+```
+
+### Example
+
+```bash
+python gitdiff4llm.py abc123 def456 combined_diff.txt
+```
+
+## Download
+
+You can download the latest version of **GitDiff4LLM** [here](https://github.com/EntityProcess/GitDiff4LLM/releases).
+
+## Prerequisites
+
+- **PowerShell (Windows only)**: On Windows, run the script in PowerShell; the pattern matching in the script does not work properly in Command Prompt (`cmd`).
+- **Python 3.x** with the `tiktoken` package (`pip install tiktoken`), and **Git** available on your `PATH`.
+
+## Installation
+
+Clone the repository and navigate to the directory:
+
+```bash
+git clone https://github.com/EntityProcess/GitDiff4LLM.git
+cd GitDiff4LLM
+```
+
+Once the prerequisites are installed, run the script as described in the Usage section.
+
+## Contributing
+
+Contributions are welcome! Feel free to submit a pull request or open an issue for bugs or feature requests.
```
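The two `git diff` calls in the script split the changed files with complementary pathspecs, `:!*Test*` and `*Test*`, so each changed file should land in exactly one part of the combined output. As a rough preview of that split, the sketch below uses Python's `fnmatch.fnmatchcase`, whose `*` (like Git's default pathspec glob) also matches `/`, and which is case-sensitive like Git's default matching. It is an approximation for illustration, not Git's own matcher, and `split_by_test_pattern` is a hypothetical helper that is not part of the script.

```python
from fnmatch import fnmatchcase

# Pattern used by the script's second git diff call (the first call excludes it)
TEST_PATTERN = "*Test*"

def split_by_test_pattern(paths):
    """Hypothetical helper: return (non_test, test), approximating ':!*Test*' and '*Test*'."""
    non_test, test = [], []
    for path in paths:
        # '*' in fnmatch also matches '/', so "Test" anywhere in the path counts
        if fnmatchcase(path, TEST_PATTERN):
            test.append(path)
        else:
            non_test.append(path)
    return non_test, test

if __name__ == "__main__":
    changed = ["src/app.py", "tests/AppTest.py", "src/Testing/helpers.py", "docs/test_notes.md"]
    non_test, test = split_by_test_pattern(changed)
    print("-U100:", non_test)
    print("-U20: ", test)
```

Note the edge cases this exposes: a directory such as `src/Testing/` puts every file under it in the 20-line-context group, while a lowercase `test_notes.md` stays with the non-test files.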

gitdiff4llm.py

Lines changed: 64 additions & 0 deletions
```python
import subprocess
import sys
import os
import tiktoken

# Count tokens with OpenAI's tiktoken tokenizer for the given model (default: gpt-4o)
def count_tokens(text, model="gpt-4o"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Run `git diff` between two commits with extra options and return its output
def run_git_diff(commit1, commit2, diff_options):
    try:
        result = subprocess.run(
            ["git", "diff", commit1, commit2] + diff_options,
            capture_output=True, text=True, check=True, encoding='utf-8', errors='replace'
        )
        return result.stdout
    except subprocess.CalledProcessError as e:
        print(f"Error running git diff: {e.stderr or e}", file=sys.stderr)
        sys.exit(1)

# Generate the combined diff, write it to disk, and report its token count
def main(commit1, commit2, output_file):
    # Non-test files: 100 lines of context, whitespace changes ignored
    diff1 = run_git_diff(commit1, commit2, ["-U100", "--ignore-all-space", "--", ":!*Test*"])

    # Test files (paths containing "Test"): 20 lines of context
    diff2 = run_git_diff(commit1, commit2, ["-U20", "--ignore-all-space", "--", "*Test*"])

    # Ensure both diffs are valid strings
    if diff1 is None:
        diff1 = ""
    if diff2 is None:
        diff2 = ""

    # Combine the two diffs
    combined_diff = diff1 + "\n" + diff2

    # Write the combined diff to the output file
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(combined_diff)

    # Calculate token count using LLM tokenizer
    token_count = count_tokens(combined_diff)

    # Output results
    print(f"Combined diff written to {output_file}")
    print(f"Total number of tokens: {token_count}")

# Entry point of the script
if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python gitdiff4llm.py <commit1> <commit2> <output_file>")
        sys.exit(1)

    commit1 = sys.argv[1]
    commit2 = sys.argv[2]
    output_file = sys.argv[3]

    # Make sure the output directory exists ("." when output_file has no directory part)
    os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)

    main(commit1, commit2, output_file)
```
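`count_tokens` makes `tiktoken` a hard dependency. If only a ballpark figure is needed (for example, on a machine without `tiktoken`), one option is to fall back to a character-based estimate. The sketch below assumes the common rule of thumb of roughly four characters per token for English text; source code and diffs often tokenize differently, so the fallback is a rough estimate only. `estimate_tokens`, `count_tokens_or_estimate`, and `CHARS_PER_TOKEN` are hypothetical names, not part of the script.

```python
import math

# Assumption: ~4 characters per token, a rule of thumb for English text, not exact for code
CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    """Hypothetical fallback: rough token estimate from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def count_tokens_or_estimate(text, model="gpt-4o"):
    """Return (count, exact): an exact tiktoken count if available, else an estimate."""
    try:
        import tiktoken
    except ImportError:
        # tiktoken is not installed: use the character-based estimate
        return estimate_tokens(text), False
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text)), True
```

In `main`, `count_tokens(combined_diff)` could then be swapped for `count_tokens_or_estimate(combined_diff)`, labeling the printed figure as estimated whenever `exact` is `False`.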
