Commit 5eaf4fa

feat: filter out common large files
1 parent 87a4ebf commit 5eaf4fa

14 files changed

Lines changed: 1173 additions & 1 deletion

.github/actions/auto-pr-description/README.md

Lines changed: 25 additions & 1 deletion
@@ -8,6 +8,7 @@ A reusable GitHub Action that automatically generates pull request descriptions
 - 🎯 **Smart Formatting**: Generates structured descriptions with Description, Changes, and Verification sections
 - 🖼️ **Image Preservation**: Maintains existing images at the top of PR descriptions
 - 🎫 **JIRA Integration**: Automatically extracts JIRA ticket IDs and adds ticket links
+- 🚫 **Smart Filtering**: Automatically ignores large files like package-lock.json to prevent token size issues
 - **Fast & Lightweight**: Minimal dependencies and quick execution
 
 ## Usage
@@ -64,6 +65,7 @@ jobs:
 | `github-token` | GitHub token for PR operations | ✅ | - |
 | `pr-number` | Pull request number | ✅ | - |
 | `jira-ticket-url-prefix` | JIRA ticket URL prefix | ❌ | `https://virdocs.atlassian.net/browse/` |
+| `ignore-files` | Comma-separated list of files to ignore in diff | ❌ | `package-lock.json,yarn.lock,pnpm-lock.yaml,composer.lock,Gemfile.lock,poetry.lock,Pipfile.lock` |
 
 ## Outputs
 
@@ -152,6 +154,28 @@ The action handles various error scenarios:
     # ... other inputs
 ```
 
+### Custom File Filtering
+By default, the action ignores common lock files that can be very large and don't provide meaningful context for PR descriptions. You can customize which files to ignore:
+
+```yaml
+- uses: ./.github/actions/auto-pr-description
+  with:
+    ignore-files: 'package-lock.json,yarn.lock,dist/bundle.js,build/'
+    # ... other inputs
+```
+
+**Default ignored files:**
+- `package-lock.json` (npm)
+- `yarn.lock` (Yarn)
+- `pnpm-lock.yaml` (pnpm)
+- `composer.lock` (PHP Composer)
+- `Gemfile.lock` (Ruby Bundler)
+- `poetry.lock` (Python Poetry)
+- `Pipfile.lock` (Python Pipenv)
+
+**Why filter files?**
+Large files like lock files can cause the AI API to hit token limits, resulting in failed PR description generation. By filtering these files, the action focuses on meaningful code changes while staying within API limits.
+
 ### Using Outputs
 ```yaml
 - name: Generate PR Description
@@ -172,7 +196,7 @@ The action handles various error scenarios:
 
 1. **Missing API Key**: Ensure `GEMINI_API_KEY` is set in repository secrets
 2. **Permission Denied**: Check that workflow has `pull-requests: write` permission
-3. **Large Diffs**: Very large changes might exceed API limits - consider smaller PRs
+3. **Large Diffs**: Very large changes might exceed API limits - use `ignore-files` to filter out large files or consider smaller PRs
 4. **Rate Limits**: Gemini API has rate limits - add delays between calls if needed
 5. **Invalid PR Number**: Ensure the PR number is valid and accessible
 
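
The `ignore-files` input documented in this README change is a single comma-separated string. As a rough illustration of how such a value could be turned into a list of patterns, here is a minimal sketch that follows the same split, trim, and drop-empties approach as the filter script in this commit. The helper name `parseIgnoreList` is hypothetical and is not part of the action.

```javascript
// Hypothetical helper (not part of the action): turn the comma-separated
// `ignore-files` value into a clean array of patterns.
function parseIgnoreList(raw) {
  // Treat a missing or non-string input as "nothing to ignore".
  if (typeof raw !== 'string') {
    return [];
  }
  return raw
    .split(',')                            // one entry per comma-separated item
    .map((entry) => entry.trim())          // tolerate spaces around entries
    .filter((entry) => entry.length > 0);  // drop empty entries such as ",,"
}

console.log(parseIgnoreList(' package-lock.json, yarn.lock ,,dist/bundle.js '));
```

Under these assumptions, stray whitespace and doubled commas in the workflow file would not produce empty patterns.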

.github/actions/auto-pr-description/action.yml

Lines changed: 12 additions & 0 deletions
@@ -14,6 +14,10 @@ inputs:
     description: 'JIRA ticket URL prefix (e.g., https://company.atlassian.net/browse/)'
     required: false
     default: 'https://virdocs.atlassian.net/browse/'
+  ignore-files:
+    description: 'Comma-separated list of files to ignore in the diff (e.g., package-lock.json,yarn.lock)'
+    required: false
+    default: 'package-lock.json,yarn.lock,pnpm-lock.yaml,composer.lock,Gemfile.lock,poetry.lock,Pipfile.lock'
 outputs:
   description:
     description: 'The generated PR description'
@@ -41,10 +45,18 @@ runs:
       env:
         GH_TOKEN: ${{ inputs.github-token }}
         PR_NUMBER: ${{ inputs.pr-number }}
+        IGNORE_FILES: ${{ inputs.ignore-files }}
       run: |
         # Get the PR diff directly from GitHub using gh CLI
         gh pr diff ${{ inputs.pr-number }} > pr.diff
         echo "Generated diff file with $(wc -l < pr.diff) lines"
+
+        # Filter out ignored files from the diff
+        if [ -n "$IGNORE_FILES" ]; then
+          echo "Filtering out ignored files: $IGNORE_FILES"
+          node ${{ github.action_path }}/filter_diff.js pr.diff "$IGNORE_FILES"
+          echo "Filtered diff file now has $(wc -l < pr.diff) lines"
+        fi
 
     - name: Generate PR description
       id: generate_description
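
The step above writes the output of `gh pr diff` to `pr.diff` and then filters it per file, so it can help to see which paths a diff actually contains before choosing `ignore-files` values. The following is a minimal sketch, assuming the standard `diff --git a/<path> b/<path>` header format and reading the `a/` path as the filter script does. `listDiffFiles` and the sample diff are hypothetical and not part of the action.

```javascript
// Hypothetical helper (not part of the action): list the file paths that
// appear in a unified git diff by reading its "diff --git" header lines.
function listDiffFiles(diffText) {
  const paths = [];
  for (const line of diffText.split('\n')) {
    // Header format: diff --git a/path/to/file b/path/to/file
    const header = line.match(/^diff --git a\/(.+?) b\/(.+)$/);
    if (header) {
      paths.push(header[1]); // use the 'a/' path
    }
  }
  return paths;
}

// Made-up sample diff for illustration only.
const sampleDiff = [
  'diff --git a/package-lock.json b/package-lock.json',
  '--- a/package-lock.json',
  '+++ b/package-lock.json',
  '@@ -1 +1 @@',
  '-"version": "1.0.0"',
  '+"version": "1.0.1"',
  'diff --git a/src/index.js b/src/index.js',
  '--- a/src/index.js',
  '+++ b/src/index.js',
  '@@ -1 +1 @@',
  '-console.log("a");',
  '+console.log("b");',
].join('\n');

console.log(listDiffFiles(sampleDiff));
```

Paths that themselves contain the text ` b/` may be split incorrectly by a pattern like this, so treat the sketch as a debugging aid, not a complete diff parser.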
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+#!/usr/bin/env node
+
+const fs = require('fs');
+
+/**
+ * Filters out specified files from a git diff
+ * Usage: node filter_diff.js <diff_file> <ignore_files_comma_separated>
+ */
+
+function filterDiff(diffContent, ignoreFiles) {
+  if (!ignoreFiles || ignoreFiles.trim() === '') {
+    return diffContent;
+  }
+
+  const filesToIgnore = ignoreFiles.split(',').map(f => f.trim()).filter(f => f.length > 0);
+  if (filesToIgnore.length === 0) {
+    return diffContent;
+  }
+
+  const lines = diffContent.split('\n');
+  const filteredLines = [];
+  let currentFile = null;
+  let skipCurrentFile = false;
+
+  for (let i = 0; i < lines.length; i++) {
+    const line = lines[i];
+
+    // Check if this is a file header line (starts with diff --git)
+    if (line.startsWith('diff --git ')) {
+      // Extract the file path from the diff header
+      // Format: diff --git a/path/to/file b/path/to/file
+      const match = line.match(/diff --git a\/(.+?) b\/(.+?)$/);
+      if (match) {
+        currentFile = match[1]; // Use the 'a/' path
+
+        // Check if this file should be ignored
+        skipCurrentFile = filesToIgnore.some(ignoreFile => {
+          // Support both exact matches and basename matches
+          return currentFile === ignoreFile || currentFile.endsWith('/' + ignoreFile) || currentFile === ignoreFile;
+        });
+
+        if (skipCurrentFile) {
+          // Skip all lines until the next file or end of diff
+          while (i + 1 < lines.length && !lines[i + 1].startsWith('diff --git ')) {
+            i++;
+          }
+          continue;
+        }
+      }
+    }
+
+    // If we're not skipping the current file, include the line
+    if (!skipCurrentFile) {
+      filteredLines.push(line);
+    }
+  }
+
+  return filteredLines.join('\n');
+}
+
+// Main execution
+if (require.main === module) {
+  const [, , diffFile, ignoreFiles] = process.argv;
+
+  if (!diffFile) {
+    console.error('Usage: filter_diff.js <diff_file> <ignore_files_comma_separated>');
+    process.exit(1);
+  }
+
+  if (!fs.existsSync(diffFile)) {
+    console.error(`Error: Diff file not found at ${diffFile}`);
+    process.exit(1);
+  }
+
+  try {
+    const diffContent = fs.readFileSync(diffFile, 'utf8');
+    const filteredDiff = filterDiff(diffContent, ignoreFiles || '');
+
+    // Write filtered diff back to the same file
+    fs.writeFileSync(diffFile, filteredDiff, 'utf8');
+
+    // Output statistics
+    const originalLines = diffContent.split('\n').length;
+    const filteredLines = filteredDiff.split('\n').length;
+    const removedLines = originalLines - filteredLines;
+
+    console.error(`Filtered diff: removed ${removedLines} lines (${originalLines} -> ${filteredLines})`);
+    if (ignoreFiles) {
+      console.error(`Ignored files: ${ignoreFiles}`);
+    }
+  } catch (error) {
+    console.error(`Error filtering diff: ${error.message}`);
+    process.exit(1);
+  }
+}
+
+module.exports = { filterDiff };
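
The matcher in `filterDiff` compares each path against an entry either exactly or as a `/`-prefixed suffix. Read literally, a trailing-slash entry such as `build/` (used in the README example in this commit) would therefore never match a file path, because file paths do not end in `/`. One possible way to support directory entries is sketched below. The function `matchesIgnorePattern` is hypothetical, is not part of the committed script, and assumes a trailing `/` is meant to denote a directory.

```javascript
// Hypothetical matcher (not part of the commit): keeps the exact and
// basename matching of filterDiff and additionally treats entries ending
// in "/" as directory prefixes.
function matchesIgnorePattern(filePath, pattern) {
  if (pattern.endsWith('/')) {
    // Directory entry: match at the repository root or at any nested level.
    return filePath.startsWith(pattern) || filePath.includes('/' + pattern);
  }
  // File entry: exact path, or the same file name in any subdirectory.
  return filePath === pattern || filePath.endsWith('/' + pattern);
}

console.log(matchesIgnorePattern('build/app.js', 'build/'));
console.log(matchesIgnorePattern('src/rebuild/app.js', 'build/'));
console.log(matchesIgnorePattern('web/package-lock.json', 'package-lock.json'));
```

Requiring the `/` separator before the directory name keeps `rebuild/` from being treated as `build/`, mirroring how the committed code avoids matching `my-package-lock.json` against `package-lock.json`.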
Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
+# Auto PR Description Generator
+
+A reusable GitHub Action that automatically generates pull request descriptions using AI (Google Gemini) based on the git diff of changes.
+
+## Features
+
+- 🤖 **AI-Powered**: Uses Google Gemini to analyze code changes and generate meaningful descriptions
+- 🎯 **Smart Formatting**: Generates structured descriptions with Description, Changes, and Verification sections
+- 🖼️ **Image Preservation**: Maintains existing images at the top of PR descriptions
+- 🎫 **JIRA Integration**: Automatically extracts JIRA ticket IDs and adds ticket links
+- 🚫 **Smart Filtering**: Automatically ignores large files like package-lock.json to prevent token size issues
+- **Fast & Lightweight**: Minimal dependencies and quick execution
+
+## Usage
+
+### Basic Usage
+
+```yaml
+- name: Generate PR Description
+  uses: ./.github/actions/auto-pr-description
+  with:
+    gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
+    github-token: ${{ secrets.GITHUB_TOKEN }}
+    pr-number: ${{ github.event.pull_request.number }}
+```
+
+### Complete Workflow Example
+
+```yaml
+name: Auto PR Description
+on:
+  pull_request:
+    types: [labeled]
+
+jobs:
+  update-pr-description:
+    name: Update PR Description
+    runs-on: ubuntu-latest
+    if: |
+      github.event_name == 'pull_request' &&
+      github.base_ref == 'main' &&
+      (github.event.pull_request.draft == false || github.event.action == 'labeled') &&
+      (contains(github.event.pull_request.labels.*.name, 'auto-pr-description') ||
+       contains(github.event.pull_request.labels.*.name, 'test'))
+    permissions:
+      pull-requests: write
+      contents: read
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Generate PR Description
+        uses: ./.github/actions/auto-pr-description
+        with:
+          gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          pr-number: ${{ github.event.pull_request.number }}
+          jira-ticket-url-prefix: 'https://yourcompany.atlassian.net/browse/'
+```
+
+## Inputs
+
+| Input | Description | Required | Default |
+|-------|-------------|----------|---------|
+| `gemini-api-key` | The API key for the Gemini API | ✅ | - |
+| `github-token` | GitHub token for PR operations | ✅ | - |
+| `pr-number` | Pull request number | ✅ | - |
+| `jira-ticket-url-prefix` | JIRA ticket URL prefix | ❌ | `https://virdocs.atlassian.net/browse/` |
+| `ignore-files` | Comma-separated list of files to ignore in diff | ❌ | `package-lock.json,yarn.lock,pnpm-lock.yaml,composer.lock,Gemfile.lock,poetry.lock,Pipfile.lock` |
+
+## Outputs
+
+| Output | Description |
+|--------|-------------|
+| `description` | The generated PR description |
+| `updated` | Whether the PR description was updated |
+
+## Generated Description Format
+
+The action generates PR descriptions in this structured format:
+
+```markdown
+## Description
+A concise summary of what the changes accomplish.
+
+## Changes
+- [ ] Specific change or feature added
+- [ ] Another modification made
+- [ ] Bug fix or improvement
+
+## Verification
+- [ ] Test that should be performed
+- [ ] Verification step to confirm functionality
+- [ ] Additional checks recommended
+
+## Ticket
+https://yourcompany.atlassian.net/browse/TICKET-123
+```
+
+## JIRA Integration
+
+The action automatically detects JIRA ticket IDs from:
+1. **PR Title**: Extracts patterns like `CORE-1234`, `PAR-567`, etc.
+2. **Branch Name**: Falls back to branch name if not found in title
+
+Example branch names that work:
+- `CORE-1234-feature-description`
+- `PAR-567-bug-fix`
+- `feature/CORE-1234-new-feature`
+
+## Prerequisites
+
+### Required Secrets
+
+1. **GEMINI_API_KEY**: Get your API key from [Google AI Studio](https://makersuite.google.com/app/apikey)
+2. **GITHUB_TOKEN**: Automatically provided by GitHub Actions
+
+### Required Permissions
+
+The workflow must have these permissions:
+```yaml
+permissions:
+  pull-requests: write
+  contents: read
+```
+
+## Trigger Patterns
+
+### Label-Based Triggering
+Add these labels to trigger the action:
+- `auto-pr-description`: Specific label for PR description generation
+- `test`: Dual-purpose label that can trigger both testing and description generation
+
+### Draft Mode Handling
+- **Draft PRs**: Action doesn't run automatically to save CI resources
+- **Label Override**: Adding trigger labels to draft PRs will run the action
+- **Ready for Review**: Converting draft to ready automatically triggers the action
+
+## Error Handling
+
+The action handles various error scenarios:
+- Missing or invalid Gemini API key
+- API rate limits and timeouts
+- Large diffs that exceed API limits
+- Network connectivity issues
+- Invalid PR numbers
+
+## Customization
+
+### Custom JIRA URL
+```yaml
+- uses: ./.github/actions/auto-pr-description
+  with:
+    jira-ticket-url-prefix: 'https://mycompany.atlassian.net/browse/'
+    # ... other inputs
+```
+
+### Custom File Filtering
+By default, the action ignores common lock files that can be very large and don't provide meaningful context for PR descriptions. You can customize which files to ignore:
+
+```yaml
+- uses: ./.github/actions/auto-pr-description
+  with:
+    ignore-files: 'package-lock.json,yarn.lock,dist/bundle.js,build/'
+    # ... other inputs
+```
+
+**Default ignored files:**
+- `package-lock.json` (npm)
+- `yarn.lock` (Yarn)
+- `pnpm-lock.yaml` (pnpm)
+- `composer.lock` (PHP Composer)
+- `Gemfile.lock` (Ruby Bundler)
+- `poetry.lock` (Python Poetry)
+- `Pipfile.lock` (Python Pipenv)
+
+**Why filter files?**
+Large files like lock files can cause the AI API to hit token limits, resulting in failed PR description generation. By filtering these files, the action focuses on meaningful code changes while staying within API limits.
+
+### Using Outputs
+```yaml
+- name: Generate PR Description
+  id: pr-desc
+  uses: ./.github/actions/auto-pr-description
+  with:
+    # ... inputs
+
+- name: Use generated description
+  run: |
+    echo "Generated description: ${{ steps.pr-desc.outputs.description }}"
+    echo "Was updated: ${{ steps.pr-desc.outputs.updated }}"
+```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Missing API Key**: Ensure `GEMINI_API_KEY` is set in repository secrets
+2. **Permission Denied**: Check that workflow has `pull-requests: write` permission
+3. **Large Diffs**: Very large changes might exceed API limits - use `ignore-files` to filter out large files or consider smaller PRs
+4. **Rate Limits**: Gemini API has rate limits - add delays between calls if needed
+5. **Invalid PR Number**: Ensure the PR number is valid and accessible
+
+### Debug Mode
+
+Enable debug logging by setting:
+```yaml
+env:
+  ACTIONS_STEP_DEBUG: true
+```
+
+## License
+
+MIT License - see LICENSE file for details.
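
The README above says ticket IDs such as `CORE-1234` are taken from the PR title first, with the branch name as a fallback, and then appended to the `jira-ticket-url-prefix`. The extraction code itself is not shown in this commit, so the following is only a sketch of that lookup order. The regular expression, `extractTicketId`, and `buildTicketUrl` are all assumptions made for illustration.

```javascript
// Assumed pattern (not taken from the action): an uppercase project key,
// a hyphen, then digits, e.g. CORE-1234 or PAR-567.
const TICKET_PATTERN = /[A-Z][A-Z0-9]*-\d+/;

// Hypothetical helper: prefer the PR title, fall back to the branch name.
function extractTicketId(prTitle, branchName) {
  const fromTitle = (prTitle || '').match(TICKET_PATTERN);
  if (fromTitle) {
    return fromTitle[0];
  }
  const fromBranch = (branchName || '').match(TICKET_PATTERN);
  return fromBranch ? fromBranch[0] : null;
}

// Hypothetical helper: append the ticket ID to the configured URL prefix.
function buildTicketUrl(urlPrefix, ticketId) {
  return ticketId ? urlPrefix + ticketId : null;
}

const ticketId = extractTicketId('Add diff filtering', 'feature/CORE-1234-new-feature');
console.log(buildTicketUrl('https://yourcompany.atlassian.net/browse/', ticketId));
```

A pattern this loose would also match strings like `UTF-8` in a title, so a real implementation may want to restrict the set of accepted project keys.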
