.github/workflows/claude.yml: 2 additions & 0 deletions
@@ -98,6 +98,8 @@ jobs:
2. Security vulnerabilities
3. Breaking API changes
4. Test failures (methods with typos that wont run)
+5. Stale documentation — if files or directories were moved, renamed, or deleted, check that `.claude/rules/`, `CLAUDE.md`, and `AGENTS.md` don't reference paths that no longer exist
+6. New language support — if new language modules are added under `languages/`, check that `.github/workflows/duplicate-code-detector.yml` includes the new language in its file filters, search patterns, and cross-module checks

IMPORTANT:
- First check existing review comments using `gh api repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/comments`. For each existing comment, check if the issue still exists in the current code.
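Review note: the stale-documentation check added as item 5 comes down to pulling path-like references out of the instruction files and testing whether each one still exists. The following is a minimal Python sketch of that idea, not the workflow's actual mechanism. It assumes paths are written in backticks and are relative to the repository root; `find_stale_paths` and the `BACKTICK_PATH` regex are hypothetical names introduced only for illustration.

```python
import re
from pathlib import Path

# Assumption: docs reference paths inside backticks, e.g. `languages/python/`
# or `CLAUDE.md`. Anything containing spaces is ignored.
BACKTICK_PATH = re.compile(
    r"`([\w./-]+/[\w./-]*|[\w.-]+\.(?:md|py|yml|yaml|json|toml))`"
)


def find_stale_paths(repo_root: Path, doc_files: list[str]) -> list[tuple[str, str]]:
    """Return (doc_file, referenced_path) pairs whose target no longer exists."""
    stale = []
    for doc in doc_files:
        doc_path = repo_root / doc
        if not doc_path.is_file():
            continue  # skip instruction files that this repository does not have
        text = doc_path.read_text(encoding="utf-8")
        for match in BACKTICK_PATH.finditer(text):
            ref = match.group(1)
            if not (repo_root / ref).exists():
                stale.append((doc, ref))
    return stale
```

Calling `find_stale_paths(Path("."), ["CLAUDE.md", "AGENTS.md"])` from the repository root would list references that point at moved or deleted paths. A regex like this will miss paths written without backticks, so it is only a first pass.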
You are a duplicate code detector for a multi-language codebase (Python, JavaScript, TypeScript, Java). Check whether this PR introduces code that duplicates logic already present elsewhere in the repository — including across languages. Focus on finding true duplicates, not just similar-looking code.

-## Setup
+## Changed files

-First activate the project in Serena:
-- Use `mcp__serena__activate_project` with the workspace path `${{ github.workspace }}`
+```
+${{ steps.changed-files.outputs.files }}
+```

## Steps

-1. Get the list of changed .py files (excluding tests):
+1. **Read changed files.** For each file above, read it and identify functions or methods that were added or substantially modified (longer than 5 lines).
+
+2. **Search for duplicates.** For each function, use Grep to search the codebase for:
+- The same function name defined elsewhere (`def function_name` for Python, `function function_name` / `const function_name` / `module.exports` for the JS files under `packages/`)
+- 2-3 distinctive operations from the body (specific API calls, algorithm patterns, string literals, exception types) — this catches duplicates that have different names but implement the same logic
+
+3. **Cross-module check.** This codebase has parallel Python modules under `languages/python/`, `languages/javascript/`, and `languages/java/` that handle the same concerns (parsing, code replacement, test running, etc.) for different target languages. It also has a JS runtime under `packages/codeflash/runtime/` and a Java runtime under `codeflash-java-runtime/`. When a changed file is under one of these areas, also search the others for equivalent logic. For example:
+- `languages/javascript/code_replacer.py` and `languages/python/static_analysis/code_replacer.py` both handle code replacement — shared logic should be extracted
+- Shared concepts (AST traversal, scope analysis, import resolution, test running) are prime candidates for duplication across these modules
+
+4. **Compare candidates.** When a Grep hit looks promising (not just a shared import or call site), read the full function and compare semantics. Flag it only if it matches one of these patterns:
+- **Same function in two modules** — a function with the same or very similar body exists in another module. One should import from the other instead (within the same language).
+- **Shared logic across sibling files** — the same helper logic repeated in files within the same package. Should be extracted to a common module.
+- **Repeated pattern across classes** — multiple classes implement the same logic inline (e.g., identical traversal, identical validation). Should be a mixin or shared helper.
+- **Cross-module reimplementation** — the same algorithm or utility implemented in both `languages/python/` and `languages/javascript/` (both are Python) or between Python orchestration code and JS runtime code in `packages/`. Note: some duplication is unavoidable (each target language needs its own parser, for example). Only flag cases where the logic is genuinely shared or where one module could import from the other.
+
+5. **Report findings.** Post a single PR comment. Report at most 5 findings.
+
+**If duplicates found**, for each one:
+- **Confidence**: HIGH (identical or near-identical logic) / MEDIUM (same intent, minor differences worth reviewing)
+- **Locations**: `file_path:line_number` for both the new and existing code
+- **What's duplicated**: One sentence describing the shared logic
+- **Suggestion**: How to consolidate — import from canonical location, extract to shared module, create a mixin. For cross-module duplicates (between language directories or Python↔JS runtime), just flag it for a tech lead to review rather than prescribing a specific fix.
+
+**If no duplicates found**, post a comment that just says "No duplicates detected." so the sticky comment gets updated.
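Review note: the name search in step 2 can be pictured as one definition pattern per language, applied to each candidate file. The Python sketch below illustrates that under stated assumptions; the prompt itself asks the agent to use Grep. `definition_patterns` and `find_definitions` are hypothetical helpers, the extension-based language guess is a simplification, and the `module.exports` form mentioned in step 2 is left out for brevity.

```python
import re


def definition_patterns(name: str) -> dict[str, re.Pattern]:
    """Build per-language regexes that match a definition of `name`."""
    n = re.escape(name)
    return {
        # Matches `def name(` and `async def name(` at the start of a line.
        "python": re.compile(rf"^[ \t]*(?:async\s+)?def\s+{n}\s*\(", re.MULTILINE),
        # Matches `function name(` and `const name =`, optionally exported.
        "javascript": re.compile(
            rf"^[ \t]*(?:export\s+)?(?:(?:async\s+)?function\s+{n}\s*\(|const\s+{n}\s*=)",
            re.MULTILINE,
        ),
    }


def find_definitions(name: str, sources: dict[str, str]) -> list[tuple[str, int]]:
    """Return (path, line_number) for each definition of `name` in `sources`.

    `sources` maps a file path to its text. Assumption: `.py` files are Python
    and every other file is treated as JavaScript/TypeScript.
    """
    patterns = definition_patterns(name)
    hits = []
    for path, text in sources.items():
        pattern = patterns["python" if path.endswith(".py") else "javascript"]
        for match in pattern.finditer(text):
            line = text.count("\n", 0, match.start()) + 1
            hits.append((path, line))
    return hits
```

The `(path, line)` pairs map directly onto the `file_path:line_number` format requested for **Locations**. A name match is only a candidate; step 4 still requires reading both bodies and comparing semantics.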
+
+## Examples (illustrative — these are past cases, some already resolved)
+
+**IS a duplicate (HIGH):** A 12-line `is_build_output_dir()` function was defined identically in two modules (`setup/detector.py` and `code_utils/config_js.py`). Fix: delete one, import from the other.
+
+**IS a duplicate (MEDIUM):** `is_assignment_used()` was implemented separately in two context files with the same logic. Fix: move to a shared module, import from both call sites.
+
+**IS a duplicate (MEDIUM, cross-module):** `normalize_path()` implemented in both `languages/python/support.py` and `languages/javascript/support.py` with identical logic. Flagging for tech lead review — should likely be extracted to `languages/base.py` or a shared utility.
+
+**NOT a duplicate:** Two classes each define a `visit()` method that traverses an AST, but they handle different node types and produce different outputs. This is intentional polymorphism.
+
+**NOT a duplicate (cross-module):** `languages/python/static_analysis/code_extractor.py` and `languages/javascript/parse.py` both extract functions from source code, but they use fundamentally different parsing strategies (Python AST vs tree-sitter). The logic is necessarily different.
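Review note: for Python code, the HIGH case above ("identical or near-identical logic") can be approximated mechanically by comparing function bodies as syntax trees, so that the function name, comments, docstrings, and formatting do not affect the result. The sketch below uses the standard-library `ast` module and is only an illustration of the comparison, not what the detector runs. `function_fingerprint` and `is_near_identical` are hypothetical names.

```python
import ast


def function_fingerprint(source: str) -> tuple[str, str]:
    """Fingerprint the first function in `source`, ignoring its name and docstring."""
    tree = ast.parse(source)
    func = next(
        (
            node
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        ),
        None,
    )
    if func is None:
        raise ValueError("no function definition found")
    body = func.body
    # Drop a leading docstring so documentation differences do not matter.
    if (
        body
        and isinstance(body[0], ast.Expr)
        and isinstance(body[0].value, ast.Constant)
        and isinstance(body[0].value.value, str)
    ):
        body = body[1:]
    # ast.dump omits line and column numbers by default, so layout is ignored.
    return ast.dump(func.args), "\n".join(ast.dump(stmt) for stmt in body)


def is_near_identical(source_a: str, source_b: str) -> bool:
    """True when both functions have the same parameters and the same body."""
    return function_fingerprint(source_a) == function_fingerprint(source_b)
```

This check is deliberately strict: renamed parameters or local variables make the fingerprints differ, so it would catch a copy like the `is_build_output_dir()` case but not the MEDIUM cases, which still need a semantic read.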
CLAUDE.md: 2 additions & 2 deletions
@@ -2,7 +2,7 @@

## Project Overview

-CodeFlash is an AI-powered Python code optimizer that automatically improves code performance while maintaining correctness. It uses LLMs to generate optimization candidates, verifies correctness through test execution, and benchmarks performance improvements.
+CodeFlash is an AI-powered code optimizer that automatically improves performance while maintaining correctness. It supports Python, JavaScript, and TypeScript, with more languages planned. It uses LLMs to generate optimization candidates, verifies correctness through test execution, and benchmarks performance improvements.

## Optimization Pipeline

@@ -12,7 +12,7 @@ Discovery → Ranking → Context Extraction → Test Gen + Optimization → Bas

1. **Discovery** (`discovery/`): Find optimizable functions across the codebase
2. **Ranking** (`benchmarking/function_ranker.py`): Rank functions by addressable time using trace data