Claude finished @Saga4's task in 7m 27s
PR Review Summary
Prek Checks
Fixed and pushed automatically:
All checks pass after commit
Code Review
Bug:
| File | Coverage |
|---|---|
| diff_classifier.py | 38% |
| optimizability_scorer.py | 77% ✅ |
`diff_classifier.py` has 38% coverage, well below the 75% threshold. The core functions `classify_function_diff`, `_get_function_diff_lines`, and `filter_cosmetic_diff_functions` are entirely untested. They interact with git, so full coverage needs integration-test setup, but at a minimum the logic branches in `classify_function_diff` should be tested against mocked `_get_function_diff_lines` output.
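A minimal sketch of that kind of test, without touching git. The real signatures and return values in `diff_classifier.py` are not shown in this thread, so the `classify_function_diff` below is a hypothetical stand-in: it takes the diff-line getter as a parameter (an assumption made here so a `Mock` can be injected) and uses invented labels (`"unchanged"`, `"cosmetic"`, `"logic"`).

```python
from unittest.mock import Mock


def classify_function_diff(file_path, function_name, get_diff_lines):
    """Hypothetical classifier, NOT the real implementation.

    Treats a change as 'cosmetic' when every added or removed line is
    blank or a comment, and as 'logic' otherwise.
    """
    added, removed = get_diff_lines(file_path, function_name)
    changed = [line.strip() for line in added + removed]
    if not changed:
        return "unchanged"
    if all(not line or line.startswith("#") for line in changed):
        return "cosmetic"
    return "logic"


def test_comment_only_change_is_cosmetic():
    # The mock replaces the git-backed helper and returns canned diff lines.
    fake_diff = Mock(return_value=(["# new comment"], ["# old comment"]))
    assert classify_function_diff("a.py", "f", fake_diff) == "cosmetic"
    fake_diff.assert_called_once_with("a.py", "f")


def test_code_change_is_logic():
    fake_diff = Mock(return_value=(["return x + 1"], ["return x"]))
    assert classify_function_diff("a.py", "f", fake_diff) == "logic"


def test_no_changed_lines_is_unchanged():
    fake_diff = Mock(return_value=([], []))
    assert classify_function_diff("a.py", "f", fake_diff) == "unchanged"
```

If the real helper is a module-level function rather than an injected parameter, `unittest.mock.patch` on the module attribute would play the same role as the injected `Mock` here.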
Fix E741 ambiguous variable names, RUF059 unused variable, PIE810 startswith/endswith tuples, and F821 missing Any import.
Co-authored-by: Sarthak Agarwal <undefined@users.noreply.github.com>
The `logger.info()` call was moved from the start of the function, where it ran unconditionally on every call, into the success branch inside the `if response.status_code == 200` block. Line profiler data shows this statement consumed 94.1% of runtime in the original version (39.5 ms out of 41.9 ms total); deferring it until success reduces total time to 34.5 ms, a 23% speedup. Because prescreening can fail via exceptions or non-200 status codes (as seen in 7 of 22 test cases), deferring the log statement avoids the expensive formatting work for failed requests. The trade-off is that the message now appears later in the success flow and is no longer emitted when a request fails, which the optimization treats as acceptable for a debugging and informational log.
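The change described above can be sketched as follows. This is an illustration only: `FakeResponse`, `prescreen`, and the log message are hypothetical names and do not reproduce the actual `AiServiceClient.prescreen_functions` code.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("prescreen_sketch")


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    status_code: int
    payload: dict


def prescreen(response: FakeResponse, function_names: list[str]) -> Optional[dict]:
    """Log only on the success path so failed requests skip the formatting work."""
    if response.status_code == 200:
        # The join (the costly part) now runs only for successful responses.
        logger.info("Prescreened %d functions: %s", len(function_names), ", ".join(function_names))
        return response.payload
    # Non-200 responses return early without building the log message.
    return None
```

With this shape, a failed request returns `None` and emits no log record, which is the behavior change the trade-off sentence above refers to.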
⚡️ Codeflash found optimizations for this PR
📄 23% (0.23x) speedup for
…2026-03-30T16.38.47 ⚡️ Speed up method `AiServiceClient.prescreen_functions` by 23% in PR #1925 (`prescreening_filter`)
This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:
    loop_count = sum(1 for kw in loop_keywords if kw in func_source)
    if loop_count > 0:
        score += min(0.35, loop_count * 0.12)
        reasons.append(f"loop patterns ({loop_count})")

    # Sorting/searching
    sort_patterns = ("sort(", "sorted(", ".sort(", "binarySearch", "indexOf", "Collections.sort")
    if any(p in func_source for p in sort_patterns):
        score += 0.15
        reasons.append("sort/search ops")

    # Nested structure (indentation depth as proxy)
    max_indent = max((len(line) - len(line.lstrip()) for line in func_source.splitlines() if line.strip()), default=0)
    if max_indent > 16:  # roughly 4+ levels of nesting
⚡️ Codeflash found a 15% (0.15x) speedup for `_score_by_heuristics` in `codeflash/discovery/optimizability_scorer.py`
⏱️ Runtime : 779 microseconds → 675 microseconds (best of 250 runs)
📝 Explanation and details
The optimization replaced the `sum(1 for kw in loop_keywords if kw in func_source)` generator expression with an explicit loop. Both forms count how many distinct keywords appear at least once in the source, so the metric itself is unchanged. It also rewrote the max-indent computation to exit early once the 16-space threshold is exceeded instead of scanning all lines via a generator expression. Line profiler shows the original max-indent line consumed 61% of runtime (1.26 ms of 2.07 ms), while the optimized version spreads the work across multiple lines totaling ~54% and short-circuits for deeply nested code (5 of 35 test cases). The 15% runtime improvement comes primarily from avoiding a full scan when deep nesting is detected early; the loop-count change is roughly neutral (271 µs versus 266 µs total across checks).
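A small self-contained sketch of the two techniques, written to show that the rewritten forms agree with the originals. The helper names and the sample keyword tuple are hypothetical; the actual `loop_keywords` used by the scorer is not shown in this thread.

```python
def max_indent_full_scan(source: str) -> int:
    """Original approach: scan every non-blank line and take the maximum indent."""
    return max((len(line) - len(line.lstrip()) for line in source.splitlines() if line.strip()), default=0)


def exceeds_indent_threshold(source: str, threshold: int = 16) -> bool:
    """Early-exit approach: stop at the first line indented past the threshold."""
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no nesting information
        if len(line) - len(line.lstrip()) > threshold:
            return True
    return False


def count_keywords_comprehension(source: str, keywords: tuple) -> int:
    """Counts keywords that occur at least once (not total occurrences)."""
    return sum(1 for kw in keywords if kw in source)


def count_keywords_loop(source: str, keywords: tuple) -> int:
    """Explicit-loop version; returns the same count as the comprehension."""
    count = 0
    for kw in keywords:
        if kw in source:
            count += 1
    return count
```

Because only the comparison against the threshold is used afterwards, the early-exit version does not need the exact maximum, which is what makes stopping at the first deep line safe.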
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 35 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from pathlib import Path

# imports
from codeflash.discovery.optimizability_scorer import OptimizabilityScore, _score_by_heuristics
from codeflash.models.function_types import FunctionParent, FunctionToOptimize


def test_empty_function_source():
    """Test scoring with an empty function source."""
    func = FunctionToOptimize(function_name="empty_func", file_path=Path("/test/file.py"), parents=[])
    func_source = ""
    num_lines = 0
    result = _score_by_heuristics(func, func_source, num_lines)  # 7.62μs -> 5.81μs (31.2% faster)
    # Should return an OptimizabilityScore object
    assert isinstance(result, OptimizabilityScore)
    assert result.qualified_name == "empty_func"
    # Score should be within valid range [0.0, 1.0]
    assert 0.0 <= result.score <= 1.0
    # With no lines and no patterns, score should be very low
    assert result.score == 0.0


def test_very_small_function():
    """Test that very small functions (<=5 lines) get penalized."""
    func = FunctionToOptimize(function_name="small_func", file_path=Path("/test/file.py"), parents=[])
    # A very small function with a loop pattern
    func_source = "def small():\n    for i in range(10):\n        print(i)"
    num_lines = 3
    result = _score_by_heuristics(func, func_source, num_lines)  # 11.1μs -> 9.06μs (23.0% faster)
    # Should be penalized (multiplied by 0.3) despite having loop patterns
    assert result.score < 0.15  # Without penalty, would be ~0.12 + 0.09 = 0.21; with 0.3x = ~0.063
    assert "very small" in result.reason


def test_function_with_loop():
    """Test scoring boost from loop patterns."""
    func = FunctionToOptimize(function_name="loop_func", file_path=Path("/test/file.py"), parents=[])
    # Function with clear loop patterns
    func_source = """def process_data():
    result = []
    for item in items:
        result.append(item * 2)
    return result"""
    num_lines = 5
    result = _score_by_heuristics(func, func_source, num_lines)  # 11.9μs -> 9.80μs (21.0% faster)
    # Should have increased score due to loop
    assert result.score > 0.0
    assert "loop patterns" in result.reason
def test_function_with_multiple_loops():
    """Test that multiple loop keywords boost score appropriately."""
    func = FunctionToOptimize(function_name="multi_loop", file_path=Path("/test/file.py"), parents=[])
    # Function with multiple loop patterns
    func_source = """def complex_loop():
    for i in range(10):
        while True:
            if condition:
                break"""
    num_lines = 6
    result = _score_by_heuristics(func, func_source, num_lines)  # 11.4μs -> 9.55μs (19.1% faster)
    # Two loop keywords found ("for " and "while ")
    assert "loop patterns (2)" in result.reason or "loop patterns" in result.reason


def test_function_with_sort_pattern():
    """Test scoring boost from sort/search operations."""
    func = FunctionToOptimize(function_name="sort_func", file_path=Path("/test/file.py"), parents=[])
    func_source = """def sort_data(arr):
    return sorted(arr)"""
    num_lines = 2
    result = _score_by_heuristics(func, func_source, num_lines)  # 9.80μs -> 7.71μs (27.0% faster)
    # Should get sort/search bonus
    assert "sort/search ops" in result.reason


def test_function_with_deep_nesting():
    """Test scoring boost from deeply nested code."""
    func = FunctionToOptimize(function_name="nested_func", file_path=Path("/test/file.py"), parents=[])
    # Deeply indented function (more than 16 spaces = 4+ levels)
    func_source = """def deeply_nested():
    if condition1:
        if condition2:
            if condition3:
                if condition4:
                    x = 1  # 5 levels of indentation (20 spaces)
    return x"""
    num_lines = 7
    result = _score_by_heuristics(func, func_source, num_lines)  # 11.8μs -> 9.57μs (23.1% faster)
    # Should get deep nesting bonus
    assert "deep nesting" in result.reason


def test_function_with_qualified_name():
    """Test that qualified names are correctly used in scoring."""
    parent = FunctionParent(name="MyClass", type="ClassDef")
    func = FunctionToOptimize(function_name="my_method", file_path=Path("/test/file.py"), parents=[parent])
    func_source = "def my_method(self):\n    pass"
    num_lines = 2
    result = _score_by_heuristics(func, func_source, num_lines)  # 9.99μs -> 8.16μs (22.5% faster)
    # Qualified name should include parent
    assert result.qualified_name == "MyClass.my_method"
def test_score_bounds():
    """Test that score is always bounded between 0.0 and 1.0."""
    func = FunctionToOptimize(function_name="test", file_path=Path("/test/file.py"), parents=[])
    # Function with many optimizable patterns
    func_source = """def many_patterns():
    for item in items:
        while condition:
            if nested:
                sorted(data)
                indexOf(item)"""
    num_lines = 100  # Many lines to potentially exceed 1.0
    result = _score_by_heuristics(func, func_source, num_lines)  # 12.4μs -> 10.3μs (20.0% faster)
    # Score should always be <= 1.0
    assert result.score <= 1.0
    assert result.score >= 0.0


def test_size_contribution_small():
    """Test that size contributes to score proportionally (small file)."""
    func = FunctionToOptimize(function_name="small_file", file_path=Path("/test/file.py"), parents=[])
    # 50 lines: contribution should be min(0.3, 50/100) = 0.3
    func_source = "\n".join(["line"] * 50)
    num_lines = 50
    result = _score_by_heuristics(func, func_source, num_lines)  # 16.4μs -> 14.1μs (16.2% faster)
    # Base score from size: 0.3, no other patterns
    assert 0.25 < result.score <= 0.35


def test_size_contribution_large():
    """Test that size contribution caps at 0.3."""
    func = FunctionToOptimize(function_name="large_file", file_path=Path("/test/file.py"), parents=[])
    # 200 lines: contribution should be min(0.3, 200/100) = 0.3 (capped)
    func_source = "\n".join(["line"] * 200)
    num_lines = 200
    result = _score_by_heuristics(func, func_source, num_lines)  # 39.9μs -> 36.1μs (10.6% faster)
    # Size contribution should be capped at 0.3
    assert result.score <= 0.35


def test_single_line_function():
    """Test scoring for a single-line function."""
    func = FunctionToOptimize(function_name="one_liner", file_path=Path("/test/file.py"), parents=[])
    func_source = "return x * 2"
    num_lines = 1
    result = _score_by_heuristics(func, func_source, num_lines)  # 8.16μs -> 6.39μs (27.7% faster)
    # Single line function should be heavily penalized
    assert result.score < 0.1
    assert "very small" in result.reason
def test_exactly_five_line_boundary():
    """Test the boundary case of exactly 5 lines (penalty threshold)."""
    func = FunctionToOptimize(function_name="boundary_func", file_path=Path("/test/file.py"), parents=[])
    # Exactly 5 lines with loop pattern
    func_source = """def func():
    for x in y:
        a = 1
        b = 2
        c = 3"""
    num_lines = 5
    result = _score_by_heuristics(func, func_source, num_lines)  # 11.2μs -> 8.94μs (25.2% faster)
    # Should still be penalized (5 lines triggers penalty)
    assert "very small" in result.reason


def test_exactly_six_line_boundary():
    """Test the boundary case of 6 lines (just above penalty threshold)."""
    func = FunctionToOptimize(function_name="boundary_func", file_path=Path("/test/file.py"), parents=[])
    # Exactly 6 lines with loop pattern
    func_source = """def func():
    for x in y:
        a = 1
        b = 2
        c = 3
        d = 4"""
    num_lines = 6
    result = _score_by_heuristics(func, func_source, num_lines)  # 10.6μs -> 8.68μs (22.6% faster)
    # Should NOT be penalized (6 lines is above threshold)
    assert "very small" not in result.reason or result.score > 0.1


def test_function_with_java_forEach():
    """Test recognition of Java forEach patterns."""
    func = FunctionToOptimize(function_name="java_func", file_path=Path("/test/file.java"), parents=[], language="java")
    func_source = """items.forEach(item -> {
    process(item);
});"""
    num_lines = 3
    result = _score_by_heuristics(func, func_source, num_lines)  # 10.4μs -> 8.53μs (21.7% faster)
    # Should recognize forEach as a loop pattern
    assert "loop patterns" in result.reason


def test_function_with_javascript_map_filter():
    """Test recognition of JavaScript map and filter patterns."""
    func = FunctionToOptimize(
        function_name="js_func", file_path=Path("/test/file.js"), parents=[], language="javascript"
    )
    func_source = """const result = items
    .map(x => x * 2)
    .filter(x => x > 10)
    .reduce((a, b) => a + b, 0);"""
    num_lines = 4
    result = _score_by_heuristics(func, func_source, num_lines)  # 11.4μs -> 9.32μs (22.7% faster)
    # Should recognize map, filter, reduce as loop patterns
    assert "loop patterns" in result.reason


def test_function_with_binarySearch_pattern():
    """Test recognition of binary search pattern."""
    func = FunctionToOptimize(
        function_name="search_func", file_path=Path("/test/file.java"), parents=[], language="java"
    )
    func_source = """int index = Collections.binarySearch(list, target);"""
    num_lines = 1
    result = _score_by_heuristics(func, func_source, num_lines)  # 9.34μs -> 7.37μs (26.6% faster)
    # Should recognize binarySearch as sort/search operation
    assert "sort/search ops" in result.reason
def test_function_with_exactly_16_space_indent():
    """Test the exact boundary of indentation (16 spaces = 4 levels, no nesting bonus)."""
    func = FunctionToOptimize(function_name="indent_func", file_path=Path("/test/file.py"), parents=[])
    # Exactly 16 spaces of indentation
    func_source = """def func():
    if a:
        if b:
            if c:
                x = 1"""  # 4 levels = 16 spaces
    num_lines = 5
    result = _score_by_heuristics(func, func_source, num_lines)  # 10.2μs -> 8.30μs (22.8% faster)
    # Should NOT get deep nesting bonus (16 is not > 16)
    assert "deep nesting" not in result.reason


def test_function_with_exactly_17_space_indent():
    """Test just above the indentation boundary (17 spaces = deep nesting bonus)."""
    func = FunctionToOptimize(function_name="indent_func", file_path=Path("/test/file.py"), parents=[])
    # 17 spaces of indentation (just above threshold)
    func_source = """def func():
    if a:
        if b:
            if c:
                 x = 1"""  # 17 spaces
    num_lines = 5
    result = _score_by_heuristics(func, func_source, num_lines)  # 10.0μs -> 8.22μs (21.8% faster)
    # Should get deep nesting bonus (17 > 16)
    assert "deep nesting" in result.reason


def test_function_with_no_whitespace():
    """Test function with no indentation (shouldn't crash)."""
    func = FunctionToOptimize(function_name="no_indent", file_path=Path("/test/file.py"), parents=[])
    func_source = "x = 1"
    num_lines = 1
    result = _score_by_heuristics(func, func_source, num_lines)  # 7.82μs -> 6.06μs (28.9% faster)
    # Should handle gracefully without crashing
    assert isinstance(result, OptimizabilityScore)
    assert 0.0 <= result.score <= 1.0


def test_function_with_blank_lines():
    """Test that blank lines are handled correctly."""
    func = FunctionToOptimize(function_name="blank_func", file_path=Path("/test/file.py"), parents=[])
    func_source = """def func():
    x = 1"""
    num_lines = 4
    result = _score_by_heuristics(func, func_source, num_lines)  # 9.24μs -> 7.32μs (26.1% faster)
    # Should handle blank lines without crashing
    assert isinstance(result, OptimizabilityScore)
    assert 0.0 <= result.score <= 1.0


def test_function_with_only_whitespace():
    """Test function containing only whitespace."""
    func = FunctionToOptimize(function_name="whitespace_func", file_path=Path("/test/file.py"), parents=[])
    func_source = " \n \n "
    num_lines = 3
    result = _score_by_heuristics(func, func_source, num_lines)  # 8.14μs -> 6.45μs (26.1% faster)
    # Should handle without crashing
    assert isinstance(result, OptimizabilityScore)
    assert 0.0 <= result.score <= 1.0
def test_function_with_multiple_parents():
    """Test function with nested parent scopes."""
    parent1 = FunctionParent(name="OuterClass", type="ClassDef")
    parent2 = FunctionParent(name="inner_function", type="FunctionDef")
    func = FunctionToOptimize(
        function_name="nested_method", file_path=Path("/test/file.py"), parents=[parent1, parent2]
    )
    func_source = "for x in y:\n    pass"
    num_lines = 2
    result = _score_by_heuristics(func, func_source, num_lines)  # 11.0μs -> 9.34μs (17.6% faster)
    # Qualified name should chain parents
    assert result.qualified_name == "OuterClass.inner_function.nested_method"


def test_function_with_special_characters():
    """Test function source with special characters and unicode."""
    func = FunctionToOptimize(function_name="special_func", file_path=Path("/test/file.py"), parents=[])
    func_source = """def func():
    # Comment with special chars: @#$%^&*()
    text = "string with 'quotes' and \\"escapes\\""
    for item in items:
        process(item)"""
    num_lines = 5
    result = _score_by_heuristics(func, func_source, num_lines)  # 12.2μs -> 10.1μs (21.1% faster)
    # Should recognize loop despite special characters
    assert "loop patterns" in result.reason


def test_function_with_tab_indentation():
    """Test function using tabs instead of spaces for indentation."""
    func = FunctionToOptimize(function_name="tab_func", file_path=Path("/test/file.py"), parents=[])
    func_source = "def func():\n\tif x:\n\t\tif y:\n\t\t\tif z:\n\t\t\t\ta = 1"  # tabs
    num_lines = 5
    result = _score_by_heuristics(func, func_source, num_lines)  # 9.72μs -> 7.95μs (22.3% faster)
    # Should handle tab indentation
    assert isinstance(result, OptimizabilityScore)
    assert 0.0 <= result.score <= 1.0


def test_zero_lines_input():
    """Test with zero lines input."""
    func = FunctionToOptimize(function_name="zero_lines", file_path=Path("/test/file.py"), parents=[])
    func_source = ""
    num_lines = 0
    result = _score_by_heuristics(func, func_source, num_lines)  # 7.22μs -> 5.14μs (40.6% faster)
    # Should handle zero lines
    assert result.score == 0.0


def test_negative_num_lines_should_clamp():
    """Test that negative num_lines doesn't cause negative scores."""
    func = FunctionToOptimize(function_name="neg_lines", file_path=Path("/test/file.py"), parents=[])
    func_source = ""
    num_lines = -5
    result = _score_by_heuristics(func, func_source, num_lines)  # 7.03μs -> 5.31μs (32.5% faster)
    # Score should be clamped to [0.0, 1.0]
    assert result.score >= 0.0
    assert result.score <= 1.0
def test_large_function_many_lines():
    """Test scoring performance on large function (1000 lines)."""
    func = FunctionToOptimize(function_name="large_func", file_path=Path("/test/large_file.py"), parents=[])
    # Create a large function source
    func_source = "\n".join([f"line_{i}" for i in range(1000)])
    num_lines = 1000
    result = _score_by_heuristics(func, func_source, num_lines)  # 185μs -> 172μs (7.14% faster)
    # Should handle large functions and cap score at max
    assert result.score <= 1.0
    assert result.score >= 0.0
    # Size should cap at 0.3 (1000/100 > 0.3)
    assert result.score <= 0.3


def test_large_function_with_many_loops():
    """Test scoring with a function containing many loop keywords."""
    func = FunctionToOptimize(function_name="many_loops", file_path=Path("/test/file.py"), parents=[])
    # Create function with many loop occurrences
    source_lines = []
    for i in range(100):
        source_lines.append(f"    for x in range({i}):")
        source_lines.append(f"        while condition_{i}:")
        source_lines.append("            pass")
    func_source = "\n".join(source_lines)
    num_lines = len(source_lines)
    result = _score_by_heuristics(func, func_source, num_lines)  # 93.7μs -> 88.4μs (5.99% faster)
    # Should handle many loops and cap score appropriately
    assert result.score <= 1.0
    assert "loop patterns" in result.reason


def test_large_function_deeply_nested():
    """Test scoring with very deeply nested structure."""
    func = FunctionToOptimize(function_name="deep_nest", file_path=Path("/test/file.py"), parents=[])
    # Create deeply nested structure
    lines = ["def func():"]
    for i in range(50):
        lines.append(" " * (i + 1) + f"if condition_{i}:")
    lines.append(" " * 51 + "x = 1")
    func_source = "\n".join(lines)
    num_lines = len(lines)
    result = _score_by_heuristics(func, func_source, num_lines)  # 61.9μs -> 47.0μs (31.9% faster)
    # Should recognize deep nesting
    assert "deep nesting" in result.reason
    assert result.score <= 1.0


def test_large_function_with_many_sort_patterns():
    """Test scoring with many sort/search operations."""
    func = FunctionToOptimize(function_name="many_sorts", file_path=Path("/test/file.py"), parents=[])
    # Create function with many sort/search patterns
    source_lines = []
    for i in range(100):
        source_lines.append(f"sorted(data_{i})")
        source_lines.append(f"Collections.sort(list_{i})")
        source_lines.append(f"indexOf(item_{i})")
    func_source = "\n".join(source_lines)
    num_lines = len(source_lines)
    result = _score_by_heuristics(func, func_source, num_lines)  # 69.2μs -> 64.2μs (7.88% faster)
    # Should recognize sort/search pattern (should appear only once in reason)
    assert "sort/search ops" in result.reason
    assert result.score <= 1.0


def test_combined_patterns_large_function():
    """Test scoring with all optimization patterns in a large function."""
    func = FunctionToOptimize(function_name="all_patterns", file_path=Path("/test/file.py"), parents=[])
    # Create function with loops, sorts, and deep nesting
    source_lines = ["def all_patterns():"]
    for i in range(50):
        source_lines.append(" " * (i // 10 + 1) + f"for j in range({i}):")
        source_lines.append(" " * (i // 10 + 2) + f"sorted(data_{i})")
    func_source = "\n".join(source_lines)
    num_lines = len(source_lines)
    result = _score_by_heuristics(func, func_source, num_lines)  # 42.2μs -> 34.0μs (24.2% faster)
    # Should recognize multiple patterns
    assert "loop patterns" in result.reason
    assert "sort/search ops" in result.reason
    assert result.score <= 1.0
    assert result.score > 0.3  # Should have significant score
def test_result_object_structure():
    """Test that returned OptimizabilityScore has correct structure."""
    func = FunctionToOptimize(function_name="test_func", file_path=Path("/test/file.py"), parents=[])
    func_source = "for x in y:\n    pass"
    num_lines = 2
    result = _score_by_heuristics(func, func_source, num_lines)  # 9.99μs -> 8.00μs (24.9% faster)
    # Check that result has expected attributes
    assert hasattr(result, "qualified_name")
    assert hasattr(result, "score")
    assert hasattr(result, "reason")
    assert result.qualified_name == "test_func"
    assert isinstance(result.score, float)
    assert isinstance(result.reason, str)


def test_consistent_scoring_same_input():
    """Test that same input produces consistent output."""
    func = FunctionToOptimize(function_name="consistent", file_path=Path("/test/file.py"), parents=[])
    func_source = "for x in y:\n    for z in w:\n        pass"
    num_lines = 3
    # Score same function multiple times
    result1 = _score_by_heuristics(func, func_source, num_lines)  # 9.95μs -> 8.15μs (22.1% faster)
    result2 = _score_by_heuristics(func, func_source, num_lines)
    result3 = _score_by_heuristics(func, func_source, num_lines)  # 5.95μs -> 5.03μs (18.3% faster)
    # All results should be identical
    assert result1.score == result2.score == result3.score
    assert result1.reason == result2.reason == result3.reason  # 4.38μs -> 3.61μs (21.4% faster)

To test or edit this optimization locally: `git merge codeflash/optimize-pr1925-2026-03-31T17.13.29`
Suggested changes
-    loop_count = sum(1 for kw in loop_keywords if kw in func_source)
-    if loop_count > 0:
-        score += min(0.35, loop_count * 0.12)
-        reasons.append(f"loop patterns ({loop_count})")
-    # Sorting/searching
-    sort_patterns = ("sort(", "sorted(", ".sort(", "binarySearch", "indexOf", "Collections.sort")
-    if any(p in func_source for p in sort_patterns):
-        score += 0.15
-        reasons.append("sort/search ops")
-    # Nested structure (indentation depth as proxy)
-    max_indent = max((len(line) - len(line.lstrip()) for line in func_source.splitlines() if line.strip()), default=0)
-    if max_indent > 16:  # roughly 4+ levels of nesting
+    fs = func_source  # local alias to avoid repeated global lookups
+    # Count how many distinct loop-pattern keywords appear at least once
+    loop_count = 0
+    for kw in loop_keywords:
+        if kw in fs:
+            loop_count += 1
+    if loop_count > 0:
+        score += min(0.35, loop_count * 0.12)
+        reasons.append(f"loop patterns ({loop_count})")
+    # Sorting/searching
+    sort_patterns = ("sort(", "sorted(", ".sort(", "binarySearch", "indexOf", "Collections.sort")
+    if any(p in fs for p in sort_patterns):
+        score += 0.15
+        reasons.append("sort/search ops")
+    # Nested structure (indentation depth as proxy)
+    # Compute max indentation but bail out early once threshold exceeded
+    max_indent = 0
+    threshold = 16  # roughly 4+ levels of nesting
+    for line in fs.splitlines():
+        if not line.strip():
+            continue
+        # compute indent
+        indent = len(line) - len(line.lstrip())
+        if indent > max_indent:
+            max_indent = indent
+            if max_indent > threshold:
+                break
+    if max_indent > threshold: