【WIP】feat: add ocr scan for full-file code review#93
Conversation
Introduce a new top-level subcommand `ocr scan` (alias `s`) that reviews
whole files instead of git diffs. Use cases include reviewing unfamiliar
codebases, pre-migration audits, and ad-hoc per-directory reviews.
Architecture splits scan and diff review at the package level so the two
pipelines can evolve independently:
- internal/scan/ new package: file enumeration via `git ls-files`,
full-scan agent, FULL_SCAN_TASK rendering, preview
- internal/llmloop/ new package: shared LLM tool-use loop, three-zone
memory compression, CommentWorkerPool, AgentWarning.
Both internal/agent and internal/scan delegate to
llmloop.Runner; agent and scan never import each other
- internal/agent/ slimmed: LLM loop / compression / token aggregation
moved to llmloop; review-only orchestration remains
- internal/model/ new ScanItem (full-file payload) + Preview /
PreviewEntry / ExcludeReason shared by both modes
- internal/diff/ new gitignore.go exporting helpers reused by scan
- cmd/opencodereview/ new scan_cmd.go; shared.go consolidates startup
(loadCommonContext / loadLLMRuntime), output
(emitRunResult, ResultProvider) and stdout silencing
(quietHandle); review_cmd.go follows the same shape
Template additions:
- FULL_SCAN_TASK: dedicated prompt with Tool-call discipline guidance to
reduce gratuitous tool calls per file
- FULL_SCAN_MAX_TOOL_REQUEST_TIMES (default 60): scan-only per-file budget,
raised over diff's 30 to fit multi-finding files; --max-tools still
composes (only raise, never lower)
In scan mode, file_read_diff is filtered out of MainToolDefs since it has
no useful semantics without a diff.
Tests cover provider enumeration (with temp git repo), template rendering,
filter passes, dependency budget, flag validation, and excludeToolDef.
| func IsPathExcluded(repoDir, relPath string, patterns []string) bool { | ||
| stub := &Provider{repoDir: repoDir} | ||
| return stub.isPathExcluded(relPath, patterns) | ||
| } |
There was a problem hiding this comment.
The repoDir parameter is accepted but never used — isPathExcluded only reads the package-level providerDirIgnoreDirs and the passed-in patterns; it does not access p.repoDir. This is misleading to callers (e.g., scan/provider.go passes p.repoDir expecting it to matter). Either remove the parameter or document why it's reserved for future use. If kept for forward-compatibility, consider adding a comment explaining it's currently unused.
| // parseReviewFlags already wraps with "parse flags: %w" — return as-is. | ||
| return err |
There was a problem hiding this comment.
Bug: Incorrect comment / inconsistent error wrapping. The comment claims parseReviewFlags already wraps all errors with "parse flags: %w", but that's only true for the a.Parse(args) error (flags.go:136). Validation errors returned later in parseReviewFlags (e.g., "only one review mode allowed", "--to is required when --from is specified", "invalid --audience value") are returned without the "parse flags:" prefix.
The old code wrapped all errors from parseReviewFlags uniformly with fmt.Errorf("parse flags: %w", err), providing consistent context. Removing this wrapper means those validation errors now lose the "parse flags:" prefix, creating inconsistent error messages.
Either restore the wrapper here (it's harmless to double-wrap the parse error), or add the prefix to every error return in parseReviewFlags.
|
|
||
| go func() { | ||
| defer cancel() | ||
| rebuilt, _ := r.runCompression(asyncCtx, msgSnapshot, filePath) |
There was a problem hiding this comment.
Bug: Silent data loss on async compression failure. The error from runCompression is discarded here (rebuilt, _). When runCompression fails, it returns msgs[:part.frozenEnd] — a truncated message list containing only the first 2 messages. This truncated result is then stored in job.rebuilt and later applied by tryApplyPendingCompression as if it were a valid compression result, silently discarding all active-zone messages.
Suggestion: Either skip setting job.rebuilt when the error is non-nil, or store the error and handle it in tryApplyPendingCompression.
Suggestion:
| rebuilt, _ := r.runCompression(asyncCtx, msgSnapshot, filePath) | |
| rebuilt, err := r.runCompression(asyncCtx, msgSnapshot, filePath) | |
| if err != nil { | |
| // Don't apply a failed/truncated compression result. | |
| r.compressionMu.Lock() | |
| if r.pendingJob == job { | |
| r.pendingJob = nil | |
| } | |
| r.compressionMu.Unlock() | |
| return | |
| } |
| if err != nil { | ||
| rec.SetError(err, duration) | ||
| fmt.Fprintf(stdout.Writer(), "[ocr] Memory compression failed: %v\n", err) | ||
| return msgs[:part.frozenEnd], fmt.Errorf("memory compression: %w", err) | ||
| } |
There was a problem hiding this comment.
Bug: Data loss on synchronous compression failure. When runCompression returns an error, it also returns msgs[:part.frozenEnd] (only the first 2 messages). The caller in addNextMessage discards the error (*messages, _ = r.runCompression(...)) and replaces the full message list with this truncated version. This means a transient LLM failure during compression permanently destroys the entire conversation context beyond the frozen zone, including the active zone that was intentionally preserved.
On error, the function should return the original msgs unchanged so the caller retains the full conversation history.
Suggestion:
| if err != nil { | |
| rec.SetError(err, duration) | |
| fmt.Fprintf(stdout.Writer(), "[ocr] Memory compression failed: %v\n", err) | |
| return msgs[:part.frozenEnd], fmt.Errorf("memory compression: %w", err) | |
| } | |
| if err != nil { | |
| rec.SetError(err, duration) | |
| fmt.Fprintf(stdout.Writer(), "[ocr] Memory compression failed: %v\n", err) | |
| return msgs, fmt.Errorf("memory compression: %w", err) | |
| } |
| for i, m := range msgs { | ||
| sb.WriteString(fmt.Sprintf("<message id=\"%d\" role=\"%s\">\n", i, m.Role)) | ||
| sb.WriteString(" <content>\n") | ||
| sb.WriteString(fmt.Sprintf(" %s\n", m.ExtractText())) |
There was a problem hiding this comment.
Bug: XML injection / malformed XML. Message content is interpolated directly into XML without escaping special characters (<, >, &, "). In a code review context, messages frequently contain code snippets with these characters, which will produce malformed XML and potentially confuse the compression LLM prompt.
Use encoding/xml.EscapeText or html.EscapeString to escape the content before embedding it.
Suggestion:
| sb.WriteString(fmt.Sprintf(" %s\n", m.ExtractText())) | |
| sb.WriteString(fmt.Sprintf(" %s\n", html.EscapeString(m.ExtractText()))) |
| if len(calls) == 0 { | ||
| fmt.Fprintf(stdout.Writer(), "[ocr] No tool calls parsed for %s, retrying...\n", newPath) | ||
| messages = append(messages, llm.NewTextMessage("user", "You did not successfully call any tools. Please try again or use task_done if finished.")) | ||
| if content != "" { | ||
| messages = append(messages[:len(messages)-1], llm.NewTextMessage("assistant", content), messages[len(messages)-1]) | ||
| } | ||
| continue | ||
| } |
There was a problem hiding this comment.
Missing consecutiveEmptyRounds tracking for no-tool-call retries: When the LLM returns no tool calls (len(calls) == 0), the code appends a retry message and continues, but consecutiveEmptyRounds is never incremented in this path. The counter only tracks rounds where tool calls were made but returned no valid data. If the model persistently outputs text without calling any tools, the loop will consume all toolReqCount iterations (each making an expensive LLM API call) before stopping.
Consider incrementing consecutiveEmptyRounds here as well, or adding a separate counter for no-tool-call retries, so the safety break applies uniformly.
Suggestion:
| if len(calls) == 0 { | |
| fmt.Fprintf(stdout.Writer(), "[ocr] No tool calls parsed for %s, retrying...\n", newPath) | |
| messages = append(messages, llm.NewTextMessage("user", "You did not successfully call any tools. Please try again or use task_done if finished.")) | |
| if content != "" { | |
| messages = append(messages[:len(messages)-1], llm.NewTextMessage("assistant", content), messages[len(messages)-1]) | |
| } | |
| continue | |
| } | |
| if len(calls) == 0 { | |
| consecutiveEmptyRounds++ | |
| if consecutiveEmptyRounds >= maxConsecutiveEmptyRounds { | |
| fmt.Fprintf(stdout.Writer(), "[ocr] Too many rounds with no tool calls for %s, stopping.\n", newPath) | |
| break | |
| } | |
| fmt.Fprintf(stdout.Writer(), "[ocr] No tool calls parsed for %s, retrying...\n", newPath) | |
| messages = append(messages, llm.NewTextMessage("user", "You did not successfully call any tools. Please try again or use task_done if finished.")) | |
| if content != "" { | |
| messages = append(messages[:len(messages)-1], llm.NewTextMessage("assistant", content), messages[len(messages)-1]) | |
| } | |
| continue | |
| } |
| if err != nil { | ||
| return nil, fmt.Errorf("enumerate files: %w", err) | ||
| } | ||
| a.items = items |
There was a problem hiding this comment.
Potential issue: This mutates a.items with the unfiltered item list. If the Agent is ever reused after calling Preview() (e.g., calling FilesReviewed(), Diffs(), or even Run()), those methods would operate on stale/unfiltered data. For example, FilesReviewed() would return the total enumerated count rather than the reviewable count.
Consider either:
- Not mutating
a.itemsin Preview (use a local variable instead), since Preview is documented as a read-only operation. - Or documenting clearly that Preview must not be called before Run on the same Agent instance.
Suggestion:
| a.items = items | |
| // Use a local variable to avoid mutating agent state in a read-only operation. | |
| // a.items remains nil/unchanged so FilesReviewed()/Diffs() stay consistent. |
| if p.runner != nil { | ||
| out, err = p.runner.Run(ctx, p.repoDir, cmdArgs...) | ||
| } else { | ||
| cmd := exec.CommandContext(ctx, "git", cmdArgs...) | ||
| cmd.Dir = p.repoDir | ||
| raw, runErr := cmd.CombinedOutput() | ||
| out, err = string(raw), runErr | ||
| } |
There was a problem hiding this comment.
Bug: CombinedOutput corrupts NUL-delimited parsing.
When using the -z flag, git ls-files produces NUL-delimited output on stdout. However, CombinedOutput() merges stderr into stdout. If git writes any warnings or diagnostic messages to stderr (e.g., about renamed files, encoding issues, or repository warnings), those bytes will be interleaved with the NUL-delimited file list, producing corrupted filenames.
The gitcmd.Runner already provides an Output() method that captures stdout only. For the fallback path (when p.runner is nil), use cmd.Output() instead of cmd.CombinedOutput(). Consider also using p.runner.Output() for the runner path to keep stdout/stderr separate.
Suggestion:
| if p.runner != nil { | |
| out, err = p.runner.Run(ctx, p.repoDir, cmdArgs...) | |
| } else { | |
| cmd := exec.CommandContext(ctx, "git", cmdArgs...) | |
| cmd.Dir = p.repoDir | |
| raw, runErr := cmd.CombinedOutput() | |
| out, err = string(raw), runErr | |
| } | |
| var raw []byte | |
| if p.runner != nil { | |
| raw, err = p.runner.Output(ctx, p.repoDir, cmdArgs...) | |
| } else { | |
| cmd := exec.CommandContext(ctx, "git", cmdArgs...) | |
| cmd.Dir = p.repoDir | |
| raw, err = cmd.Output() | |
| } | |
| if err != nil { | |
| return nil, err | |
| } | |
| out := string(raw) |
|
|
||
| seen := make(map[string]struct{}, len(tracked)+len(untracked)) | ||
| all := make([]string, 0, len(tracked)+len(untracked)) | ||
| for _, f := range append(tracked, untracked...) { |
There was a problem hiding this comment.
Potential slice mutation bug: append(tracked, untracked...) may corrupt tracked.
If the tracked slice returned by gitLs has excess capacity (which is possible since gitLs builds its result with make([]string, 0, len(raw)) where raw could be larger than the filtered result), then append(tracked, untracked...) will write untracked elements into tracked's underlying array beyond its length. While tracked isn't reused after this point in the current code, this is a fragile pattern that can silently cause bugs during future refactoring.
Use a separate iteration approach instead:
Suggestion:
| for _, f := range append(tracked, untracked...) { | |
| for _, f := range slices.Concat(tracked, untracked) { |
| func (a *Agent) lookupDiff(path string) *model.Diff { | ||
| for i := range a.items { | ||
| if a.items[i].Path == path { | ||
| return a.items[i].AsDiff() | ||
| } | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Performance: lookupDiff performs a linear scan over a.items for every comment that needs line-number resolution. Since this is called from the LLM loop's tool execution path (potentially multiple times per file, across many concurrent files), the O(n) lookup can become a bottleneck for large scans with hundreds of files.
Consider building a map[string]*model.Diff index once during initialization (e.g., in Run() after filtering) and using O(1) map lookups here instead.
Suggestion:
| func (a *Agent) lookupDiff(path string) *model.Diff { | |
| for i := range a.items { | |
| if a.items[i].Path == path { | |
| return a.items[i].AsDiff() | |
| } | |
| } | |
| return nil | |
| } | |
| func (a *Agent) lookupDiff(path string) *model.Diff { | |
| if d, ok := a.diffIndex[path]; ok { | |
| return d | |
| } | |
| return nil | |
| } |
Description
Introduce a new top-level subcommand
ocr scan(aliass) that reviews whole files instead of git diffs. Use cases include reviewing unfamiliar codebases, pre-migration audits, and ad-hoc per-directory reviews.The change also splits scan and diff review at the package level so the two pipelines can evolve independently. Shared LLM tool-use loop, memory compression, and per-call bookkeeping move into a new
internal/llmlooppackage; bothinternal/agent(diff review) and the newinternal/scan(full-file review) delegate tollmloop.Runnerand never import each other.New / changed packages
internal/scan/(new)git ls-files, full-scan agent,FULL_SCAN_TASKrendering, previewinternal/llmloop/(new)CommentWorkerPool,AgentWarninginternal/agent/llmloop; review-only orchestration remainsinternal/model/ScanItem(full-file payload) plusPreview/PreviewEntry/ExcludeReasonshared by both modesinternal/diff/gitignore.goexposing gitignore helpers reused by scancmd/opencodereview/scan_cmd.go;shared.goconsolidates startup (loadCommonContext/loadLLMRuntime), output (emitRunResult,ResultProvider), and stdout silencing (quietHandle);review_cmd.gofollows the same shapeTemplate additions
FULL_SCAN_TASK: dedicated prompt with Tool-call discipline guidance (don't re-read the current file, ≤ 2–3 context calls per finding, batchcode_comment, calltask_doneearly) to reduce gratuitous tool calls.FULL_SCAN_MAX_TOOL_REQUEST_TIMES(default60): scan-only per-file budget, raised over diff's30to fit multi-finding files.--max-toolsstill composes (only raises, never lowers).Other behavioral notes
file_read_diffis filtered out ofMainToolDefs— its semantics don't apply without a diff, and exposing it just burns tool-call rounds.ocr reviewbehavior is unchanged.internal/agentandinternal/scanhave zero mutual imports (go list -depsvalidated).Type of Change
How Has This Been Tested?
make testpasses locallyUnit tests cover:
internal/scan/provider_test.go):.gitignorehonored, binary placeholder emitted, oversized files skipped, exact-file vs. directory-prefix path filtering.FULL_SCAN_TASKplaceholders substituted,change_filesliteral injected, no{{...}}leakage (internal/scan/agent_test.go).injectScanContentMapadapter (internal/scan/agent_test.go).cmd/opencodereview/scan_cmd_test.go):--all/--pathmutual presence,--audience/--max-tools/--max-git-procsvalidation,splitPaths,excludeToolDef(including absent name).internal/config/template/template_test.go):FULL_SCAN_MAX_TOOL_REQUEST_TIMEScorrectly deserialized;ApplyLanguageinjects directives intoFULL_SCAN_TASK.Manual smoke tests:
Static dependency assertions:
Checklist
go fmt,go vet)Related Issues