Skip to content

Support nested GitLab groups in repository name resolution#60

Open
Anthony-Bible wants to merge 3 commits into
masterfrom
claude/refactor-harden-normalizezoektreponame-against-mul-seaoM
Open

Support nested GitLab groups in repository name resolution#60
Anthony-Bible wants to merge 3 commits into
masterfrom
claude/refactor-harden-normalizezoektreponame-against-mul-seaoM

Conversation

@Anthony-Bible
Copy link
Copy Markdown
Owner

Summary

This PR fixes repository name resolution for nested GitLab groups and other hosting services with deep path structures. Previously, the system only handled two-segment repository names (e.g., "owner/repo"), causing failures when resolving chunks for repositories with nested groups (e.g., "org/sub/repo").

Key Changes

  • Added FullPath() method to RepositoryURL: Returns the complete URL path without the host, preserving all path segments. This correctly handles nested GitLab groups and other deep repository structures, unlike the existing FullName() which only returns the last two segments.

  • Replaced stripZoektRepoHost() with zoektRepoNameCandidates(): The new function generates all plausible database repository name variants by progressively stripping leading path segments. For example:

    • "gitlab.com/org/sub/repo"["gitlab.com/org/sub/repo", "org/sub/repo", "sub/repo"]
    • "github.com/owner/repo"["github.com/owner/repo", "owner/repo"]
    • Single-segment candidates are excluded to avoid false-positive matches.
  • Updated job processor: Changed to use FullPath() instead of FullName() when constructing the zoekt repository name, ensuring nested groups are properly indexed.

  • Added comprehensive test coverage:

    • Unit tests for zoektRepoNameCandidates() covering various repository path structures
    • Integration test demonstrating successful chunk resolution via the third candidate for nested GitLab groups
    • Tests for the new FullPath() method

Implementation Details

The resolution strategy now tries repository name candidates in order from most to least specific, stopping when a match is found. This allows the system to correctly resolve chunks even when the database stores repository names with fewer path segments than the zoekt index uses.

https://claude.ai/code/session_018DGnjmwnT5WZPB9MHPYSvH

Fixes two bugs that caused silent chunk-resolution failures for GitLab
nested-group repos (e.g. gitlab.com/org/subgroup/repo):

1. job_processor.go used FullName() for the zoekt repo name, which only
   returns the first two path segments. For nested groups this truncated
   the path (e.g. "gitlab.com/org/sub" instead of "gitlab.com/org/sub/repo").
   Fixed by adding FullPath() to RepositoryURL and using it instead.

2. zoektRepoNameCandidates only generated 2 candidates (original +
   strip-host). For a 4-segment zoekt name the third "sub/repo" variant
   (matching ExtractRepositoryNameFromURL's last-two-segments format) was
   never tried. Replaced the two-candidate logic with a progressive
   segment-stripping algorithm that stops before single-segment results.

Closes #56

https://claude.ai/code/session_018DGnjmwnT5WZPB9MHPYSvH
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves repository name handling for deep repository paths (e.g., nested GitLab groups) by introducing a RepositoryURL.FullPath() helper, updating Zoekt indexing to use the full path, and enhancing hybrid search chunk resolution by trying multiple plausible repository-name candidates.

Changes:

  • Added RepositoryURL.FullPath() plus unit tests.
  • Updated Zoekt repo naming during indexing to use Host()/FullPath() instead of Host()/FullName().
  • Replaced host-stripping logic with zoektRepoNameCandidates() and added tests (including a nested GitLab integration-style resolution test).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/domain/valueobject/repository_url.go Adds FullPath() for preserving all path segments.
internal/domain/valueobject/repository_url_test.go Adds unit tests for FullPath().
internal/application/worker/job_processor.go Uses FullPath() when constructing the Zoekt repository name.
internal/application/service/search_service.go Adds zoektRepoNameCandidates() and uses it for fallback chunk resolution.
internal/application/service/search_service_test.go Adds unit tests for candidates and a nested GitLab resolution test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/domain/valueobject/repository_url.go Outdated
Comment thread internal/application/worker/job_processor.go
GitHub and Bitbucket repos always have exactly owner/repo (2 path
segments). FullPath() now truncates any extra segments (e.g. from
a /tree/<branch> UI URL) to return a clean repo identifier.

GitLab and other hosts are left unchanged so nested group paths
(org/sub/repo) are preserved correctly.

Addresses Copilot review comment on PR #60.

https://claude.ai/code/session_018DGnjmwnT5WZPB9MHPYSvH
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

},
{
name: "GitHub URL with UI tree suffix is truncated to repo root",
rawURL: "https://github.com/owner/repo",
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case named "GitHub URL with UI tree suffix is truncated to repo root" uses the same rawURL as the "standard GitHub repo" case (no /tree/... suffix), so it doesn’t actually verify the truncation behavior. Update the rawURL to include a UI suffix (e.g., /tree/ or /blob//file) so this test would fail if FullPath stops stripping extra segments for GitHub.

Suggested change
rawURL: "https://github.com/owner/repo",
rawURL: "https://github.com/owner/repo/tree/main",

Copilot uses AI. Check for mistakes.
Comment on lines +384 to +402
func (r RepositoryURL) FullPath() string {
parsedURL, _ := url.Parse(r.normalized)
trimmedPath := strings.Trim(parsedURL.Path, "/")
if trimmedPath == "" {
return ""
}

parts := strings.Split(trimmedPath, "/")

// GitHub and Bitbucket repos are always owner/repo (exactly 2 path segments).
// Trim any extra segments that indicate a UI page rather than the repo root.
switch strings.ToLower(parsedURL.Hostname()) {
case "github.com", "www.github.com", "bitbucket.org", "www.bitbucket.org":
if len(parts) > MinPathPartsForRepoURL {
return strings.Join(parts[:MinPathPartsForRepoURL], "/")
}
}

return trimmedPath
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FullPath() relies on the normalized URL path. For GitLab (and any host not in the GitHub/Bitbucket allowlist), it returns the entire path as-is, which will include common UI suffixes like /-/tree/<branch> (and note the normalizer currently rewrites /- to -, turning that into part of the repo path). Since FullPath() is now used to construct the Zoekt repo name, storing a non-repo-root GitLab URL can produce a zoektRepoName that will never match the actual indexed repository. Consider stripping GitLab UI suffixes (e.g., anything after a /-/ segment) and/or applying the same UI-suffix trimming logic to configured extra hosts that follow the owner/repo pattern (e.g., GitHub Enterprise).

Copilot uses AI. Check for mistakes.
Addresses Copilot review comments on PR #60:
- Fix 'GitHub URL with UI tree suffix' test case to actually use a URL
  with a /tree/main suffix (previously used same URL as standard case)
- Add GitLab nested group test with direct /tree/main sentinel
- Add GitLab nested group test with /-/ UI separator (normalised to -/)

https://claude.ai/code/session_018DGnjmwnT5WZPB9MHPYSvH
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants