diff --git a/CHANGELOG.md b/CHANGELOG.md index e04ddb17e..921c4244d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ### Added - **`socket manifest bazel [beta]`** — Generate Bazel JVM SBOM manifests by running `bazel query` against discovered Maven repos in a Bazel workspace. Closes the inline-Maven-declaration gap that lockfile-only parsing misses for repos like envoy, ray, tensorflow, tink-java, and or-tools. Auto-detects Bzlmod and legacy `WORKSPACE`. - **`socket scan create --auto-manifest`** now covers Bazel workspaces in addition to Gradle/Scala/Kotlin/Conda. Repos with `MODULE.bazel`, `WORKSPACE`, or `WORKSPACE.bazel` are detected automatically and their Maven dependencies extracted as part of the standard scan-create flow. +- **Bazel PyPI extraction** — `socket manifest bazel --ecosystem pypi` now generates `requirements.txt` for Python Bazel workspaces. Discovers custom `rules_python` pip hub names with Bazel command output first, queries `py_library` / `py_binary` / `py_test` dependencies, resolves canonical pinned versions from `requirements_lock.txt`, and emits PEP 503-normalized `name==version` lines. Supports both Bzlmod (`pip.parse`) and legacy `WORKSPACE` (`pip_parse` / `pip_install`) configurations. PyPI remains explicit opt-in for `socket scan create --auto-manifest` until real-world no-lockfile recovery is validated. + +### Changed +- **Bazel diagnostics** — `socket manifest bazel --verbose` now emits bounded subprocess traces with argv, cwd, duration, exit status, output sizes, and failure stderr tails to make customer log-only triage safer and faster. 
## [1.1.101](https://github.com/SocketDev/socket-cli/releases/tag/v1.1.101) - 2026-05-22 diff --git a/src/commands/manifest/README.md b/src/commands/manifest/README.md index f9874c9f4..0798df74b 100644 --- a/src/commands/manifest/README.md +++ b/src/commands/manifest/README.md @@ -16,13 +16,14 @@ manifest generator. Useful when you do not want to spell out the language. ## socket manifest bazel [beta] -Generates Bazel JVM SBOM manifests (`maven_install.json`-shaped) by running -`bazel query` against discovered Maven repos in a Bazel workspace. Output is -consumed by `socket scan create` and closes the -inline-Maven-declaration gap that lockfile-only parsing misses. +Generates Bazel SBOM manifests (Maven `maven_install.json` and/or PyPI +`requirements.txt`) by running `bazel query` against discovered ecosystem +hubs in a Bazel workspace. Output is consumed by `socket scan create` and +closes the inline-declaration gap that lockfile-only parsing misses for +Bazel monorepos. -> **Note**: This command generates Maven dependency manifests for Bazel JVM -> workspaces. It does not run reachability analysis. +> **Note**: This command generates dependency manifests for Bazel +> workspaces (Maven and PyPI). It does not run reachability analysis. ### Usage @@ -36,33 +37,98 @@ socket manifest bazel [options] [DIR=.] - `--bazel-rc ` — path to additional `.bazelrc` fragments forwarded to bazel. - `--bazel-flags ` — flags forwarded to every bazel invocation (single quoted string). - `--bazel-output-base ` — Bazel `--output_base` for read-only-cache CI environments. +- `--ecosystem ` — ecosystem(s) to extract; repeatable. Supported values: `maven`, `pypi`. When omitted, Maven is generated by default; PyPI is explicit opt-in. - `--out ` — output directory; default `./.socket/bazel-manifests/`. - `--dry-run`, `--verbose` — standard diagnostic flags. > **Upload**: This subcommand only generates manifests. 
To generate and > upload in one step, use `socket scan create --auto-manifest .` — it -> detects the workspace, runs the same extraction this subcommand performs, -> and uploads the result. +> detects the workspace, generates Bazel Maven manifests, and uploads the +> result. Generate Bazel PyPI manifests explicitly with `socket manifest bazel +> --ecosystem pypi`, then scan the generated output with `socket scan create`. ### Examples ```bash -# Generate maven manifests from the current Bazel workspace. +# Generate the default Bazel Maven manifest from the current workspace. socket manifest bazel . +# Generate only the PyPI manifest. +socket manifest bazel . --ecosystem pypi + +# Generate both Maven and PyPI manifests explicitly. +socket manifest bazel . --ecosystem maven --ecosystem pypi + # Use bazelisk explicitly. socket manifest bazel --bazel=/usr/local/bin/bazelisk . ``` +### Python/PyPI Extraction + +When `--ecosystem pypi` is selected, the command: + +1. Discovers `rules_python` pip hubs from Bazel's `mod show_extension` output when available, with bounded static parsing of `MODULE.bazel` (`pip.parse(hub_name = "...")`) and legacy `WORKSPACE` (`pip_parse(name = "...")` / `pip_install(name = "...")`) retained as fallback. Hub names are never hardcoded; custom names like `my_pypi` are detected automatically. +2. Validates each candidate hub by probing it with `bazel query` for `:pkg` targets / `alias(` rules. Invalid candidates are dropped. +3. Runs `bazel query 'deps(kind("py_library|py_binary|py_test", //...))'` to determine which PyPI packages are actually reached by Python rules in the repo (test dependencies included for whole-repo scope). +4. Reads `requirements_lock.txt` (the path discovered from `pip.parse(requirements_lock = "...")`) for canonical pinned versions. When the lockfile is unavailable, falls back to parsing `pypi_name=` and `pypi_version=` tags from the spoke `py_library` rules in the hub-and-spoke architecture. +5. 
Emits a sorted canonical `requirements.txt` containing `name==version` lines for every reached package. + +### PyPI Name and Version Semantics + +- **PEP 503 normalization.** Package matching uses PEP 503 normalization + (lowercase, then any run of `-`, `_`, or `.` is collapsed to a single + `-`). Bazel target names use underscores (`charset_normalizer`); PyPI + canonical names use hyphens (`charset-normalizer`). The emitted + `requirements.txt` always uses the canonical hyphenated form. +- **Lockfile pins win.** When the lockfile and spoke-repo tags disagree on + a version, the lockfile wins because that is the version Bazel actually + resolves at analysis time. A `--verbose` warning is logged for the + divergence. +- **Conflict detection.** When two reached packages normalize to the same + PyPI name with different versions, the command fails clearly: a single + `requirements.txt` cannot represent both versions, and silently + picking one would produce a misleading SBOM. + +### Unsupported PyPI Forms + +The PyPI extractor is intentionally narrow in this phase: + +- **Direct URL, editable (`-e`), and unpinned requirements** are not + emitted. Only canonical `name==version` lines from the resolved + lockfile are produced. Repositories that rely on unpinned or + URL-pinned requirements will see those packages omitted from + `requirements.txt`. +- **Private corpus validation** requires authenticated GitHub access. + When credentials are unavailable, the bazel-bench harness's private + PyPI case skips cleanly with a distinct reason rather than failing. +- **Whole-repo extraction.** The initial PyPI implementation emits one + whole-workspace manifest. Per-target PyPI slicing is not currently + supported. + +### Cross-Language Edges + +Bazel repos with cross-language dependencies (e.g. `rust_library` → +`py_library` via PyO3 / cffi / etc.) are **not** traversed by the PyPI +extractor in this phase. 
The PyPI extractor only covers Python rule +dependencies reachable from `py_library`, `py_binary`, and `py_test` +targets. Cross-language edges are assigned to Phase 4. The bazel-bench +fixture `constructed/python-pypi` includes Go/Rust sidecars as +validation context only; they are intentionally not asserted by the +PyPI correctness cases. + ### Requirements - `bazel` or `bazelisk` on `PATH` (or pass `--bazel `). -- Network access on cold cache. Bazel and `rules_jvm_external` own their own - retry policy for transient Maven resolution failures — `socket manifest bazel` - does not retry on top of them. +- Network access on cold cache. Bazel and `rules_jvm_external` / + `rules_python` own their own retry policy for transient resolution + failures — `socket manifest bazel` does not retry on top of them. - Writable Bazel output base; pass `--bazel-output-base` for read-only-cache CI. +- For PyPI extraction: a Python 3 interpreter on `PATH` so the + rules_python toolchain can analyze the workspace. -This is the user-visible entry point for Bazel JVM SBOM support; the [beta] label and "Bazel JVM SBOM support" wording must stay consistent across release notes and docs. +This is the user-visible entry point for Bazel SBOM support (Maven and +PyPI); the [beta] label and "Bazel SBOM support" wording must stay +consistent across release notes and docs. 
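The PEP 503 normalization, conflict detection, and sorted `name==version` output described under "PyPI Name and Version Semantics" can be sketched as follows. This is a minimal illustration under stated assumptions, not the extractor's actual code: the `Pin` type and the `normalizePep503` / `toRequirementsLines` helpers are hypothetical names introduced only for this example.

```typescript
// Hypothetical shape for one resolved pin; not a type from socket-cli.
type Pin = { name: string; version: string }

// PEP 503: lowercase, then collapse any run of '-', '_', or '.' into one '-'.
function normalizePep503(name: string): string {
  return name.toLowerCase().replace(/[-_.]+/g, '-')
}

// Builds sorted `name==version` lines and fails when two pins normalize to
// the same canonical name with different versions.
function toRequirementsLines(pins: Pin[]): string[] {
  const byName = new Map<string, string>()
  for (const pin of pins) {
    const canonical = normalizePep503(pin.name)
    const existing = byName.get(canonical)
    if (existing !== undefined && existing !== pin.version) {
      throw new Error(
        `Conflicting versions for ${canonical}: ${existing} vs ${pin.version}`,
      )
    }
    byName.set(canonical, pin.version)
  }
  return Array.from(byName.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, version]) => `${name}==${version}`)
}
```

In this sketch, a Bazel-style name such as `charset_normalizer` and a PyPI-style name such as `Charset.Normalizer` map to the same key, so a version mismatch between them surfaces as an error instead of a silently chosen pin.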
## socket manifest cdxgen diff --git a/src/commands/manifest/bazel/bazel-pypi-discovery.mts b/src/commands/manifest/bazel/bazel-pypi-discovery.mts new file mode 100644 index 000000000..e92561cef --- /dev/null +++ b/src/commands/manifest/bazel/bazel-pypi-discovery.mts @@ -0,0 +1,443 @@ +import { existsSync, readFileSync, readdirSync, statSync } from 'node:fs' +import path from 'node:path' + +import { logger } from '@socketsecurity/registry/lib/logger' + +import { getErrorCause } from '../../../utils/errors.mts' + +import type { RepoProbe, ValidationResult } from './bazel-repo-discovery.mts' + +// Maximum size (bytes) we will read for any single Bazel workspace file. +// Prevents DoS via maliciously large MODULE.bazel / WORKSPACE / .bzl files. +const MAX_WORKSPACE_FILE_BYTES = 5 * 1024 * 1024 + +// Maximum candidate count we will return (deduped) before failing. +// Real repos have <20; this is a hard ceiling against pathological inputs. +const MAX_CANDIDATES = 256 + +// Regex strategy: anchored, bounded character classes, no nested quantifiers. + +// Bzlmod: discover `use_extension(..., "pip")` bindings, then match +// `${binding}.parse(...)` to find pip hub declarations. +// Bounded: matches up to ~256 chars of path to avoid catastrophic backtracking. +const USE_EXTENSION_PIP_RE = + /(\w+)\s*=\s*use_extension\s*\(\s*["'][^"']{0,256}pip\.bzl["']\s*,\s*["']pip["']\s*\)/g + +// Extract hub_name, requirements_lock, and python_version from a pip.parse +// argument blob. Bounded character classes and length caps. +const HUB_NAME_ATTR_RE = /hub_name\s*=\s*(["'])([A-Za-z0-9_]{1,129})\1/ +const REQUIREMENTS_LOCK_ATTR_RE = + /requirements_lock\s*=\s*(["'])([^"']{1,512})\1/ +const PYTHON_VERSION_ATTR_RE = /python_version\s*=\s*(["'])([0-9._+!]{1,32})\1/ + +// Legacy WORKSPACE patterns: pip_parse, pip_install, pip_repository. +// Bounded: matches up to ~8KB of argument list. 
+const PIP_PARSE_NAME_RE = /pip_parse\s*\(\s*([^)]{0,8192})\)/g +const PIP_INSTALL_NAME_RE = /pip_install\s*\(\s*([^)]{0,8192})\)/g +const PIP_REPOSITORY_NAME_RE = /pip_repository\s*\(\s*([^)]{0,8192})\)/g +const NAME_ATTR_RE = /name\s*=\s*(["'])([A-Za-z0-9_]{1,129})\1/ +const LEGACY_REQ_LOCK_RE = /requirements_lock\s*=\s*(["'])([^"']{1,512})\1/ +const MOD_SHOW_PIP_PARSE_RE = /pip\.parse\s*\(\s*([^)]{0,8192})\)/g +const MOD_SHOW_USE_REPO_RE = + /use_repo\s*\(\s*\w+\s*,\s*(["'])([A-Za-z0-9_]{1,129})\1\s*\)/g + +// Hub validation: accept alias rules or `:pkg` targets in probe stdout. +// Does NOT require `pypi_name=` (that marker lives on spoke repos). +const PYPI_HUB_MARKER_RE = /:pkg\b|alias\s*\(/ + +export type PypiHubInfo = { + hubName: string + source: + | 'MODULE.bazel' + | 'WORKSPACE' + | 'WORKSPACE.bazel' + | '.bzl' + | 'visible-repos' + | 'default-seed' + | 'bazel-mod-show-extension' + workspaceMode: 'bzlmod' | 'legacy' | 'unknown' + pythonVersion?: string | undefined + requirementsLockLabel?: string | undefined + requirementsLockPath?: string | undefined + probeStdout: string + visibleRepoNames?: string[] | undefined +} + +export type PypiHubCandidate = Omit< + PypiHubInfo, + 'probeStdout' | 'visibleRepoNames' +> + +export function parseBazelModPipExtensionCandidates( + stdout: string, + verbose?: boolean, +): PypiHubCandidate[] { + const useRepoNames = new Set<string>() + for (const m of stdout.matchAll(MOD_SHOW_USE_REPO_RE)) { + useRepoNames.add(m[2] as string) + } + + const candidates: PypiHubCandidate[] = [] + for (const m of stdout.matchAll(MOD_SHOW_PIP_PARSE_RE)) { + const info = extractHubInfoFromArgBlob( + m[1] ?? 
'', + 'bazel-mod-show-extension', + 'bzlmod', + ) + if (!info) { + continue + } + if (useRepoNames.size && !useRepoNames.has(info.hubName)) { + if (verbose) { + logger.log( + `[VERBOSE] discovery: dropping pip.parse hub '${info.hubName}' because show_extension did not report matching use_repo.`, + ) + } + continue + } + candidates.push(info) + } + + if (verbose) { + logger.log( + '[VERBOSE] discovery: bazel mod show_extension pip.parse hits:', + candidates.length, + 'use_repo:', + Array.from(useRepoNames), + ) + } + return dedupCapped(candidates, verbose) +} + +// Reads file contents, refusing files that exceed MAX_WORKSPACE_FILE_BYTES. +// Returns null when the file is missing, oversized, or unreadable. +function safeReadFile(file: string): string | null { + if (!existsSync(file)) { + return null + } + try { + const stat = statSync(file) + if (stat.size > MAX_WORKSPACE_FILE_BYTES) { + return null + } + return readFileSync(file, 'utf8') + } catch { + return null + } +} + +// Walks workspace root for legacy Starlark sources we can scan: WORKSPACE +// (and WORKSPACE.bazel) plus top-level .bzl files. Non-recursive by design; +// Phase 1 explicitly avoids static Starlark parsing at depth. +function listLegacyStarlarkFiles(cwd: string): string[] { + const files: string[] = [] + const candidates = ['WORKSPACE', 'WORKSPACE.bazel'] + for (const c of candidates) { + const p = path.join(cwd, c) + if (existsSync(p)) { + files.push(p) + } + } + // Top-level .bzl files only. + try { + for (const entry of readdirSync(cwd)) { + if (entry.endsWith('.bzl')) { + files.push(path.join(cwd, entry)) + } + } + } catch { + // Ignore unreadable cwd. + } + return files +} + +// Returns deduplicated list of items, capped at MAX_CANDIDATES. +// Precedence: the first occurrence of a given hubName wins. Callers +// must order inputs so the preferred source comes first (e.g., Bzlmod +// hits before legacy WORKSPACE hits during migration). 
+// Throws a clear error if the cap is exceeded so callers do not silently +// truncate. Emits a verbose warning when a later entry is dropped due to +// a name collision so users can see implicit precedence at work. +function dedupCapped( + items: PypiHubCandidate[], + verbose?: boolean, +): PypiHubCandidate[] { + const seen = new Map<string, PypiHubCandidate>() + const out: PypiHubCandidate[] = [] + for (const item of items) { + const existing = seen.get(item.hubName) + if (!existing) { + seen.set(item.hubName, item) + out.push(item) + if (out.length > MAX_CANDIDATES) { + throw new Error( + `Discovered more than ${MAX_CANDIDATES} pip hub candidates. ` + + 'This exceeds the safety ceiling; aborting discovery.', + ) + } + } else if (verbose) { + logger.log( + `[VERBOSE] discovery: dropping duplicate pip hub candidate '${item.hubName}' ` + + `(kept first occurrence from ${existing.source}/${existing.workspaceMode}, ` + + `dropped ${item.source}/${item.workspaceMode}).`, + ) + } + } + return out +} + +// Build a dynamic regex for `${binding}.parse(...)` given a validated binding +// name (word characters only, so safe to embed). Bounded arg list. +function buildPipParseRe(binding: string): RegExp { + return new RegExp(`${binding}\\.parse\\s*\\(\\s*([^)]{0,8192})\\)`, 'g') +} + +// Extract candidate hub fields from a pip.parse / pip_parse / pip_install / +// pip_repository argument blob (without probeStdout or visibleRepoNames). +function extractHubInfoFromArgBlob( + argBlob: string, + source: PypiHubInfo['source'], + workspaceMode: PypiHubInfo['workspaceMode'], +): PypiHubCandidate | undefined { + const hubMatch = HUB_NAME_ATTR_RE.exec(argBlob) + const nameMatch = NAME_ATTR_RE.exec(argBlob) + const hubName = hubMatch?.[2] ?? nameMatch?.[2] + if (!hubName) { + return undefined + } + const lockMatch = + REQUIREMENTS_LOCK_ATTR_RE.exec(argBlob) ?? 
LEGACY_REQ_LOCK_RE.exec(argBlob) + const pythonVersion = PYTHON_VERSION_ATTR_RE.exec(argBlob)?.[2] + return { + hubName, + source, + workspaceMode, + pythonVersion, + requirementsLockLabel: lockMatch?.[2], + } +} + +// Step 1: parse candidate pip hub names from Bzlmod MODULE.bazel and legacy +// WORKSPACE / .bzl entry points. +// +// Precedence: Bzlmod (MODULE.bazel pip.parse) hits are pushed first, then +// legacy (pip_parse / pip_install / pip_repository) hits. dedupCapped keeps +// the first occurrence, so during migration scenarios where both +// MODULE.bazel and WORKSPACE define a hub with the same name, the Bzlmod +// entry wins implicitly. Pass verbose=true to surface dropped duplicates. +export function parsePypiHubCandidates( + cwd: string, + verbose?: boolean, +): PypiHubCandidate[] { + const candidates: PypiHubCandidate[] = [] + + // Bzlmod path: parse MODULE.bazel for use_extension bindings to pip, + // then match ${binding}.parse(...). + const moduleBazel = path.join(cwd, 'MODULE.bazel') + const moduleContent = safeReadFile(moduleBazel) + if (moduleContent) { + const bindings: string[] = [] + for (const m of moduleContent.matchAll(USE_EXTENSION_PIP_RE)) { + bindings.push(m[1] as string) + } + if (verbose) { + logger.log( + '[VERBOSE] discovery: scanned', + moduleBazel, + `(${bindings.length} use_extension pip binding(s))`, + ) + } + + for (const binding of bindings) { + const parseRe = buildPipParseRe(binding) + for (const m of moduleContent.matchAll(parseRe)) { + const argBlob = m[1] ?? 
'' + const info = extractHubInfoFromArgBlob( + argBlob, + 'MODULE.bazel', + 'bzlmod', + ) + if (info) { + candidates.push(info) + } + } + } + + if (verbose) { + logger.log( + '[VERBOSE] discovery: MODULE.bazel pip.parse hits:', + candidates.length, + ) + } + } else if (verbose) { + logger.log( + '[VERBOSE] discovery:', + moduleBazel, + 'not present (skipping bzlmod scan)', + ) + } + + // Legacy path: scan WORKSPACE + top-level .bzl files for pip_parse, + // pip_install, and pip_repository. + const legacyFiles = listLegacyStarlarkFiles(cwd) + if (verbose) { + logger.log( + '[VERBOSE] discovery: legacy files considered:', + legacyFiles.length ? legacyFiles : '(none)', + ) + } + for (const file of legacyFiles) { + const content = safeReadFile(file) + if (!content) { + continue + } + const fileHits: PypiHubCandidate[] = [] + const source: PypiHubInfo['source'] = file.endsWith('.bzl') + ? '.bzl' + : path.basename(file) === 'WORKSPACE.bazel' + ? 'WORKSPACE.bazel' + : 'WORKSPACE' + + for (const m of content.matchAll(PIP_PARSE_NAME_RE)) { + const info = extractHubInfoFromArgBlob(m[1] ?? '', source, 'legacy') + if (info) { + fileHits.push(info) + } + } + for (const m of content.matchAll(PIP_INSTALL_NAME_RE)) { + const info = extractHubInfoFromArgBlob(m[1] ?? '', source, 'legacy') + if (info) { + fileHits.push(info) + } + } + for (const m of content.matchAll(PIP_REPOSITORY_NAME_RE)) { + const info = extractHubInfoFromArgBlob(m[1] ?? '', source, 'legacy') + if (info) { + fileHits.push(info) + } + } + + candidates.push(...fileHits) + if (verbose) { + logger.log( + '[VERBOSE] discovery: scanned', + file, + `(${fileHits.length} legacy pip hub match(es))`, + ) + } + } + + return dedupCapped(candidates, verbose) +} + +// Step 2: validate a candidate by running the probe and confirming +// `:pkg` labels or alias rules appear in stdout. Does NOT require +// `pypi_name=` (that marker lives on spoke repos). 
+export async function validatePypiHub( + hubName: string, + probe: RepoProbe, + verbose?: boolean, +): Promise<ValidationResult> { + try { + const result = await probe(hubName) + if (result.code !== 0) { + if (verbose) { + logger.log( + `[VERBOSE] discovery: probe @${hubName}: REJECT (code=${result.code})`, + ) + } + return { valid: false, stdout: result.stdout } + } + const valid = PYPI_HUB_MARKER_RE.test(result.stdout) + if (verbose) { + logger.log( + `[VERBOSE] discovery: probe @${hubName}:`, + valid + ? 'ACCEPT (hub alias/pkg marker found)' + : 'REJECT (no hub alias/pkg marker in probe stdout)', + ) + } + return { valid, stdout: result.stdout } + } catch (e) { + if (verbose) { + logger.log( + `[VERBOSE] discovery: probe @${hubName}: REJECT (probe threw):`, + getErrorCause(e), + ) + } + return { valid: false, stdout: '' } + } +} + +// The default pip hub name when no explicit hub_name/name is given. +// Included as a seed so repos whose pip.parse is in a sub-module (not +// found by static scanning) can still be discovered via probe validation. +const DEFAULT_PYPI_HUB_SEED = 'pypi' + +// Composition: parse, then validate each candidate; return validated subset +// as a Map keyed by hub name with the validated PypiHubInfo. +// Seeds with the default 'pypi' hub name first unless already covered. +export async function discoverPypiHubs( + cwd: string, + probe: RepoProbe, + nativeCandidates?: string[], + verbose?: boolean, + bazelCommandCandidates?: PypiHubCandidate[], +): Promise<Map<string, PypiHubInfo>> { + // Prefer candidates from Bazel command output when present; otherwise run + // the static parse so MODULE.bazel pip.parse metadata (requirements_lock, + // python_version) is available for downstream lockfile resolution. Native + // repo-mapping candidates are corroborating data only: many non-PyPI repos + // expose alias or :pkg targets, so bare visible repos are too broad to probe. + const parsedAll = bazelCommandCandidates?.length + ? 
dedupCapped(bazelCommandCandidates, verbose) + : parsePypiHubCandidates(cwd, verbose) + const parsed: PypiHubCandidate[] = parsedAll + if (verbose) { + logger.log( + '[VERBOSE] discovery: candidate source:', + bazelCommandCandidates?.length + ? `bazel mod show_extension (${parsed.length})` + : nativeCandidates && nativeCandidates.length + ? `static parse (${parsed.length}) with bzlmod visible-repos (${nativeCandidates.length}) as corroboration` + : `static parse (${parsed.length})`, + ) + } + // Prepend the default hub seed unless parsed metadata already covers it. + const candidates: PypiHubCandidate[] = parsed.some( + c => c.hubName === DEFAULT_PYPI_HUB_SEED, + ) + ? parsed + : [ + { + hubName: DEFAULT_PYPI_HUB_SEED, + source: 'default-seed', + workspaceMode: 'unknown', + }, + ...parsed, + ] + if (verbose) { + logger.log( + '[VERBOSE] discovery: candidate set to probe (seed-first, deduped):', + candidates.map(c => c.hubName), + ) + } + const validated = new Map<string, PypiHubInfo>() + for (const c of candidates) { + // eslint-disable-next-line no-await-in-loop + const result = await validatePypiHub(c.hubName, probe, verbose) + if (result.valid) { + validated.set(c.hubName, { + ...c, + probeStdout: result.stdout, + }) + } + } + if (verbose) { + logger.log( + '[VERBOSE] discovery: validated pip hubs:', + Array.from(validated.keys()), + ) + } + return validated +} diff --git a/src/commands/manifest/bazel/bazel-pypi-discovery.test.mts b/src/commands/manifest/bazel/bazel-pypi-discovery.test.mts new file mode 100644 index 000000000..d4a9a8437 --- /dev/null +++ b/src/commands/manifest/bazel/bazel-pypi-discovery.test.mts @@ -0,0 +1,623 @@ +import { mkdtempSync, rmSync, writeFileSync } from 'node:fs' +import os from 'node:os' +import path from 'node:path' +import { fileURLToPath } from 'node:url' + +import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest' + +import { logger } from '@socketsecurity/registry/lib/logger' + +import { + discoverPypiHubs, 
parseBazelModPipExtensionCandidates, + parsePypiHubCandidates, + validatePypiHub, +} from './bazel-pypi-discovery.mts' + +import type { RepoProbe } from './bazel-repo-discovery.mts' + +const __filename = fileURLToPath(import.meta.url) +const __dirname = path.dirname(__filename) + +const FIXTURES = path.join( + __dirname, + '..', + '..', + '..', + '..', + 'test', + 'fixtures', + 'manifest-bazel', +) + +const acceptingPypiProbe: RepoProbe = async () => ({ + stdout: + 'alias(\n name = "pkg",\n actual = select(...),\n)\n@pypi//requests:pkg\n', + code: 0, +}) + +const rejectingPypiProbe: RepoProbe = async () => ({ stdout: '', code: 0 }) + +const failingPypiProbe: RepoProbe = async () => ({ stdout: '', code: 1 }) + +const throwingPypiProbe: RepoProbe = async () => { + throw new Error('bazel exploded') +} + +const selectivePypiProbe: RepoProbe = async name => + name === 'pypi' + ? { stdout: '@pypi//requests:pkg\n', code: 0 } + : { stdout: '', code: 0 } + +const aliasOnlyProbe: RepoProbe = async () => ({ + stdout: 'alias(\n name = "pkg",\n actual = "//foo:bar",\n)\n', + code: 0, +}) + +const noPypiNameProbe: RepoProbe = async () => ({ + stdout: 'alias(\n name = "pkg",\n)\n', + code: 0, +}) + +describe('bazel-pypi-discovery', () => { + describe('parsePypiHubCandidates', () => { + it('parses pip metadata from bazel mod show_extension output', () => { + const result = parseBazelModPipExtensionCandidates( + 'pip.parse(hub_name="pypi", python_version="3.12", requirements_lock="//:requirements_lock.txt")\n' + + 'use_repo(pip, "pypi")\n', + ) + expect(result).toEqual([ + { + hubName: 'pypi', + pythonVersion: '3.12', + requirementsLockLabel: '//:requirements_lock.txt', + source: 'bazel-mod-show-extension', + workspaceMode: 'bzlmod', + }, + ]) + }) + + it('filters show_extension pip.parse entries not exported by use_repo', () => { + const result = parseBazelModPipExtensionCandidates( + 'pip.parse(hub_name="hidden", requirements_lock="//:req.txt")\n' + + 'use_repo(pip, "pypi")\n', + 
) + expect(result).toEqual([]) + }) + + it('parses single pip.parse from bzlmod-only', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(\n' + + ' hub_name = "pypi",\n' + + ' python_version = "3.12",\n' + + ' requirements_lock = "//:requirements_lock.txt",\n' + + ')\n' + + 'use_repo(pip, "pypi")\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + pythonVersion: '3.12', + requirementsLockLabel: '//:requirements_lock.txt', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses renamed use_extension binding', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'my_pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'my_pip.parse(\n' + + ' hub_name = "custom_pypi",\n' + + ' requirements_lock = "//:requirements_lock.txt",\n' + + ')\n' + + 'use_repo(my_pip, "custom_pypi")\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'custom_pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses single-quoted bzlmod pip.parse attributes', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(\n' + + " hub_name = 'custom_pypi',\n" + + " python_version = '3.12',\n" + + " requirements_lock = '//:requirements_lock.txt',\n" + + ')\n', + ) + const result 
= parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'custom_pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + pythonVersion: '3.12', + requirementsLockLabel: '//:requirements_lock.txt', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses pip_parse name from legacy WORKSPACE', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'WORKSPACE'), + 'pip_parse(\n' + + ' name = "pypi",\n' + + ' requirements_lock = "//:requirements_lock.txt",\n' + + ')\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'pypi', + source: 'WORKSPACE', + workspaceMode: 'legacy', + requirementsLockLabel: '//:requirements_lock.txt', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses single-quoted legacy pip_parse and lockfile attributes', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'WORKSPACE'), + 'pip_parse(\n' + + " name = 'pypi',\n" + + " requirements_lock = '//:requirements_lock.txt',\n" + + ')\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'pypi', + source: 'WORKSPACE', + workspaceMode: 'legacy', + requirementsLockLabel: '//:requirements_lock.txt', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses pip_install name from legacy WORKSPACE', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'WORKSPACE'), + 'pip_install(\n' + + ' name = "pypi",\n' + + ' requirements = ["//:requirements.txt"],\n' + + ')\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'pypi', + source: 'WORKSPACE', + 
workspaceMode: 'legacy', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses single-quoted pip_install name from legacy WORKSPACE', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'WORKSPACE'), + "pip_install(name = 'pypi', requirements = ['//:requirements.txt'])\n", + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'pypi', + source: 'WORKSPACE', + workspaceMode: 'legacy', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses pip_repository name from legacy WORKSPACE', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'WORKSPACE'), + 'pip_repository(\n' + + ' name = "pypi",\n' + + ' requirements = ["//:requirements.txt"],\n' + + ')\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(1) + expect(result[0]).toEqual({ + hubName: 'pypi', + source: 'WORKSPACE', + workspaceMode: 'legacy', + }) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parses multiple hubs from a single MODULE.bazel', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(hub_name = "pypi", python_version = "3.11", requirements_lock = "//:req1.txt")\n' + + 'pip.parse(hub_name = "pip_test", python_version = "3.12", requirements_lock = "//:req2.txt")\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(2) + const names = result.map(r => r.hubName).sort() + expect(names).toEqual(['pip_test', 'pypi']) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('handles multiple python_version values', () => { + const dir = 
mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(hub_name = "pypi", python_version = "3.11", requirements_lock = "//:req.txt")\n' + + 'pip.parse(hub_name = "pypi_312", python_version = "3.12", requirements_lock = "//:req2.txt")\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toHaveLength(2) + const pypi = result.find(r => r.hubName === 'pypi') + expect(pypi?.pythonVersion).toBe('3.11') + const pypi312 = result.find(r => r.hubName === 'pypi_312') + expect(pypi312?.pythonVersion).toBe('3.12') + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('returns empty array on a directory without bazel markers', () => { + expect(parsePypiHubCandidates(FIXTURES)).toEqual([]) + }) + + it('ignores malformed pip.parse blocks without hub_name', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(requirements_lock = "//:req.txt")\n', + ) + const result = parsePypiHubCandidates(dir) + expect(result).toEqual([]) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + }) + + describe('validatePypiHub', () => { + it('accepts when probe stdout contains :pkg label', async () => { + const r = await validatePypiHub('pypi', acceptingPypiProbe) + expect(r.valid).toBe(true) + expect(r.stdout).toContain(':pkg') + }) + + it('accepts when probe stdout contains alias rule', async () => { + const r = await validatePypiHub('pypi', aliasOnlyProbe) + expect(r.valid).toBe(true) + }) + + it('rejects when probe stdout lacks :pkg or alias', async () => { + expect( + (await validatePypiHub('empty_hub', rejectingPypiProbe)).valid, + ).toBe(false) + }) + + it('rejects on non-zero exit code', async () => { + expect((await 
validatePypiHub('crash', failingPypiProbe)).valid).toBe( + false, + ) + }) + + it('rejects when probe throws', async () => { + expect((await validatePypiHub('boom', throwingPypiProbe)).valid).toBe( + false, + ) + }) + + it('does not require pypi_name= in hub stdout', async () => { + const r = await validatePypiHub('pypi', noPypiNameProbe) + expect(r.valid).toBe(true) + }) + }) + + describe('discoverPypiHubs', () => { + it('returns parsed candidates that the probe validates', async () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(hub_name = "pypi", requirements_lock = "//:req.txt")\n' + + 'pip.parse(hub_name = "pip_test", requirements_lock = "//:req2.txt")\n', + ) + const result = await discoverPypiHubs(dir, acceptingPypiProbe) + expect(Array.from(result.keys()).sort()).toEqual(['pip_test', 'pypi']) + for (const info of result.values()) { + expect(info.probeStdout).toContain(':pkg') + } + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('does not treat bare visible repo candidates as PyPI hubs', async () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(hub_name = "pypi", requirements_lock = "//:req.txt")\n', + ) + const result = await discoverPypiHubs(dir, acceptingPypiProbe, [ + 'native_pypi', + ]) + expect(Array.from(result.keys())).toEqual(['pypi']) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('filters out candidates the probe rejects', async () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 
'pip.parse(hub_name = "pypi", requirements_lock = "//:req.txt")\n' + + 'pip.parse(hub_name = "rejected", requirements_lock = "//:req2.txt")\n', + ) + const result = await discoverPypiHubs(dir, selectivePypiProbe) + expect(Array.from(result.keys())).toEqual(['pypi']) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('always seeds with default pypi hub', async () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + // No MODULE.bazel or WORKSPACE — only the default seed can match. + const result = await discoverPypiHubs(dir, selectivePypiProbe) + expect(Array.from(result.keys())).toEqual(['pypi']) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('prefers bazel command candidates over static MODULE parsing', async () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(hub_name = "static_pypi", requirements_lock = "//:req.txt")\n', + ) + const result = await discoverPypiHubs( + dir, + acceptingPypiProbe, + undefined, + undefined, + [ + { + hubName: 'pypi', + requirementsLockLabel: '//:requirements_lock.txt', + source: 'bazel-mod-show-extension', + workspaceMode: 'bzlmod', + }, + ], + ) + expect(Array.from(result.keys())).toEqual(['pypi']) + expect(result.get('pypi')?.source).toBe('bazel-mod-show-extension') + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + }) + + describe('verbose diagnostics', () => { + let logSpy: ReturnType + + beforeEach(() => { + logSpy = vi.spyOn(logger, 'log').mockImplementation(() => logger) + }) + + afterEach(() => { + logSpy.mockRestore() + }) + + function loggedLines(): string { + return logSpy.mock.calls + .map(args => args.map(a => String(a)).join(' ')) + .join('\n') + } + + it('parsePypiHubCandidates stays silent when verbose is unset', () => { + const dir = 
mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(hub_name = "pypi", requirements_lock = "//:req.txt")\n', + ) + parsePypiHubCandidates(dir) + expect(logSpy).not.toHaveBeenCalled() + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('parsePypiHubCandidates emits scanned-files + candidate set when verbose=true', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 'pip.parse(hub_name = "pypi", requirements_lock = "//:req.txt")\n', + ) + parsePypiHubCandidates(dir, true) + const text = loggedLines() + expect(text).toContain('discovery: scanned') + expect(text).toContain('MODULE.bazel') + expect(text).toContain('use_extension pip binding') + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('validatePypiHub logs ACCEPT under verbose', async () => { + await validatePypiHub('pypi', acceptingPypiProbe, true) + expect(loggedLines()).toMatch( + /probe @pypi:\s*ACCEPT \(hub alias\/pkg marker found\)/, + ) + }) + + it('validatePypiHub logs REJECT (no marker) under verbose', async () => { + await validatePypiHub('not_pypi', rejectingPypiProbe, true) + expect(loggedLines()).toMatch(/probe @not_pypi:\s*REJECT/) + }) + + it('validatePypiHub logs REJECT (probe threw) under verbose', async () => { + await validatePypiHub('boom', throwingPypiProbe, true) + expect(loggedLines()).toMatch(/probe @boom:\s*REJECT \(probe threw\)/) + }) + + it('discoverPypiHubs propagates verbose into the full pipeline', async () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + 
'pip.parse(hub_name = "pypi", requirements_lock = "//:req.txt")\n' + + 'pip.parse(hub_name = "rejected", requirements_lock = "//:req2.txt")\n', + ) + await discoverPypiHubs(dir, selectivePypiProbe, undefined, true) + const text = loggedLines() + expect(text).toContain('candidate source: static parse') + expect(text).toContain('candidate set to probe') + expect(text).toMatch(/probe @pypi:\s*ACCEPT/) + expect(text).toMatch(/probe @rejected:\s*REJECT/) + expect(text).toContain('validated pip hubs') + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + }) + + describe('DoS guard', () => { + it('completes parse on 1MB pathological input within 1s', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + const lines: string[] = [] + let totalLen = 0 + while (totalLen < 1_000_000) { + const line = + 'pip.parse(hub_name = "x_' + + lines.length + + '", requirements_lock = "//:req.txt")' + lines.push(line) + totalLen += line.length + 1 + } + writeFileSync( + path.join(dir, 'MODULE.bazel'), + 'pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")\n' + + lines.join('\n') + + '\n', + ) + const start = process.hrtime.bigint() + expect(() => parsePypiHubCandidates(dir)).toThrow( + /more than 256 pip hub candidates/, + ) + const elapsed = process.hrtime.bigint() - start + expect(elapsed).toBeLessThan(1_000_000_000n) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('ignores oversized MODULE.bazel files', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + // Write a file larger than MAX_WORKSPACE_FILE_BYTES (5MB). 
+ const bigContent = 'x'.repeat(6 * 1024 * 1024) + writeFileSync(path.join(dir, 'MODULE.bazel'), bigContent) + const result = parsePypiHubCandidates(dir) + expect(result).toEqual([]) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('ignores oversized WORKSPACE files', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + const bigContent = 'x'.repeat(6 * 1024 * 1024) + writeFileSync(path.join(dir, 'WORKSPACE'), bigContent) + const result = parsePypiHubCandidates(dir) + expect(result).toEqual([]) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + + it('ignores oversized top-level .bzl files', () => { + const dir = mkdtempSync(path.join(os.tmpdir(), 'bazel-pypi-')) + try { + // Write a 6MB .bzl file (exceeds MAX_WORKSPACE_FILE_BYTES = 5MB). + // The oversized file should be silently dropped by safeReadFile, + // not parsed for legacy pip_parse/pip_install/pip_repository hits. + const bigContent = 'x'.repeat(6 * 1024 * 1024) + writeFileSync(path.join(dir, 'pip_repo.bzl'), bigContent) + const result = parsePypiHubCandidates(dir) + expect(result).toEqual([]) + } finally { + rmSync(dir, { recursive: true, force: true }) + } + }) + }) +}) diff --git a/src/commands/manifest/bazel/bazel-pypi-parser.mts b/src/commands/manifest/bazel/bazel-pypi-parser.mts new file mode 100644 index 000000000..769679674 --- /dev/null +++ b/src/commands/manifest/bazel/bazel-pypi-parser.mts @@ -0,0 +1,365 @@ +/** + * Parse Bazel PyPI extraction inputs into the pinned `name==version` lines + * needed for generated `requirements.txt` output. + * + * This is deliberately not a general-purpose requirements.txt parser. It only + * accepts pinned lockfile-style entries needed to map reached Bazel labels to + * exact package versions; depscan remains the owner of full PEP 508 + * requirements ingestion during scan processing. 
+ * + * Security gate: every regex uses bounded character classes to prevent + * catastrophic backtracking on hostile input. + */ + +import { existsSync, readFileSync, statSync } from 'node:fs' +import path from 'node:path' + +// Maximum size (bytes) we will read for any requirements lockfile. +// Prevents DoS via maliciously large lockfiles. +const MAX_REQUIREMENTS_FILE_BYTES = 5 * 1024 * 1024 + +export type ExtractedPypiPackage = { + name: string + version: string + bazelName: string + source?: 'lockfile' | 'spoke-tag' | undefined + originalLine?: string | undefined +} + +export type ReachedPypiLabel = { + hubName: string + originalLabel: string + bazelName: string + normalizedName: string + apparentLabel: string + spokeLabel?: string | undefined +} + +// Normalize a PyPI package name per PEP 503: +// lowercase, then collapse `.`, `_`, and `-` runs to a single `-`. +export function normalizePypiName(name: string): string { + return name + .toLowerCase() + .replace(/[._-]+/g, '-') + .replace(/^-+/, '') + .replace(/-+$/, '') +} + +// Convert a Bazel underscore_name to a PyPI hyphenated-name. +export function bazelNameToPypiName(bazelName: string): string { + return bazelName.replace(/_/g, '-') +} + +// Validate that a resolved path stays within the workspace root. +function isWithinWorkspace(resolved: string, cwd: string): boolean { + const rel = path.relative(cwd, resolved) + return !rel.startsWith('..') && !path.isAbsolute(rel) +} + +// Resolves a Bazel label or workspace-relative path to a filesystem path. +// Returns undefined for labels that cannot be resolved locally. +export function resolveRequirementsLockPath( + label: string | undefined, + cwd: string, +): string | undefined { + if (!label) { + return undefined + } + // Reject labels with path-traversal segments. + if (label.includes('..')) { + return undefined + } + // Reject external repository labels. 
+ if (label.startsWith('@')) { + return undefined + } + // Bazel local label forms: + // //:requirements_lock.txt + // //subdir:requirements_lock.txt + // :requirements_lock.txt + let filePart: string + if (label.startsWith('//')) { + const colon = label.indexOf(':') + if (colon < 0) { + return undefined + } + const pkgPath = label.slice(2, colon) + const filePart = label.slice(colon + 1) + if (!filePart) { + return undefined + } + const resolved = path.join(cwd, pkgPath, filePart) + if (!isWithinWorkspace(resolved, cwd)) { + return undefined + } + return resolved + } + if (label.startsWith(':')) { + filePart = label.slice(1) + if (!filePart) { + return undefined + } + const resolved = path.join(cwd, filePart) + if (!isWithinWorkspace(resolved, cwd)) { + return undefined + } + return resolved + } + // Reject absolute paths (only for non-label inputs). + if (path.isAbsolute(label)) { + return undefined + } + // Bare workspace-relative path (no leading // or :). + const resolved = path.join(cwd, label) + if (!isWithinWorkspace(resolved, cwd)) { + return undefined + } + return resolved +} + +// Parses a single pinned `name==version` lockfile line. +// Group 1 = package name, Group 2 = version string (without the `==`). +const REQUIREMENT_LINE_RE = /^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9._+!]+)/ + +const BAZEL_STRING_LABEL_RE = /[@A-Za-z0-9_~/.:+-]+/ + +const ALIAS_ACTUAL_RE = new RegExp( + `actual\\s*=\\s*(["'])(${BAZEL_STRING_LABEL_RE.source})\\1`, +) + +// Skippable line prefixes. +function shouldSkipLine(line: string): boolean { + const trimmed = line.trim() + if (!trimmed) { + return true + } + if (trimmed.startsWith('#')) { + return true + } + // Hash continuations start with `--hash=`. + if (trimmed.startsWith('--hash=')) { + return true + } + // Index options, constraint options, editable installs, includes, direct URLs. 
+ if ( + trimmed.startsWith('--') || + trimmed.startsWith('-e ') || + trimmed.startsWith('-r ') || + trimmed.startsWith('https://') || + trimmed.startsWith('http://') + ) { + return true + } + return false +} + +// Parse a `requirements_lock.txt`-style file into a map keyed by normalized +// PyPI name. This intentionally ignores unpinned PEP 508 requirement forms +// because the Bazel extractor must emit exact package versions. +export function parseRequirementsLock( + text: string, +): Map<string, ExtractedPypiPackage> { + const out = new Map<string, ExtractedPypiPackage>() + const lines = text.split('\n') + for (let i = 0; i < lines.length; i++) { + const rawLine = lines[i] + if (rawLine === undefined) { + continue + } + if (shouldSkipLine(rawLine)) { + continue + } + // Handle trailing backslash continuation by concatenating subsequent lines. + let line = rawLine.trimEnd() + while (line.endsWith('\\') && i + 1 < lines.length) { + i++ + const next = lines[i] + if (next !== undefined) { + line = line.slice(0, -1).trimEnd() + ' ' + next.trimStart() + } + } + const m = REQUIREMENT_LINE_RE.exec(line) + if (!m) { + continue + } + const [, rawName, version] = m + if (!rawName || !version) { + continue + } + const bazelName = rawName.replace(/-/g, '_') + const normalized = normalizePypiName(rawName) + const existing = out.get(normalized) + if (existing) { + if (existing.version !== version) { + throw new Error( + `Conflicting versions for normalized PyPI package ${normalized}: ` + + `${existing.originalLine ?? existing.name + '==' + existing.version} ` + + `conflicts with ${line}.`, + ) + } + continue + } + out.set(normalized, { + name: rawName, + version, + bazelName, + source: 'lockfile', + originalLine: line, + }) + } + return out +} + +// Read and parse a requirements lockfile from a resolved path, capping file +// size. Returns undefined when the file is missing, oversized, or unreadable. 
+export function readRequirementsLockFile( + resolvedPath: string | undefined, +): Map<string, ExtractedPypiPackage> | undefined { + if (!resolvedPath) { + return undefined + } + if (!existsSync(resolvedPath)) { + return undefined + } + let text: string + try { + const stat = statSync(resolvedPath) + if (stat.size > MAX_REQUIREMENTS_FILE_BYTES) { + return undefined + } + text = readFileSync(resolvedPath, 'utf8') + } catch { + return undefined + } + return parseRequirementsLock(text) +} + +// Extract `pypi_name=` and `pypi_version=` tags from `--output=build` text of a +// spoke target. Returns null when either tag is missing. +const PYPI_NAME_TAG_RE = /pypi_name=\s*([A-Za-z0-9][A-Za-z0-9._-]*)/ +const PYPI_VERSION_TAG_RE = /pypi_version=\s*([A-Za-z0-9._+!]+)/ + +export function parsePypiTagsFromBuildOutput( + text: string, +): ExtractedPypiPackage | null { + const nameM = PYPI_NAME_TAG_RE.exec(text) + const versionM = PYPI_VERSION_TAG_RE.exec(text) + if (!nameM || !versionM) { + return null + } + const rawName = nameM[1] + const version = versionM[1] + if (!rawName || !version) { + return null + } + return { + name: rawName, + version, + bazelName: rawName.replace(/-/g, '_'), + source: 'spoke-tag', + } +} + +export function parseAliasActualFromBuildOutput( + text: string, +): string | undefined { + const match = ALIAS_ACTUAL_RE.exec(text) + return match?.[2] +} + +// Extract hub package labels from `bazel query` output that match +// `@<hubName>//<package>:pkg` patterns (both line-start and embedded in +// `--output=build` deps arrays). +export function filterReachedPypiPackages( + queryOutput: string, + hubName: string, +): ReachedPypiLabel[] { + const out: ReachedPypiLabel[] = [] + const prefix = `@${hubName}//` + // Match from the start of a label token (preceded by whitespace, quote, or + // start of line) to improve robustness across output formats. 
+ const labelRe = new RegExp( + `(?:^|[\\s"])${prefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}([^\\s:"]+):pkg`, + 'g', + ) + let m: RegExpExecArray | null + while ((m = labelRe.exec(queryOutput)) !== null) { + const pkgPart = m[1] + if (!pkgPart) { + continue + } + const bazelName = pkgPart + const normalized = normalizePypiName(bazelNameToPypiName(bazelName)) + const apparentLabel = `${prefix}${bazelName}:pkg` + out.push({ + hubName, + originalLabel: apparentLabel, + bazelName, + normalizedName: normalized, + apparentLabel, + }) + } + return out +} + +// Collect name==version pairs for the reached closure, resolving versions +// from the lockfile fast path or spoke-tag fallback. Enforces version +// conflict detection and deterministic output. +export function collectPypiPackages( + reached: ReachedPypiLabel[], + lockfile: Map<string, ExtractedPypiPackage> | undefined, + spokeTagLookup: Map<string, ExtractedPypiPackage> | undefined, +): Array<{ name: string; version: string; source: string; label: string }> { + const collected = new Map< + string, + { name: string; version: string; source: string; label: string } + >() + for (const r of reached) { + const normalized = r.normalizedName + // Lockfile fast path. + const lockEntry = lockfile?.get(normalized) + if (lockEntry) { + const existing = collected.get(normalized) + if (existing && existing.version !== lockEntry.version) { + throw new Error( + `Conflicting versions for ${normalized}: ${existing.label} has ${existing.version}, ${r.originalLabel} has ${lockEntry.version} (lockfile).`, + ) + } + if (!existing) { + collected.set(normalized, { + name: lockEntry.name, + version: lockEntry.version, + source: 'lockfile', + label: r.originalLabel, + }) + } + continue + } + // Spoke-tag fallback. 
+ const spokeEntry = spokeTagLookup?.get(normalized) + if (spokeEntry) { + const existing = collected.get(normalized) + if (existing && existing.version !== spokeEntry.version) { + throw new Error( + `Conflicting versions for ${normalized}: ${existing.label} has ${existing.version}, ${r.originalLabel} has ${spokeEntry.version} (spoke tag).`, + ) + } + if (!existing) { + collected.set(normalized, { + name: spokeEntry.name, + version: spokeEntry.version, + source: 'spoke-tag', + label: r.originalLabel, + }) + } + continue + } + // Unresolvable package — fail rather than emit an unpinned entry. + throw new Error( + `No version found for ${r.originalLabel}. ` + + 'Check that the package is present in the requirements_lock.txt ' + + 'or reachable via a spoke target with pypi_name and pypi_version tags.', + ) + } + return Array.from(collected.values()) +} diff --git a/src/commands/manifest/bazel/bazel-pypi-parser.test.mts b/src/commands/manifest/bazel/bazel-pypi-parser.test.mts new file mode 100644 index 000000000..b0a70b7e6 --- /dev/null +++ b/src/commands/manifest/bazel/bazel-pypi-parser.test.mts @@ -0,0 +1,397 @@ +import { describe, expect, it } from 'vitest' + +import { + bazelNameToPypiName, + collectPypiPackages, + filterReachedPypiPackages, + normalizePypiName, + parseAliasActualFromBuildOutput, + parsePypiTagsFromBuildOutput, + parseRequirementsLock, + resolveRequirementsLockPath, +} from './bazel-pypi-parser.mts' + +describe('parseRequirementsLock', () => { + it('parses canonical name==version lines', () => { + const text = 'requests==2.33.1\nnumpy==2.4.4\n' + const result = parseRequirementsLock(text) + expect(result.size).toBe(2) + expect(result.get('requests')).toEqual({ + name: 'requests', + version: '2.33.1', + bazelName: 'requests', + source: 'lockfile', + originalLine: 'requests==2.33.1', + }) + }) + + it('skips comments, empty lines, hash continuations, options', () => { + const text = ` +# comment +requests==2.33.1 +--hash=sha256:abcd +--index-url 
https://pypi.org/simple +-e git+https://github.com/foo/bar +-r other.txt +https://example.com/pkg.tar.gz + `.trim() + const result = parseRequirementsLock(text) + expect(result.size).toBe(1) + expect(result.has('requests')).toBe(true) + }) + + it('normalizes underscores, dots, and hyphens for membership keys', () => { + const text = + 'charset_normalizer==3.4.7\ntyping-extensions==4.15.0\nSome.Package==1.0.0\n' + const result = parseRequirementsLock(text) + expect(result.get('charset-normalizer')).toBeDefined() + expect(result.get('typing-extensions')).toBeDefined() + expect(result.get('some-package')).toBeDefined() + }) + + it('handles trailing backslash continuation', () => { + const text = 'requests==2.33.1 \\\n --hash=sha256:abc\nnumpy==2.4.4\n' + const result = parseRequirementsLock(text) + expect(result.size).toBe(2) + expect(result.has('requests')).toBe(true) + expect(result.has('numpy')).toBe(true) + }) + + it('returns empty map for empty input', () => { + expect(parseRequirementsLock('').size).toBe(0) + }) + + it('ignores mixed valid and invalid lines', () => { + const text = 'a==1.0.0\nfoo>=1.0\nbar==2.0.0\n' + const result = parseRequirementsLock(text) + expect(result.size).toBe(2) + expect(result.has('a')).toBe(true) + expect(result.has('bar')).toBe(true) + expect(result.has('foo')).toBe(false) + }) + + it('preserves safe originalLine spelling', () => { + const text = 'Foo-Bar==1.0.0\n' + const result = parseRequirementsLock(text) + expect(result.get('foo-bar')).toEqual( + expect.objectContaining({ + name: 'Foo-Bar', + bazelName: 'Foo_Bar', + }), + ) + }) + + it('rejects conflicting duplicate normalized names with original lines', () => { + const text = 'foo-bar==1.0.0\nFoo_Bar==2.0.0\n' + expect(() => parseRequirementsLock(text)).toThrow( + /foo-bar==1\.0\.0 conflicts with Foo_Bar==2\.0\.0/, + ) + }) + + it('keeps the first duplicate normalized name when the version matches', () => { + const result = 
parseRequirementsLock('foo-bar==1.0.0\nFoo_Bar==1.0.0\n') + expect(result.size).toBe(1) + expect(result.get('foo-bar')?.originalLine).toBe('foo-bar==1.0.0') + }) +}) + +describe('parseAliasActualFromBuildOutput', () => { + it('extracts double-quoted alias actual labels', () => { + expect( + parseAliasActualFromBuildOutput( + 'alias(name = "pkg", actual = "@pypi_requests//:pkg")', + ), + ).toBe('@pypi_requests//:pkg') + }) + + it('extracts single-quoted alias actual labels', () => { + expect( + parseAliasActualFromBuildOutput( + "alias(name = 'pkg', actual = '@pypi_requests//:pkg')", + ), + ).toBe('@pypi_requests//:pkg') + }) + + it('extracts canonical Bzlmod alias actual labels', () => { + expect( + parseAliasActualFromBuildOutput( + 'alias(name = "pkg", actual = "@@rules_python~~pip~pypi_312_requests//:pkg")', + ), + ).toBe('@@rules_python~~pip~pypi_312_requests//:pkg') + }) + + it('returns undefined when no alias actual is present', () => { + expect( + parseAliasActualFromBuildOutput('py_library(name = "pkg")'), + ).toBeUndefined() + }) +}) + +describe('parsePypiTagsFromBuildOutput', () => { + it('extracts pypi_name and pypi_version from tags', () => { + const text = 'tags = ["pypi_name=requests", "pypi_version=2.33.1"]' + const result = parsePypiTagsFromBuildOutput(text) + expect(result).toEqual({ + name: 'requests', + version: '2.33.1', + bazelName: 'requests', + source: 'spoke-tag', + }) + }) + + it('returns null when pypi_name is missing', () => { + const text = 'tags = ["pypi_version=2.33.1"]' + expect(parsePypiTagsFromBuildOutput(text)).toBeNull() + }) + + it('returns null when pypi_version is missing', () => { + const text = 'tags = ["pypi_name=requests"]' + expect(parsePypiTagsFromBuildOutput(text)).toBeNull() + }) + + it('handles extra whitespace around tags', () => { + const text = + 'tags = [ "pypi_name= charset-normalizer" , "pypi_version= 3.4.7" ]' + const result = parsePypiTagsFromBuildOutput(text) + expect(result).not.toBeNull() + 
expect(result?.name).toBe('charset-normalizer') + }) + + it('extracts one-character package names from tags', () => { + const text = 'tags = ["pypi_name=x", "pypi_version=1.0.0"]' + const result = parsePypiTagsFromBuildOutput(text) + expect(result).toEqual({ + name: 'x', + version: '1.0.0', + bazelName: 'x', + source: 'spoke-tag', + }) + }) +}) + +describe('filterReachedPypiPackages', () => { + it('extracts @pypi//name:pkg labels', () => { + const text = '@pypi//requests:pkg\n@pypi//numpy:pkg\n//local:target\n' + const result = filterReachedPypiPackages(text, 'pypi') + expect(result.length).toBe(2) + expect(result[0]).toEqual({ + hubName: 'pypi', + originalLabel: '@pypi//requests:pkg', + bazelName: 'requests', + normalizedName: 'requests', + apparentLabel: '@pypi//requests:pkg', + }) + }) + + it('ignores non-hub labels', () => { + const text = '//some:local\n@other//thing:pkg\n' + expect(filterReachedPypiPackages(text, 'pypi')).toEqual([]) + }) + + it('handles multiple hubs', () => { + const text = '@pypi//a:pkg\n@my_pip//b:pkg\n' + expect(filterReachedPypiPackages(text, 'pypi').length).toBe(1) + expect(filterReachedPypiPackages(text, 'my_pip').length).toBe(1) + }) + + it('returns empty on empty query output', () => { + expect(filterReachedPypiPackages('', 'pypi')).toEqual([]) + }) + + it('keeps duplicate normalized names for conflict detection', () => { + const text = '@pypi//Foo_Bar:pkg\n@pypi//foo-bar:pkg\n' + const result = filterReachedPypiPackages(text, 'pypi') + expect(result.length).toBe(2) + }) +}) + +describe('bazelNameToPypiName', () => { + it('converts underscores to hyphens', () => { + expect(bazelNameToPypiName('charset_normalizer')).toBe('charset-normalizer') + expect(bazelNameToPypiName('typing_extensions')).toBe('typing-extensions') + }) + + it('leaves already-hyphenated names unchanged', () => { + expect(bazelNameToPypiName('some-package')).toBe('some-package') + }) + + it('leaves names without underscores unchanged', () => { + 
expect(bazelNameToPypiName('requests')).toBe('requests') + }) +}) + +describe('normalizePypiName', () => { + it('lowercases and collapses dots, underscores, hyphens', () => { + expect(normalizePypiName('Foo.Bar_Baz-Qux')).toBe('foo-bar-baz-qux') + }) + + it('handles PEP 503 case-insensitive comparison', () => { + expect(normalizePypiName('Requests')).toBe('requests') + expect(normalizePypiName('NumPy')).toBe('numpy') + }) +}) + +describe('resolveRequirementsLockPath', () => { + const cwd = '/workspace' + + it('resolves //:requirements_lock.txt to cwd/requirements_lock.txt', () => { + expect(resolveRequirementsLockPath('//:requirements_lock.txt', cwd)).toBe( + '/workspace/requirements_lock.txt', + ) + }) + + it('resolves :requirements_lock.txt to cwd/requirements_lock.txt', () => { + expect(resolveRequirementsLockPath(':requirements_lock.txt', cwd)).toBe( + '/workspace/requirements_lock.txt', + ) + }) + + it('resolves //subdir:requirements_lock.txt to cwd/subdir/requirements_lock.txt', () => { + expect( + resolveRequirementsLockPath('//subdir:requirements_lock.txt', cwd), + ).toBe('/workspace/subdir/requirements_lock.txt') + }) + + it('resolves workspace-relative paths', () => { + expect(resolveRequirementsLockPath('reqs.txt', cwd)).toBe( + '/workspace/reqs.txt', + ) + }) + + it('rejects paths containing ..', () => { + expect( + resolveRequirementsLockPath('//foo/../etc:pass', cwd), + ).toBeUndefined() + }) + + it('rejects absolute paths', () => { + expect(resolveRequirementsLockPath('/etc/passwd', cwd)).toBeUndefined() + }) + + it('rejects external repo labels', () => { + expect(resolveRequirementsLockPath('@repo//path:file', cwd)).toBeUndefined() + }) + + it('returns undefined for undefined label', () => { + expect(resolveRequirementsLockPath(undefined, cwd)).toBeUndefined() + }) +}) + +describe('collectPypiPackages', () => { + it('collects lockfile versions when available', () => { + const lockfile = new Map([ + [ + 'requests', + { + name: 'requests', + version: 
'2.33.1', + bazelName: 'requests', + source: 'lockfile', + }, + ], + ]) + const reached = [ + { + hubName: 'pypi', + originalLabel: '@pypi//requests:pkg', + bazelName: 'requests', + normalizedName: 'requests', + apparentLabel: '@pypi//requests:pkg', + }, + ] + const result = collectPypiPackages(reached, lockfile, undefined) + expect(result).toEqual([ + { + name: 'requests', + version: '2.33.1', + source: 'lockfile', + label: '@pypi//requests:pkg', + }, + ]) + }) + + it('falls back to spoke tags when lockfile missing', () => { + const spoke = new Map([ + [ + 'numpy', + { + name: 'numpy', + version: '2.4.4', + bazelName: 'numpy', + source: 'spoke-tag', + }, + ], + ]) + const reached = [ + { + hubName: 'pypi', + originalLabel: '@pypi//numpy:pkg', + bazelName: 'numpy', + normalizedName: 'numpy', + apparentLabel: '@pypi//numpy:pkg', + }, + ] + const result = collectPypiPackages(reached, undefined, spoke) + expect(result).toEqual([ + { + name: 'numpy', + version: '2.4.4', + source: 'spoke-tag', + label: '@pypi//numpy:pkg', + }, + ]) + }) + + it('dedups duplicate normalized names with the same version', () => { + const lockfile = new Map([ + [ + 'foo', + { + name: 'foo', + version: '1.0.0', + bazelName: 'foo', + source: 'lockfile', + }, + ], + ]) + const reached = [ + { + hubName: 'pypi', + originalLabel: '@pypi//foo:pkg', + bazelName: 'foo', + normalizedName: 'foo', + apparentLabel: '@pypi//foo:pkg', + }, + { + hubName: 'other', + originalLabel: '@other//Foo:pkg', + bazelName: 'Foo', + normalizedName: 'foo', + apparentLabel: '@other//Foo:pkg', + }, + ] + const result = collectPypiPackages(reached, lockfile, undefined) + expect(result.length).toBe(1) + expect(result[0]).toEqual({ + name: 'foo', + version: '1.0.0', + source: 'lockfile', + label: '@pypi//foo:pkg', + }) + }) + + it('throws when no version source is available', () => { + const reached = [ + { + hubName: 'pypi', + originalLabel: '@pypi//missing:pkg', + bazelName: 'missing', + normalizedName: 'missing', + 
apparentLabel: '@pypi//missing:pkg', + }, + ] + expect(() => collectPypiPackages(reached, undefined, undefined)).toThrow( + /No version found/, + ) + }) +}) diff --git a/src/commands/manifest/bazel/bazel-query-runner.mts b/src/commands/manifest/bazel/bazel-query-runner.mts index 64f35f884..34300d487 100644 --- a/src/commands/manifest/bazel/bazel-query-runner.mts +++ b/src/commands/manifest/bazel/bazel-query-runner.mts @@ -25,6 +25,8 @@ export type BazelQueryResult = { // Default per-invocation timeout for bazel queries. Bazel cold-cache starts // can take several minutes; 10 minutes is generous while still bounding CI hangs. const BAZEL_QUERY_TIMEOUT_MS = 600_000 +const STDERR_TAIL_BYTES = 4_096 +const STDOUT_EXCERPT_BYTES = 1_024 // Splits the user-supplied --bazel-flags string on whitespace. // Empty / undefined returns []. No shell parsing — quoted args with embedded @@ -49,16 +51,39 @@ function buildBazelModShowVisibleReposArgv(opts: BazelQueryOptions): string[] { return [ ...startup, 'mod', - 'show_repo', - '--all_visible_repos', - '--output=streamed_jsonproto', + 'dump_repo_mapping', + '', + '--output=json', ...userFlags, ] } -function buildBazelArgv(queryStr: string, opts: BazelQueryOptions): string[] { +function buildBazelModShowPipExtensionArgv(opts: BazelQueryOptions): string[] { + const startup: string[] = [] + if (opts.bazelRc) { + startup.push(`--bazelrc=${opts.bazelRc}`) + } + if (opts.bazelOutputBase) { + startup.push(`--output_base=${opts.bazelOutputBase}`) + } + const userFlags = splitBazelFlags(opts.bazelFlags) + return [ + ...startup, + 'mod', + 'show_extension', + '@rules_python//python/extensions:pip.bzl%pip', + '--extension_usages=', + ...userFlags, + ] +} + +function buildBazelArgv( + queryStr: string, + opts: BazelQueryOptions, + output = 'build', +): string[] { // Startup flags MUST precede the `query` subcommand. 
- // Bazel argv shape: query --output=build + // Bazel argv shape: query --output= const startup: string[] = [] if (opts.bazelRc) { startup.push(`--bazelrc=${opts.bazelRc}`) @@ -75,7 +100,7 @@ function buildBazelArgv(queryStr: string, opts: BazelQueryOptions): string[] { ...queryFlags, ...opts.invocationFlags, queryStr, - '--output=build', + `--output=${output}`, ...userFlags, ] } @@ -88,6 +113,58 @@ function numericExitCode(value: unknown): number | undefined { return typeof value === 'number' && Number.isFinite(value) ? value : undefined } +function byteLength(value: string): number { + return Buffer.byteLength(value, 'utf8') +} + +function excerpt(value: string, maxBytes: number): string { + if (byteLength(value) <= maxBytes) { + return value + } + return Buffer.from(value, 'utf8').subarray(0, maxBytes).toString('utf8') + '\n[truncated]' +} + +function logBazelTrace({ + argv, + durationMs, + opts, + result, + step, +}: { + argv: string[] + durationMs: number + opts: BazelQueryOptions + result: BazelQueryResult + step: string +}): void { + if (!opts.verbose) { + return + } + const stderrBytes = byteLength(result.stderr) + const stdoutBytes = byteLength(result.stdout) + const category = result.code === 0 ? 
'ok' : 'bazel-query-failed' + logger.log('[VERBOSE] bazel subprocess trace:', `category=${category}`, { + argv, + category, + code: result.code, + cwd: opts.cwd, + durationMs, + stderrBytes, + stdoutBytes, + step, + timedOut: false, + timeoutMs: BAZEL_QUERY_TIMEOUT_MS, + }) + if (result.code !== 0 && result.stderr) { + logger.log( + '[VERBOSE] bazel stderr tail:', + excerpt(result.stderr.slice(-STDERR_TAIL_BYTES), STDERR_TAIL_BYTES), + ) + } else if (result.stdout && stdoutBytes <= STDOUT_EXCERPT_BYTES) { + logger.log('[VERBOSE] bazel stdout excerpt:', result.stdout) + } +} + function normalizeSpawnError(error: unknown): BazelQueryResult { const e = error as { code?: unknown @@ -111,11 +188,13 @@ function normalizeSpawnError(error: unknown): BazelQueryResult { export async function runBazelQuery( queryStr: string, opts: BazelQueryOptions, + output?: string, ): Promise { - const argv = buildBazelArgv(queryStr, opts) + const argv = buildBazelArgv(queryStr, opts, output) if (opts.verbose) { logger.log('[VERBOSE] Executing:', opts.bin, ', args:', argv) } + const startedAt = Date.now() const { spinner } = constants let result: BazelQueryResult | undefined try { @@ -138,13 +217,22 @@ export async function runBazelQuery( } else { spinner.failAndStop(`bazel query failed (${truncated}).`) } + if (result) { + logBazelTrace({ + argv, + durationMs: Date.now() - startedAt, + opts, + result, + step: `bazel query ${truncated}`, + }) + } } } /** * Bzlmod-native visible repository enumeration. This is only a candidate * source; callers must still validate each returned apparent repo name with a - * semantic query for generated JVM Maven rules. + * semantic query for generated ecosystem rules. 
*/ export async function runBazelModShowVisibleRepos( opts: BazelQueryOptions, @@ -153,6 +241,43 @@ export async function runBazelModShowVisibleRepos( if (opts.verbose) { logger.log('[VERBOSE] Executing:', opts.bin, ', args:', argv) } + const startedAt = Date.now() + let result: BazelQueryResult + try { + const output = await spawn(opts.bin, argv, { + cwd: opts.cwd, + timeout: BAZEL_QUERY_TIMEOUT_MS, + ...(opts.env ? { env: opts.env } : {}), + }) + const { code, stderr, stdout } = output + result = { code, stdout, stderr } + } catch (e) { + result = normalizeSpawnError(e) + } + logBazelTrace({ + argv, + durationMs: Date.now() - startedAt, + opts, + result, + step: 'bazel mod dump_repo_mapping', + }) + return result +} + +/** + * Bzlmod-native rules_python pip extension usage inspection. This is the + * authoritative source for root-module pip.parse metadata when Bazel supports + * the command; callers keep bounded static parsing as fallback. + */ +export async function runBazelModShowPipExtension( + opts: BazelQueryOptions, +): Promise { + const argv = buildBazelModShowPipExtensionArgv(opts) + if (opts.verbose) { + logger.log('[VERBOSE] Executing:', opts.bin, ', args:', argv) + } + const startedAt = Date.now() + let result: BazelQueryResult try { const output = await spawn(opts.bin, argv, { cwd: opts.cwd, @@ -160,10 +285,18 @@ export async function runBazelModShowVisibleRepos( ...(opts.env ? 
{ env: opts.env } : {}), }) const { code, stderr, stdout } = output - return { code, stdout, stderr } + result = { code, stdout, stderr } } catch (e) { - return normalizeSpawnError(e) + result = normalizeSpawnError(e) } + logBazelTrace({ + argv, + durationMs: Date.now() - startedAt, + opts, + result, + step: 'bazel mod show_extension rules_python pip', + }) + return result } /** @@ -178,3 +311,18 @@ export function buildProbeFor(opts: BazelQueryOptions): RepoProbe { return { stdout: result.stdout, code: result.code } } } + +/** + * Build a `RepoProbe` for validating pip hub candidates. + * Queries the hub for package targets (e.g. `@//...`) and returns + * stdout so the caller can check for `:pkg` labels or alias rules. + * Does NOT require `pypi_name=` tags in the hub output, because those + * tags live on spoke repos, not the hub alias layer. + */ +export function buildPypiProbeFor(opts: BazelQueryOptions): RepoProbe { + return async (hubName: string) => { + const queryStr = `@${hubName}//...` + const result = await runBazelQuery(queryStr, opts) + return { stdout: result.stdout, code: result.code } + } +} diff --git a/src/commands/manifest/bazel/bazel-query-runner.test.mts b/src/commands/manifest/bazel/bazel-query-runner.test.mts index fcb0d3680..15cd2411f 100644 --- a/src/commands/manifest/bazel/bazel-query-runner.test.mts +++ b/src/commands/manifest/bazel/bazel-query-runner.test.mts @@ -15,9 +15,16 @@ vi.mock('../../../constants.mts', () => ({ }, })) +import { logger } from '@socketsecurity/registry/lib/logger' import { spawn } from '@socketsecurity/registry/lib/spawn' -import { buildProbeFor, runBazelQuery } from './bazel-query-runner.mts' +import { + buildProbeFor, + buildPypiProbeFor, + runBazelModShowPipExtension, + runBazelModShowVisibleRepos, + runBazelQuery, +} from './bazel-query-runner.mts' import constants from '../../../constants.mts' describe('runBazelQuery', () => { @@ -186,6 +193,77 @@ describe('runBazelQuery', () => { }) expect(r).toEqual({ code: 
-1, stdout: '', stderr: 'missing bazel' }) }) + + it('emits bounded subprocess trace when verbose is true', async () => { + const logSpy = vi.spyOn(logger, 'log').mockImplementation(() => logger) + try { + // @ts-ignore — narrow return shape for the test's purposes. + mocked.mockResolvedValueOnce({ code: 7, stdout: 'OUT', stderr: 'ERR' }) + await runBazelQuery('q', { + bin: 'bazel', + cwd: '/r', + invocationFlags: [], + verbose: true, + }) + const text = logSpy.mock.calls + .map(args => args.map(a => String(a)).join(' ')) + .join('\n') + expect(text).toContain('bazel subprocess trace') + expect(text).toContain('bazel stderr tail') + expect(text).toContain('bazel-query-failed') + } finally { + logSpy.mockRestore() + } + }) +}) + +describe('runBazelModShowVisibleRepos', () => { + const mocked = vi.mocked(spawn) + + beforeEach(() => { + mocked.mockReset() + // @ts-ignore — narrow return shape for the test's purposes. + mocked.mockResolvedValue({ code: 0, stdout: '{}', stderr: '' }) + }) + + it('uses the Bazel 7-compatible root repo mapping command', async () => { + await runBazelModShowVisibleRepos({ + bin: 'bazel', + cwd: '/repo', + invocationFlags: [], + }) + + const argv = mocked.mock.calls[0]![1] as string[] + expect(argv).toEqual(['mod', 'dump_repo_mapping', '', '--output=json']) + expect(argv).not.toContain('--all_visible_repos') + expect(argv).not.toContain('--output=streamed_jsonproto') + }) +}) + +describe('runBazelModShowPipExtension', () => { + const mocked = vi.mocked(spawn) + + beforeEach(() => { + mocked.mockReset() + // @ts-ignore — narrow return shape for the test's purposes. 
+ mocked.mockResolvedValue({ code: 0, stdout: 'pip.parse()', stderr: '' }) + }) + + it('uses the rules_python pip extension usage command', async () => { + await runBazelModShowPipExtension({ + bin: 'bazel', + cwd: '/repo', + invocationFlags: [], + }) + + const argv = mocked.mock.calls[0]![1] as string[] + expect(argv).toEqual([ + 'mod', + 'show_extension', + '@rules_python//python/extensions:pip.bzl%pip', + '--extension_usages=', + ]) + }) }) describe('buildProbeFor', () => { @@ -218,3 +296,50 @@ describe('buildProbeFor', () => { }) }) }) + +describe('buildPypiProbeFor', () => { + const mocked = vi.mocked(spawn) + + beforeEach(() => { + mocked.mockReset() + // @ts-ignore — narrow return shape for the test's purposes. + mocked.mockResolvedValue({ + code: 0, + stdout: '@pypi//requests:pkg\n@pypi//flask:pkg\n', + stderr: '', + }) + }) + + it('builds a hub-wide query for a pip hub name', async () => { + const probe = buildPypiProbeFor({ + bin: 'bazel', + cwd: '/r', + invocationFlags: [], + }) + const result = await probe('pypi') + const argv = mocked.mock.calls[0]![1] as string[] + expect(argv).toContain('@pypi//...') + expect(result).toEqual({ + stdout: expect.stringContaining('@pypi//requests:pkg'), + code: 0, + }) + }) + + it('returns code 0 with empty stdout when the hub has no :pkg targets', async () => { + mocked.mockReset() + // @ts-ignore — narrow return shape for the test's purposes.
+ mocked.mockResolvedValue({ + code: 0, + stdout: '', + stderr: '', + }) + const probe = buildPypiProbeFor({ + bin: 'bazel', + cwd: '/r', + invocationFlags: [], + }) + const result = await probe('empty_hub') + expect(result.code).toBe(0) + expect(result.stdout).toBe('') + }) +}) diff --git a/src/commands/manifest/bazel/bazel-repo-discovery.mts b/src/commands/manifest/bazel/bazel-repo-discovery.mts index 7374a432c..8d13542a3 100644 --- a/src/commands/manifest/bazel/bazel-repo-discovery.mts +++ b/src/commands/manifest/bazel/bazel-repo-discovery.mts @@ -109,14 +109,30 @@ function apparentNameFromJsonValue(value: unknown): string | undefined { return undefined } +function apparentNamesFromRepoMapping(value: unknown): string[] { + if (!value || typeof value !== 'object' || Array.isArray(value)) { + return [] + } + const candidates: string[] = [] + for (const [name, canonicalName] of Object.entries(value)) { + if (name.startsWith('@') || typeof canonicalName !== 'string') { + continue + } + if (BAZEL_REPO_NAME_RE.test(name)) { + candidates.push(name) + } + } + return candidates +} + function normalizeRepoName(name: string): string | undefined { const repo = name.startsWith('@') ? name.slice(1) : name return BAZEL_REPO_NAME_RE.test(repo) ? repo : undefined } -// Parse `bazel mod show_repo --all_visible_repos --output=streamed_jsonproto` -// output. Bazel's JSON proto field casing may vary by formatter; accept both -// lowerCamel and snake_case, and tolerate wrapper objects around Repository. +// Parse `bazel mod dump_repo_mapping "" --output=json` output. Also accept the +// older streamed jsonproto shape in case older Bazel versions or fixtures still +// return repository records with apparentName fields. 
export function parseVisibleRepoCandidates(output: string): string[] { const candidates: string[] = [] for (const line of output.split(/\r?\n/)) { @@ -126,6 +142,7 @@ export function parseVisibleRepoCandidates(output: string): string[] { } try { const parsed = JSON.parse(trimmed) as unknown + candidates.push(...apparentNamesFromRepoMapping(parsed)) const apparentName = apparentNameFromJsonValue(parsed) if (apparentName) { const repo = normalizeRepoName(apparentName) diff --git a/src/commands/manifest/bazel/bazel-repo-discovery.test.mts b/src/commands/manifest/bazel/bazel-repo-discovery.test.mts index 12d8a9a86..5755388df 100644 --- a/src/commands/manifest/bazel/bazel-repo-discovery.test.mts +++ b/src/commands/manifest/bazel/bazel-repo-discovery.test.mts @@ -120,6 +120,26 @@ describe('bazel-repo-discovery', () => { }) describe('parseVisibleRepoCandidates', () => { + it('parses apparent repo names from dump_repo_mapping JSON output', () => { + const output = JSON.stringify({ + '': '', + '@invalid': 'canonical-invalid', + bazel_tools: 'bazel_tools', + maven: 'rules_jvm_external~~maven~maven', + 'maven-prod': 'rules_jvm_external~~maven~prod', + pypi: 'rules_python~~pip~pypi', + 'third.party.maven': 'rules_jvm_external~~maven~third_party', + }) + + expect(parseVisibleRepoCandidates(output)).toEqual([ + 'bazel_tools', + 'maven', + 'maven-prod', + 'pypi', + 'third.party.maven', + ]) + }) + it('parses apparent repo names from streamed jsonproto output', () => { const output = [ JSON.stringify({ diff --git a/src/commands/manifest/bazel/cmd-manifest-bazel.mts b/src/commands/manifest/bazel/cmd-manifest-bazel.mts index 3f5f99135..f07372789 100644 --- a/src/commands/manifest/bazel/cmd-manifest-bazel.mts +++ b/src/commands/manifest/bazel/cmd-manifest-bazel.mts @@ -4,9 +4,11 @@ import { debugFn } from '@socketsecurity/registry/lib/debug' import { logger } from '@socketsecurity/registry/lib/logger' import { extractBazelToMaven } from './extract_bazel_to_maven.mts' +import { 
extractBazelToPypi } from './extract_bazel_to_pypi.mts' import constants, { SOCKET_JSON } from '../../../constants.mts' import { commonFlags } from '../../../flags.mts' import { checkCommandInput } from '../../../utils/check-input.mts' +import { InputError } from '../../../utils/errors.mts' import { getOutputKind } from '../../../utils/get-output-kind.mts' import { meowOrExit } from '../../../utils/meow-with-subcommands.mts' import { getFlagListOutput } from '../../../utils/output-formatting.mts' @@ -20,7 +22,7 @@ import type { const config: CliCommandConfig = { commandName: 'bazel', description: - '[beta] Bazel JVM SBOM support — generate manifest files (`maven_install.json`) for a Bazel/Maven project', + '[beta] Bazel SBOM support — generate manifest files for a Bazel project (Maven, PyPI)', hidden: false, flags: { ...commonFlags, @@ -36,13 +38,18 @@ const config: CliCommandConfig = { }, bazelOutputBase: { type: 'string', - description: - 'Bazel --output_base for read-only-cache CI environments', + description: 'Bazel --output_base for read-only-cache CI environments', }, bazelRc: { type: 'string', description: 'Path to additional .bazelrc fragments forwarded to bazel', }, + ecosystem: { + type: 'string', + isMultiple: true, + description: + 'Ecosystem(s) to extract; repeatable. Supported: maven, pypi. Default: maven.', + }, out: { type: 'string', description: @@ -50,7 +57,8 @@ const config: CliCommandConfig = { }, verbose: { type: 'boolean', - description: 'Stream bazel stdout/stderr', + description: + 'Emit bounded Bazel diagnostics with argv, duration, exit status, and output sizes', }, }, help: (command, config) => ` @@ -60,19 +68,26 @@ const config: CliCommandConfig = { Options ${getFlagListOutput(config.flags)} - [beta] Generates Bazel JVM SBOM manifests (\`maven_install.json\`-shaped) - by running \`bazel query\` against discovered Maven repos. Output is - consumed by \`socket scan create\`'s server-side parser. 
+ [beta] Generates Bazel SBOM manifests for Maven (\`maven_install.json\`) + by running \`bazel query\` against discovered dependency repos. + PyPI requirements generation is available with \`--ecosystem pypi\`. + Output is consumed by + \`socket scan create\`'s server-side parser. + + --ecosystem may be repeated to select which ecosystems to extract. + When omitted, Maven is generated by default. PyPI is explicit opt-in. - Note: this command generates Maven dependency manifests for Bazel JVM - workspaces. It does not run reachability analysis. + Note: this command generates dependency manifests for Bazel workspaces. + It does not run reachability analysis. To generate AND upload in one step, use \`socket scan create --auto-manifest\` - instead — it detects Bazel workspaces, runs the same extraction, and uploads - the result. This subcommand is for generation only. + instead — it detects Bazel workspaces, generates Maven manifests by + default, and uploads the result. This subcommand is for generation only. Examples $ ${command} . + $ ${command} --ecosystem pypi . + $ ${command} --ecosystem maven --ecosystem pypi . $ ${command} --bazel=/usr/local/bin/bazelisk . `, } @@ -83,6 +98,63 @@ export const cmdManifestBazel = { run, } +export type EcosystemOutcome = { + ecosystem: 'maven' | 'pypi' + ok: boolean + noEcosystemFound?: boolean | undefined + hardFailure?: boolean + manifestPath?: string | undefined +} + +// Pure outcome-matrix evaluator. Exported so dispatcher behavior can be +// unit-tested without spawning the CLI binary. Throws InputError on +// failures that must propagate to a non-zero CLI exit; returns void on +// success. +// +// - Hard failure: ok === false && !noEcosystemFound. The ecosystem was +// detected (or the runner crashed), but extraction failed. Always a +// non-zero exit, even when another ecosystem succeeded. +// - No-discovery: noEcosystemFound === true. Genuinely absent ecosystem. 
+// Auto-detect mode tolerates this when at least one other ecosystem +// succeeded; explicit mode treats it as an error. +export function evaluateEcosystemOutcomes( + outcomes: readonly EcosystemOutcome[], + isExplicit: boolean, +): void { + const hardFailures = outcomes.filter(o => !o.ok && !o.noEcosystemFound) + const noDiscoveries = outcomes.filter(o => o.noEcosystemFound) + const successes = outcomes.filter(o => o.ok && o.manifestPath) + + if (!isExplicit) { + if (hardFailures.length) { + throw new InputError( + `Bazel auto-manifest generation hit hard failure(s) in ecosystem(s): ${hardFailures.map(f => f.ecosystem).join(', ')}.`, + ) + } + if (successes.length) { + return + } + if (noDiscoveries.length === outcomes.length) { + throw new InputError( + 'No supported Bazel ecosystems detected (maven, pypi). Ensure rules_jvm_external, rules_python pip_parse/pip_install/pip_repository, or pip.parse is configured.', + ) + } + return + } + + // Explicit mode: every requested ecosystem must succeed. + if (noDiscoveries.length) { + throw new InputError( + `No Bazel rules found for explicitly requested ecosystem(s): ${noDiscoveries.map(f => f.ecosystem).join(', ')}.`, + ) + } + if (hardFailures.length) { + throw new InputError( + `Bazel manifest generation failed for explicitly requested ecosystem(s): ${hardFailures.map(f => f.ecosystem).join(', ')}.`, + ) + } +} + async function run( argv: string[] | readonly string[], importMeta: ImportMeta, @@ -115,6 +187,7 @@ async function run( sockJson?.defaults?.manifest?.bazel, ) + const { ecosystem } = cli.flags let { bazel, bazelFlags, bazelOutputBase, bazelRc, out, verbose } = cli.flags // Set defaults for any flag/arg that is not given. Check socket.json first. 
@@ -203,13 +276,62 @@ async function run( return } - await extractBazelToMaven({ - bazelFlags: bazelFlags as string | undefined, - bazelOutputBase: bazelOutputBase as string | undefined, - bazelRc: bazelRc as string | undefined, - bin: bazel as string | undefined, - cwd, - out: out as string, - verbose: Boolean(verbose), - }) + // Ecosystem dispatch: Maven is the default. PyPI is explicit opt-in because + // its no-lockfile recovery value is narrower than Maven's inline-decl path. + const wasExplicitEcosystemSelection = + Array.isArray(ecosystem) && ecosystem.length > 0 + const ecosystems: string[] = wasExplicitEcosystemSelection + ? (ecosystem as string[]) + : ['maven'] + + for (const eco of ecosystems) { + if (!['maven', 'pypi'].includes(eco)) { + throw new InputError( + `Unsupported --ecosystem value: ${eco}. Supported values: maven, pypi.`, + ) + } + } + + const outcomes: EcosystemOutcome[] = [] + + for (const eco of ecosystems) { + if (eco === 'maven') { + // eslint-disable-next-line no-await-in-loop + const mavenResult = await extractBazelToMaven({ + bazelFlags: bazelFlags as string | undefined, + bazelOutputBase: bazelOutputBase as string | undefined, + bazelRc: bazelRc as string | undefined, + bin: bazel as string | undefined, + cwd, + out: out as string, + verbose: Boolean(verbose), + }) + outcomes.push({ + ecosystem: 'maven', + ok: mavenResult.ok, + noEcosystemFound: mavenResult.noEcosystemFound, + manifestPath: mavenResult.manifestPath, + }) + } else if (eco === 'pypi') { + // eslint-disable-next-line no-await-in-loop + const pypiResult = await extractBazelToPypi({ + bazelFlags: bazelFlags as string | undefined, + bazelOutputBase: bazelOutputBase as string | undefined, + bazelRc: bazelRc as string | undefined, + bin: bazel as string | undefined, + cwd, + out: out as string, + verbose: Boolean(verbose), + explicitEcosystem: wasExplicitEcosystemSelection, + }) + outcomes.push({ + ecosystem: 'pypi', + ok: pypiResult.ok, + noEcosystemFound: 
pypiResult.noEcosystemFound, + manifestPath: pypiResult.manifestPath, + }) + } + } + + evaluateEcosystemOutcomes(outcomes, wasExplicitEcosystemSelection) } diff --git a/src/commands/manifest/bazel/cmd-manifest-bazel.test.mts b/src/commands/manifest/bazel/cmd-manifest-bazel.test.mts index 55f12a423..27cca0aab 100644 --- a/src/commands/manifest/bazel/cmd-manifest-bazel.test.mts +++ b/src/commands/manifest/bazel/cmd-manifest-bazel.test.mts @@ -1,11 +1,14 @@ -import { describe, expect } from 'vitest' +import { describe, expect, it } from 'vitest' +import { evaluateEcosystemOutcomes } from './cmd-manifest-bazel.mts' import constants, { FLAG_CONFIG, FLAG_DRY_RUN, } from '../../../../src/constants.mts' import { cmdit, spawnSocketCli } from '../../../../test/utils.mts' +import type { EcosystemOutcome } from './cmd-manifest-bazel.mts' + describe('socket manifest bazel', async () => { const { binCliPath } = constants @@ -17,4 +20,181 @@ describe('socket manifest bazel', async () => { expect(code, 'dry-run should exit with code 0').toBe(0) }, ) + + cmdit( + [ + 'manifest', + 'bazel', + '--ecosystem', + 'pypi', + FLAG_DRY_RUN, + FLAG_CONFIG, + '{}', + ], + 'should accept --ecosystem pypi with dry-run', + async cmd => { + const { code } = await spawnSocketCli(binCliPath, cmd) + expect( + code, + 'dry-run with --ecosystem pypi should exit with code 0', + ).toBe(0) + }, + ) + + cmdit( + [ + 'manifest', + 'bazel', + '--ecosystem', + 'maven', + '--ecosystem', + 'pypi', + FLAG_DRY_RUN, + FLAG_CONFIG, + '{}', + ], + 'should accept repeatable --ecosystem with dry-run', + async cmd => { + const { code } = await spawnSocketCli(binCliPath, cmd) + expect( + code, + 'dry-run with repeatable --ecosystem should exit with code 0', + ).toBe(0) + }, + ) +}) + +const auto = (outcomes: EcosystemOutcome[]) => + evaluateEcosystemOutcomes(outcomes, false) + +describe('evaluateEcosystemOutcomes (auto-detect mode)', () => { + it('returns void when at least one ecosystem succeeds and none hard-failed', 
() => { + expect(() => + auto([ + { + ecosystem: 'maven', + ok: true, + manifestPath: '/tmp/maven_install.json', + }, + { ecosystem: 'pypi', ok: false, noEcosystemFound: true }, + ]), + ).not.toThrow() + }) + + it('tolerates absent Maven when PyPI succeeds in auto mode', () => { + expect(() => + auto([ + { ecosystem: 'maven', ok: false, noEcosystemFound: true }, + { + ecosystem: 'pypi', + ok: true, + manifestPath: '/tmp/requirements.txt', + }, + ]), + ).not.toThrow() + }) + + it('throws when a hard failure occurs even if another ecosystem succeeded', () => { + expect(() => + auto([ + { + ecosystem: 'maven', + ok: true, + manifestPath: '/tmp/maven_install.json', + }, + { ecosystem: 'pypi', ok: false, noEcosystemFound: false }, + ]), + ).toThrowError(/hard failure\(s\) in ecosystem\(s\): pypi/) + }) + + it('throws when no ecosystem was detected at all', () => { + expect(() => + auto([ + { ecosystem: 'maven', ok: false, noEcosystemFound: true }, + { ecosystem: 'pypi', ok: false, noEcosystemFound: true }, + ]), + ).toThrowError(/No supported Bazel ecosystems detected/) + }) + + it('throws when every attempted ecosystem hard-failed', () => { + expect(() => + auto([ + { ecosystem: 'maven', ok: false, noEcosystemFound: false }, + { ecosystem: 'pypi', ok: false, noEcosystemFound: false }, + ]), + ).toThrowError(/hard failure\(s\) in ecosystem\(s\): maven, pypi/) + }) + + it('supports Maven-only default auto mode', () => { + expect(() => + auto([ + { + ecosystem: 'maven', + ok: true, + manifestPath: '/tmp/maven_install.json', + }, + ]), + ).not.toThrow() + }) +}) + +const explicit = (outcomes: EcosystemOutcome[]) => + evaluateEcosystemOutcomes(outcomes, true) + +describe('evaluateEcosystemOutcomes (explicit mode)', () => { + it('returns void when every requested ecosystem succeeded', () => { + expect(() => + explicit([ + { + ecosystem: 'maven', + ok: true, + manifestPath: '/tmp/maven_install.json', + }, + { + ecosystem: 'pypi', + ok: true, + manifestPath: 
'/tmp/requirements.txt', + }, + ]), + ).not.toThrow() + }) + + it('throws InputError when a requested ecosystem reports noEcosystemFound', () => { + expect(() => + explicit([{ ecosystem: 'pypi', ok: false, noEcosystemFound: true }]), + ).toThrowError( + /No Bazel rules found for explicitly requested ecosystem\(s\): pypi/, + ) + }) + + it('throws InputError when a requested ecosystem hard-failed (Maven only)', () => { + expect(() => + explicit([{ ecosystem: 'maven', ok: false, noEcosystemFound: false }]), + ).toThrowError( + /Bazel manifest generation failed for explicitly requested ecosystem\(s\): maven/, + ) + }) + + it('throws InputError when explicitly requested Maven is absent', () => { + expect(() => + explicit([{ ecosystem: 'maven', ok: false, noEcosystemFound: true }]), + ).toThrowError( + /No Bazel rules found for explicitly requested ecosystem\(s\): maven/, + ) + }) + + it('throws when Maven hard-fails even if pypi succeeded', () => { + expect(() => + explicit([ + { ecosystem: 'maven', ok: false, noEcosystemFound: false }, + { + ecosystem: 'pypi', + ok: true, + manifestPath: '/tmp/requirements.txt', + }, + ]), + ).toThrowError( + /Bazel manifest generation failed for explicitly requested ecosystem\(s\): maven/, + ) + }) }) diff --git a/src/commands/manifest/bazel/extract_bazel_to_maven.mts b/src/commands/manifest/bazel/extract_bazel_to_maven.mts index 3ba0bf53d..334b116db 100644 --- a/src/commands/manifest/bazel/extract_bazel_to_maven.mts +++ b/src/commands/manifest/bazel/extract_bazel_to_maven.mts @@ -51,6 +51,7 @@ export type ExtractBazelOptions = { export type ExtractBazelResult = { artifactCount: number manifestPath?: string | undefined + noEcosystemFound?: boolean | undefined ok: boolean } @@ -460,9 +461,27 @@ export async function extractBazelToMaven( } if (!allArtifacts.length) { - process.exitCode = 1 - logger.fail('No Maven artifacts extracted. 
See warnings above.') - return { artifactCount: 0, manifestPath, ok: false } + if (!repos.size) { + if (verbose) { + logger.info( + 'No Maven artifacts extracted. failureCategory=no-supported-ecosystem', + ) + } + return { + artifactCount: 0, + manifestPath, + noEcosystemFound: true, + ok: false, + } + } + logger.fail( + `Discovered Maven repo(s) ${repoNames.join(', ')} but extracted zero artifacts. failureCategory=ecosystem-detected-but-empty`, + ) + return { + artifactCount: 0, + manifestPath, + ok: false, + } } logger.success( `Wrote ${allArtifacts.length} artifact(s) to ${path.relative(cwd, manifestPath)}.`, @@ -473,7 +492,6 @@ export async function extractBazelToMaven( ok: true, } } catch (e) { - process.exitCode = 1 // Always surface the error message; users should not have to // re-run a multi-minute bazel build with --verbose just to see whether // the failure was a missing dependency, permission error, or network blip. diff --git a/src/commands/manifest/bazel/extract_bazel_to_maven.test.mts b/src/commands/manifest/bazel/extract_bazel_to_maven.test.mts index 1da63df0c..4d43c1da5 100644 --- a/src/commands/manifest/bazel/extract_bazel_to_maven.test.mts +++ b/src/commands/manifest/bazel/extract_bazel_to_maven.test.mts @@ -268,7 +268,7 @@ describe('extractBazelToMaven', () => { expect(result.ok).toBe(true) }) - it('sets process.exitCode = 1 and writes empty maven_install.json when no repos discovered', async () => { + it('reports noEcosystemFound without mutating process.exitCode when no repos discovered', async () => { vi.mocked(discoverMavenRepos).mockResolvedValue(new Map()) const result = await extractBazelToMaven({ @@ -281,10 +281,11 @@ describe('extractBazelToMaven', () => { verbose: false, }) - expect(process.exitCode).toBe(1) + expect(process.exitCode).toBe(0) expect(result).toEqual({ artifactCount: 0, manifestPath: path.join(tmp, 'maven_install.json'), + noEcosystemFound: true, ok: false, }) // Empty manifest is still written. 
@@ -296,6 +297,29 @@ describe('extractBazelToMaven', () => { expect(manifest.artifacts).toEqual({}) }) + it('reports hard failure when discovered repos extract zero artifacts', async () => { + vi.mocked(discoverMavenRepos).mockResolvedValue( + new Map([['maven', '# no parseable rules\n']]), + ) + + const result = await extractBazelToMaven({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(result).toEqual({ + artifactCount: 0, + manifestPath: path.join(tmp, 'maven_install.json'), + ok: false, + }) + expect(result.noEcosystemFound).toBeUndefined() + }) + it('iterates each discovered repo independently when one has no parseable rules', async () => { const sample = readFileSync( path.join(FIXTURES, 'jvm-import-sample.txt'), @@ -334,7 +358,7 @@ describe('extractBazelToMaven', () => { }) }) - it('sets process.exitCode = 1 when one group:artifact has conflicting versions', async () => { + it('returns failure without mutating process.exitCode when one group:artifact has conflicting versions', async () => { const conflictingStdout = [ 'jvm_import(', ' name = "com_example_demo_v1",', @@ -359,7 +383,7 @@ describe('extractBazelToMaven', () => { verbose: false, }) - expect(process.exitCode).toBe(1) + expect(process.exitCode).toBe(0) expect(result).toEqual({ artifactCount: 0, ok: false, diff --git a/src/commands/manifest/bazel/extract_bazel_to_pypi.constructed.test.mts b/src/commands/manifest/bazel/extract_bazel_to_pypi.constructed.test.mts new file mode 100644 index 000000000..7687ae121 --- /dev/null +++ b/src/commands/manifest/bazel/extract_bazel_to_pypi.constructed.test.mts @@ -0,0 +1,127 @@ +import { existsSync, mkdtempSync, readFileSync, rmSync } from 'node:fs' +import os from 'node:os' +import path from 'node:path' + +import { afterEach, beforeEach, describe, expect, it } from 'vitest' + +import { extractBazelToPypi } from './extract_bazel_to_pypi.mts' + +const 
FIXTURE_DIR = path.resolve( + import.meta.dirname, + '..', + '..', + '..', + '..', + '..', + 'bazel-bench', + 'constructed', + 'python-pypi', +) + +function isSandboxed(): boolean { + // Detect sandbox by probing a Bazel server socket bind or a write to + // /var/tmp/_bazel_$USER (both blocked in the agent sandbox). + try { + // A quick heuristic: if /var/tmp/_bazel_$USER is not writable and we're + // on macOS, the sandbox is likely active. + const { accessSync, constants } = require('node:fs') + accessSync( + `/var/tmp/_bazel_${process.env['USER'] ?? 'unknown'}`, + constants.W_OK, + ) + return false + } catch { + return true + } +} + +function normalizeFinalNewline(text: string): string { + return text.replace(/\r\n/g, '\n').replace(/\n?$/, '\n') +} + +describe.skipIf(isSandboxed())( + 'extract_bazel_to_pypi — constructed fixture', + () => { + let tmp: string + + beforeEach(() => { + tmp = mkdtempSync(path.join(os.tmpdir(), 'pypi-constructed-')) + }) + + afterEach(() => { + rmSync(tmp, { recursive: true, force: true }) + }) + + it('produces exact requirements.txt matching the committed oracle', async () => { + expect(existsSync(FIXTURE_DIR)).toBe(true) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: FIXTURE_DIR, + out: tmp, + verbose: true, + }) + + expect(result.ok).toBe(true) + expect(result.manifestPath).toBeDefined() + expect(existsSync(result.manifestPath!)).toBe(true) + + const actualContent = normalizeFinalNewline( + readFileSync(result.manifestPath!, 'utf8'), + ) + const actualLines = actualContent.split('\n').filter(l => l.trim() !== '') + + const oraclePath = path.resolve( + import.meta.dirname, + '..', + '..', + '..', + '..', + 'test', + 'fixtures', + 'manifest-bazel', + 'python-pypi', + 'requirements.expected.txt', + ) + const expectedContent = normalizeFinalNewline( + readFileSync(oraclePath, 'utf8'), + ) + expect(actualContent).toBe(expectedContent) + 
+ // Verify sorted order (sort by package name only, matching sortPackageLines). + const sorted = [...actualLines].sort((a, b) => { + const aName = a.split('==')[0]!.toLowerCase() + const bName = b.split('==')[0]!.toLowerCase() + if (aName < bName) { + return -1 + } + if (aName > bName) { + return 1 + } + return a.localeCompare(b) + }) + expect(actualLines).toEqual(sorted) + }, 60000) + + it('explicit --ecosystem pypi mode also produces matching output', async () => { + expect(existsSync(FIXTURE_DIR)).toBe(true) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: FIXTURE_DIR, + out: tmp, + verbose: true, + explicitEcosystem: true, + }) + + expect(result.ok).toBe(true) + expect(result.manifestPath).toBeDefined() + }, 60000) + }, +) diff --git a/src/commands/manifest/bazel/extract_bazel_to_pypi.mts b/src/commands/manifest/bazel/extract_bazel_to_pypi.mts new file mode 100644 index 000000000..c23f4fe6b --- /dev/null +++ b/src/commands/manifest/bazel/extract_bazel_to_pypi.mts @@ -0,0 +1,402 @@ +import { existsSync, promises as fs, mkdirSync } from 'node:fs' +import path from 'node:path' + +import { logger } from '@socketsecurity/registry/lib/logger' + +import { resolveBazelBinary } from './bazel-bin-detect.mts' +import { validateOutputBase } from './bazel-output-base-check.mts' +import { + discoverPypiHubs, + parseBazelModPipExtensionCandidates, +} from './bazel-pypi-discovery.mts' +import { + collectPypiPackages, + filterReachedPypiPackages, + normalizePypiName, + parseAliasActualFromBuildOutput, + parsePypiTagsFromBuildOutput, + readRequirementsLockFile, + resolveRequirementsLockPath, +} from './bazel-pypi-parser.mts' +import { provisionPythonShim } from './bazel-python-shim.mts' +import { + buildPypiProbeFor, + runBazelModShowPipExtension, + runBazelModShowVisibleRepos, + runBazelQuery, +} from './bazel-query-runner.mts' +import { parseVisibleRepoCandidates } from 
'./bazel-repo-discovery.mts' +import { + detectWorkspaceMode, + getBazelInvocationFlags, +} from './bazel-workspace-detect.mts' +import { getErrorCause } from '../../../utils/errors.mts' + +import type { PypiHubCandidate } from './bazel-pypi-discovery.mts' +import type { + ExtractedPypiPackage, + ReachedPypiLabel, +} from './bazel-pypi-parser.mts' +import type { BazelQueryOptions } from './bazel-query-runner.mts' + +export type ExtractBazelToPypiOptions = { + bazelFlags: string | undefined + bazelOutputBase: string | undefined + bazelRc: string | undefined + bin: string | undefined + cwd: string + env?: NodeJS.ProcessEnv + out: string + outLayout?: 'flat' | 'standalone' + verbose: boolean + explicitEcosystem?: boolean +} + +export type ExtractBazelToPypiResult = { + artifactCount: number + manifestPath?: string | undefined + ok: boolean + noEcosystemFound?: boolean +} + +// Sort package lines in place by lowercased name (code-unit order); +// ties fall back to localeCompare on the original name. +function sortPackageLines( + lines: Array<{ name: string; version: string }>, +): Array<{ name: string; version: string }> { + return lines.sort((a, b) => { + const aLow = a.name.toLowerCase() + const bLow = b.name.toLowerCase() + if (aLow < bLow) { + return -1 + } + if (aLow > bLow) { + return 1 + } + return a.name.localeCompare(b.name) + }) +} + +export async function extractBazelToPypi( + opts: ExtractBazelToPypiOptions, +): Promise<ExtractBazelToPypiResult> { + const { cwd, out, verbose } = opts + logger.group('bazel2pypi:') + logger.info(`- src dir: \`${cwd}\``) + logger.info(`- out dir: \`${out}\``) + if (!existsSync(cwd)) { + logger.warn(`Warning: cwd does not exist: ${cwd}`) + } + logger.groupEnd() + + try { + // Validate caller-provided Bazel filesystem settings before invoking Bazel. + if (opts.bazelOutputBase) { + validateOutputBase(opts.bazelOutputBase, opts.cwd) + } + // Python shim (for rules_python workspace discovery). + const shim = await provisionPythonShim() + const baseEnv = shim.augmentedEnv ??
opts.env + + // Step 1: workspace detection. + const mode = detectWorkspaceMode(cwd) + logger.info( + `Workspace mode: bzlmod=${mode.bzlmod} workspace=${mode.workspace}`, + ) + const invocationFlags = getBazelInvocationFlags(mode) + + // Step 2: bazel binary resolution. + const bin = await resolveBazelBinary(opts.bin) + logger.info(`Using bazel: ${bin}`) + if (verbose) { + logger.log('[VERBOSE] resolved options:', { + bin, + bazelRc: opts.bazelRc ?? '(unset)', + bazelOutputBase: opts.bazelOutputBase ?? '(unset)', + bazelFlags: opts.bazelFlags ?? '(unset)', + invocationFlags, + }) + } + + // Step 3: build the shared query options object. + const queryOpts: BazelQueryOptions = { + bin, + cwd, + invocationFlags, + ...(opts.bazelRc ? { bazelRc: opts.bazelRc } : {}), + ...(opts.bazelFlags ? { bazelFlags: opts.bazelFlags } : {}), + ...(opts.bazelOutputBase + ? { bazelOutputBase: opts.bazelOutputBase } + : {}), + ...(baseEnv ? { env: baseEnv } : {}), + verbose, + } + + // Step 4: discover validated PyPI hubs via the two-step recipe. 
+ let bazelCommandCandidates: PypiHubCandidate[] | undefined + let nativeCandidates: string[] | undefined + if (mode.bzlmod) { + const extensionResult = await runBazelModShowPipExtension(queryOpts) + if (extensionResult.code === 0) { + bazelCommandCandidates = parseBazelModPipExtensionCandidates( + extensionResult.stdout, + verbose, + ) + } else if (verbose) { + logger.log( + '[VERBOSE] bazel mod show_extension failed; falling back to bounded static candidate parsing:', + extensionResult.stderr, + ) + } + + const visibleRepos = await runBazelModShowVisibleRepos(queryOpts) + if (visibleRepos.code === 0) { + nativeCandidates = parseVisibleRepoCandidates(visibleRepos.stdout) + if (verbose) { + logger.log( + '[VERBOSE] Bzlmod visible repo candidates:', + nativeCandidates, + ) + } + } else if (verbose) { + logger.log( + '[VERBOSE] bazel mod show_repo failed; falling back to static candidate parsing:', + visibleRepos.stderr, + ) + } + } + const probe = buildPypiProbeFor(queryOpts) + const hubs = await discoverPypiHubs( + cwd, + probe, + nativeCandidates, + verbose, + bazelCommandCandidates, + ) + const hubNames = Array.from(hubs.keys()) + logger.info( + `Discovered ${hubs.size} PyPI hub(s): ${hubNames.join(', ') || '(none)'}`, + ) + + if (!hubs.size) { + if (verbose) { + logger.info( + 'No PyPI hubs discovered. failureCategory=no-supported-ecosystem', + ) + } + return { + artifactCount: 0, + ok: false, + noEcosystemFound: true, + } + } + + // Step 5: for each hub, resolve the requirements lockfile (fast path), + // run the reached-closure query, and collect name==version pairs. 
+ const allLines: Array<{ name: string; version: string; source: string }> = + [] + const warnings: string[] = [] + for (const [hubName, hubInfo] of hubs) { + // eslint-disable-next-line no-await-in-loop + const lockfileMap = await resolveHubLockfile(hubInfo, cwd, verbose) + // eslint-disable-next-line no-await-in-loop + const reached = await queryReachedPypiLabels(hubName, queryOpts, verbose) + const labelsToQuery = lockfileMap + ? reached.filter(label => !lockfileMap.has(label.normalizedName)) + : reached + const divergenceLabels = lockfileMap && verbose ? reached : labelsToQuery + // eslint-disable-next-line no-await-in-loop + const spokeTagLookup = await buildSpokeTagLookup( + divergenceLabels, + queryOpts, + verbose, + ) + + // Check for lockfile-vs-spoke-tag divergence and log warnings. + if (lockfileMap) { + for (const label of reached) { + const lockEntry = lockfileMap.get(label.normalizedName) + const spokeEntry = spokeTagLookup?.get(label.normalizedName) + if ( + lockEntry && + spokeEntry && + lockEntry.version !== spokeEntry.version + ) { + warnings.push( + `Version divergence for ${label.originalLabel}: lockfile says ${lockEntry.version}, spoke tag says ${spokeEntry.version}. Using lockfile.`, + ) + } + } + } + + const lines = collectPypiPackages(reached, lockfileMap, spokeTagLookup) + for (const l of lines) { + allLines.push({ name: l.name, version: l.version, source: l.source }) + } + logger.info(`@${hubName}: ${lines.length} package(s)`) + } + + // Step 6: cross-hub conflict check (same normalized name, different + // version across multiple hubs). 
+ const crossHubVersions = new Map<string, string>() + for (const l of allLines) { + const normalized = normalizePypiName(l.name) + const existing = crossHubVersions.get(normalized) + if (existing && existing !== l.version) { + throw new Error( + `Conflicting versions for ${l.name}: ${existing} vs ${l.version} across hubs.`, + ) + } + crossHubVersions.set(normalized, l.version) + } + + // Step 7: sort and write requirements.txt. + const sorted = sortPackageLines(allLines) + const lines = sorted.map(p => `${p.name}==${p.version}\n`) + const layout = opts.outLayout ?? 'standalone' + const manifestDir = + layout === 'flat' ? path.join(out, '.socket-auto-manifest') : out + mkdirSync(manifestDir, { recursive: true }) + const manifestPath = path.join(manifestDir, 'requirements.txt') + await fs.writeFile(manifestPath, lines.join(''), 'utf8') + + if (verbose) { + logger.log('[VERBOSE] outputs:', { + artifactCount: allLines.length, + generatedManifest: path.relative(out, manifestPath), + layout, + manifest: manifestPath, + pypiHubs: hubNames, + tool: 'socket manifest bazel', + workspace: { bzlmod: mode.bzlmod, legacyWorkspace: mode.workspace }, + }) + } + + for (const w of warnings) { + logger.warn(w) + } + + if (!allLines.length) { + logger.fail( + 'No PyPI packages extracted. failureCategory=ecosystem-detected-but-empty. See warnings above.', + ) + return { artifactCount: 0, manifestPath, ok: false } + } + logger.success( + `Wrote ${allLines.length} package(s) to ${path.relative(cwd, manifestPath)}.`, + ) + return { + artifactCount: allLines.length, + manifestPath, + ok: true, + } + } catch (e) { + logger.fail(`Unexpected error in bazel2pypi: ${getErrorCause(e)}`) + if (verbose) { + logger.group('[VERBOSE] error:') + logger.log(e) + logger.groupEnd() + } else { + logger.info('Re-run with --verbose for the full stack.') + } + return { artifactCount: 0, ok: false } + } +} + +// Resolve lockfile path and read/parse if within bounds.
+async function resolveHubLockfile( + hubInfo: { + requirementsLockLabel?: string | undefined + requirementsLockPath?: string | undefined + }, + cwd: string, + verbose: boolean, +): Promise | undefined> { + const resolved = + hubInfo.requirementsLockPath ?? + resolveRequirementsLockPath(hubInfo.requirementsLockLabel, cwd) + if (verbose) { + logger.log( + '[VERBOSE] lockfile resolved:', + resolved ?? '(none from label/path)', + ) + } + const result = readRequirementsLockFile(resolved) + if (verbose && result) { + logger.log('[VERBOSE] lockfile parsed:', result.size, 'package(s)') + } + return result +} + +// Run the reached-closure query for Python targets and filter to hub labels. +async function queryReachedPypiLabels( + hubName: string, + queryOpts: BazelQueryOptions, + verbose: boolean, +): Promise { + const queryStr = 'deps(kind("py_library|py_binary|py_test", //...))' + const result = await runBazelQuery(queryStr, queryOpts, 'label') + if (result.code !== 0) { + if (verbose) { + logger.log( + `[VERBOSE] reached query failed for ${hubName}:`, + result.stderr, + ) + } + return [] + } + return filterReachedPypiPackages(result.stdout, hubName) +} + +// Build a spoke-tag lookup map for reached labels that don't have lockfile +// entries. For each reached label, if the lockfile missed it, resolve the +// actual target via `--output=build` and extract pypi_name/pypi_version. +async function buildSpokeTagLookup( + reached: ReachedPypiLabel[], + queryOpts: BazelQueryOptions, + verbose: boolean, +): Promise> { + const lookup = new Map() + for (const label of reached) { + // Only query the spoke if we haven't already resolved it. 
+ if (lookup.has(label.normalizedName)) { + continue + } + // eslint-disable-next-line no-await-in-loop + const buildResult = await runBazelQuery(`${label.apparentLabel}`, { + ...queryOpts, + verbose: false, + }) + if (buildResult.code !== 0) { + if (verbose) { + logger.log( + `[VERBOSE] spoke build query failed for ${label.apparentLabel}:`, + buildResult.stderr, + ) + } + continue + } + let parsed = parsePypiTagsFromBuildOutput(buildResult.stdout) + if (!parsed) { + const actualLabel = parseAliasActualFromBuildOutput(buildResult.stdout) + if (actualLabel && actualLabel !== label.apparentLabel) { + // eslint-disable-next-line no-await-in-loop + const actualResult = await runBazelQuery(actualLabel, { + ...queryOpts, + verbose: false, + }) + if (actualResult.code === 0) { + parsed = parsePypiTagsFromBuildOutput(actualResult.stdout) + } else if (verbose) { + logger.log( + `[VERBOSE] spoke actual query failed for ${actualLabel}:`, + actualResult.stderr, + ) + } + } + } + if (parsed) { + lookup.set(normalizePypiName(parsed.name), parsed) + } + } + return lookup +} diff --git a/src/commands/manifest/bazel/extract_bazel_to_pypi.test.mts b/src/commands/manifest/bazel/extract_bazel_to_pypi.test.mts new file mode 100644 index 000000000..652d4eb40 --- /dev/null +++ b/src/commands/manifest/bazel/extract_bazel_to_pypi.test.mts @@ -0,0 +1,587 @@ +import { + existsSync, + mkdtempSync, + readFileSync, + rmSync, + writeFileSync, +} from 'node:fs' +import os from 'node:os' +import path from 'node:path' + +import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest' + +// Mock the helpers BEFORE importing the orchestrator. 
+vi.mock('./bazel-workspace-detect.mts', () => ({ + detectWorkspaceMode: vi.fn(), + getBazelInvocationFlags: vi.fn(() => []), +})) +vi.mock('./bazel-bin-detect.mts', () => ({ + resolveBazelBinary: vi.fn(async () => '/usr/local/bin/bazel'), +})) +vi.mock('./bazel-pypi-discovery.mts', () => ({ + discoverPypiHubs: vi.fn(), + parseBazelModPipExtensionCandidates: vi.fn(() => [ + { + hubName: 'pypi', + pythonVersion: '3.12', + requirementsLockLabel: '//:requirements_lock.txt', + source: 'bazel-mod-show-extension', + workspaceMode: 'bzlmod', + }, + ]), +})) +const { probe } = vi.hoisted(() => ({ + probe: async () => ({ code: 0, stdout: '@pypi//requests:pkg\n' }), +})) +vi.mock('./bazel-query-runner.mts', () => ({ + buildPypiProbeFor: vi.fn(() => probe), + buildProbeFor: vi.fn(() => probe), + runBazelModShowVisibleRepos: vi.fn(async () => ({ + code: 0, + stderr: '', + stdout: '', + })), + runBazelModShowPipExtension: vi.fn(async () => ({ + code: 0, + stderr: '', + stdout: + 'pip.parse(hub_name="pypi", python_version="3.12", requirements_lock="//:requirements_lock.txt")\nuse_repo(pip, "pypi")\n', + })), + runBazelQuery: vi.fn(), +})) +vi.mock('./bazel-output-base-check.mts', () => ({ + validateOutputBase: vi.fn(), +})) +vi.mock('./bazel-python-shim.mts', () => ({ + provisionPythonShim: vi.fn(async () => ({ + augmentedEnv: undefined, + shimDir: undefined, + })), +})) + +import { validateOutputBase } from './bazel-output-base-check.mts' +import { + discoverPypiHubs, + parseBazelModPipExtensionCandidates, +} from './bazel-pypi-discovery.mts' +import { + runBazelModShowPipExtension, + runBazelQuery, +} from './bazel-query-runner.mts' +import { detectWorkspaceMode } from './bazel-workspace-detect.mts' +import { extractBazelToPypi } from './extract_bazel_to_pypi.mts' + +import type { ExtractBazelToPypiResult } from './extract_bazel_to_pypi.mts' + +describe('extractBazelToPypi', () => { + let tmp: string + + beforeEach(() => { + tmp = mkdtempSync(path.join(os.tmpdir(), 
'bazel-extract-')) + vi.mocked(detectWorkspaceMode).mockReturnValue({ + bzlmod: true, + workspace: false, + }) + vi.mocked(parseBazelModPipExtensionCandidates).mockReturnValue([ + { + hubName: 'pypi', + pythonVersion: '3.12', + requirementsLockLabel: '//:requirements_lock.txt', + source: 'bazel-mod-show-extension', + workspaceMode: 'bzlmod', + }, + ]) + process.exitCode = 0 + }) + + afterEach(() => { + rmSync(tmp, { recursive: true, force: true }) + vi.resetAllMocks() + process.exitCode = 0 + }) + + it('writes requirements.txt with sorted name==version lines', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + probeStdout: '@pypi//requests:pkg\n@pypi//numpy:pkg', + }, + ], + ]), + ) + vi.mocked(runBazelQuery) + .mockResolvedValueOnce({ + code: 0, + stdout: '@pypi//requests:pkg\n@pypi//numpy:pkg', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'pypi_name=numpy\npypi_version=2.4.4', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'pypi_name=requests\npypi_version=2.33.1', + stderr: '', + }) + + // Create a requirements_lock.txt in the temp dir. 
+ const lockPath = path.join(tmp, 'requirements_lock.txt') + writeFileSync(lockPath, 'requests==2.33.1\n', 'utf8') + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(result).toEqual({ + artifactCount: expect.any(Number), + manifestPath: path.join(tmp, 'requirements.txt'), + ok: true, + }) + + const content = readFileSync(path.join(tmp, 'requirements.txt'), 'utf8') + expect(content).toContain('requests==2.33.1') + }) + + it('writes to .socket-auto-manifest/requirements.txt when outLayout is flat', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + probeStdout: '@pypi//requests:pkg', + }, + ], + ]), + ) + vi.mocked(runBazelQuery) + .mockResolvedValueOnce({ + code: 0, + stdout: '@pypi//requests:pkg', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'pypi_name=requests\npypi_version=2.33.1', + stderr: '', + }) + + writeFileSync( + path.join(tmp, 'requirements_lock.txt'), + 'requests==2.33.1\n', + 'utf8', + ) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + outLayout: 'flat', + verbose: false, + }) + + expect(result.manifestPath).toBe( + path.join(tmp, '.socket-auto-manifest', 'requirements.txt'), + ) + expect( + existsSync(path.join(tmp, '.socket-auto-manifest', 'requirements.txt')), + ).toBe(true) + expect(existsSync(path.join(tmp, 'requirements.txt'))).toBe(false) + }) + + it('returns noEcosystemFound when no hubs and explicitEcosystem=true', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue(new Map()) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + 
bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + explicitEcosystem: true, + }) + + expect(result).toEqual({ + artifactCount: 0, + ok: false, + noEcosystemFound: true, + }) + }) + + it('returns noEcosystemFound when no hubs in auto mode', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue(new Map()) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(result).toEqual({ + artifactCount: 0, + ok: false, + noEcosystemFound: true, + }) + }) + + it('handles lockfile-vs-spoke divergence by preferring lockfile', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + probeStdout: '@pypi//requests:pkg', + }, + ], + ]), + ) + vi.mocked(runBazelQuery) + .mockResolvedValueOnce({ + code: 0, + stdout: '@pypi//requests:pkg', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'pypi_name=requests\npypi_version=3.0.0', + stderr: '', + }) + + writeFileSync( + path.join(tmp, 'requirements_lock.txt'), + 'requests==2.33.1\n', + 'utf8', + ) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(result.ok).toBe(true) + const content = readFileSync(result.manifestPath!, 'utf8') + expect(content).toContain('requests==2.33.1') + expect(content).not.toContain('requests==3.0.0') + }) + + it('handles duplicate normalized names with same version', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + probeStdout: + 
'@pypi//charset_normalizer:pkg\n@pypi//charset-normalizer:pkg', + }, + ], + ]), + ) + vi.mocked(runBazelQuery) + .mockResolvedValueOnce({ + code: 0, + stdout: '@pypi//charset_normalizer:pkg\n@pypi//charset-normalizer:pkg', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'pypi_name=charset-normalizer\npypi_version=3.4.7', + stderr: '', + }) + + writeFileSync( + path.join(tmp, 'requirements_lock.txt'), + 'charset-normalizer==3.4.7\n', + 'utf8', + ) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(result.ok).toBe(true) + const content = readFileSync(result.manifestPath!, 'utf8') + // Should only appear once (deduped). + const matches = content.match(/charset-normalizer==3\.4\.7/g) + expect(matches?.length).toBe(1) + }) + + it('returns failure without mutating process.exitCode when conflicting versions exist', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + probeStdout: '@pypi//requests:pkg', + }, + ], + [ + 'other', + { + hubName: 'other', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + probeStdout: '@other//requests:pkg', + }, + ], + ]), + ) + vi.mocked(runBazelQuery) + .mockResolvedValueOnce({ + code: 0, + stdout: '@pypi//requests:pkg', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: '@other//requests:pkg', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'pypi_name=requests\npypi_version=3.0.0', + stderr: '', + }) + + writeFileSync( + path.join(tmp, 'requirements_lock.txt'), + 'requests==2.33.1\n', + 'utf8', + ) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: 
false, + }) + + expect(process.exitCode).toBe(0) + expect(result.ok).toBe(false) + }) + + it('returns failure when a lockfile has conflicting normalized entries', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + probeStdout: '@pypi//foo_bar:pkg', + }, + ], + ]), + ) + + writeFileSync( + path.join(tmp, 'requirements_lock.txt'), + 'foo-bar==1.0.0\nFoo_Bar==2.0.0\n', + 'utf8', + ) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(process.exitCode).toBe(0) + expect(result.ok).toBe(false) + expect(runBazelQuery).not.toHaveBeenCalled() + }) + + it('does not query spoke tags for packages resolved by the lockfile', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + requirementsLockLabel: '//:requirements_lock.txt', + probeStdout: '@pypi//requests:pkg', + }, + ], + ]), + ) + vi.mocked(runBazelQuery).mockResolvedValueOnce({ + code: 0, + stdout: '@pypi//requests:pkg', + stderr: '', + }) + + writeFileSync( + path.join(tmp, 'requirements_lock.txt'), + 'requests==2.33.1\n', + 'utf8', + ) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(result.ok).toBe(true) + expect(runBazelQuery).toHaveBeenCalledTimes(1) + }) + + it('resolves hub aliases to spoke targets before parsing PyPI metadata', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue( + new Map([ + [ + 'pypi', + { + hubName: 'pypi', + source: 'MODULE.bazel', + workspaceMode: 'bzlmod', + probeStdout: '@pypi//requests:pkg', + }, + ], + ]), + ) 
+ vi.mocked(runBazelQuery) + .mockResolvedValueOnce({ + code: 0, + stdout: '@pypi//requests:pkg', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'alias(name = "pkg", actual = "@pypi_requests//:pkg")', + stderr: '', + }) + .mockResolvedValueOnce({ + code: 0, + stdout: 'pypi_name=requests\npypi_version=2.33.1', + stderr: '', + }) + + const result = await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(result.ok).toBe(true) + expect(readFileSync(result.manifestPath!, 'utf8')).toBe( + 'requests==2.33.1\n', + ) + expect(runBazelQuery).toHaveBeenLastCalledWith( + '@pypi_requests//:pkg', + expect.any(Object), + ) + }) + + it('calls validateOutputBase when bazelOutputBase is set', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue(new Map()) + await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: tmp, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + expect(vi.mocked(validateOutputBase)).toHaveBeenCalledWith(tmp, tmp) + }) + + it('passes bazel mod show_extension candidates into discovery first', async () => { + vi.mocked(discoverPypiHubs).mockResolvedValue(new Map()) + + await extractBazelToPypi({ + bazelFlags: undefined, + bazelOutputBase: undefined, + bazelRc: undefined, + bin: undefined, + cwd: tmp, + out: tmp, + verbose: false, + }) + + expect(runBazelModShowPipExtension).toHaveBeenCalled() + expect(discoverPypiHubs).toHaveBeenCalledWith( + tmp, + expect.any(Function), + [], + false, + [ + { + hubName: 'pypi', + pythonVersion: '3.12', + requirementsLockLabel: '//:requirements_lock.txt', + source: 'bazel-mod-show-extension', + workspaceMode: 'bzlmod', + }, + ], + ) + }) +}) diff --git a/src/commands/manifest/cmd-manifest.test.mts b/src/commands/manifest/cmd-manifest.test.mts index 2973eba1e..5f3504d00 100644 --- a/src/commands/manifest/cmd-manifest.test.mts +++ 
b/src/commands/manifest/cmd-manifest.test.mts @@ -24,7 +24,7 @@ describe('socket manifest', async () => { Commands auto Auto-detect build and attempt to generate manifest file - bazel [beta] Bazel JVM SBOM support \\u2014 generate manifest files (\`maven_install.json\`) for a Bazel/Maven project + bazel [beta] Bazel SBOM support \\u2014 generate manifest files for a Bazel project (Maven, PyPI) cdxgen Run cdxgen for SBOM generation conda [beta] Convert a Conda environment.yml file to a python requirements.txt gradle [beta] Use Gradle to generate a manifest file (\`pom.xml\`) for a Gradle/Java/Kotlin/etc project diff --git a/src/commands/manifest/generate_auto_manifest.mts b/src/commands/manifest/generate_auto_manifest.mts index 63df846bf..0722b4701 100644 --- a/src/commands/manifest/generate_auto_manifest.mts +++ b/src/commands/manifest/generate_auto_manifest.mts @@ -86,27 +86,30 @@ export async function generateAutoManifest({ if (!sockJson?.defaults?.manifest?.bazel?.disabled && detected.bazel) { const bazelConfig = sockJson?.defaults?.manifest?.bazel + logger.log( 'Detected a Bazel workspace, extracting Maven dependencies via bazel query...', ) - const bazelResult = await extractBazelToMaven({ + const mavenResult = await extractBazelToMaven({ bazelFlags: bazelConfig?.bazelFlags, bazelOutputBase: bazelConfig?.bazelOutputBase, bazelRc: bazelConfig?.bazelRc, bin: bazelConfig?.bazel ?? bazelConfig?.bin, cwd, - // Auto-manifest writes into a sibling directory instead of the repo root - // so scan discovery can pick it up without colliding with a checked-in - // rules_jvm_external lockfile or repo-root gitignore patterns. out: bazelConfig?.out ?? 
cwd, outLayout: 'flat', verbose: Boolean(bazelConfig?.verbose) || verbose, }) - if (!bazelResult.ok) { - throw new Error('Bazel auto-manifest generation failed') + + if (!mavenResult.ok && !mavenResult.noEcosystemFound) { + throw new Error( + 'Bazel auto-manifest generation failed for ecosystem(s): maven', + ) } - if (bazelResult.manifestPath) { - generatedFiles.push(bazelResult.manifestPath) + if (mavenResult.ok && mavenResult.manifestPath) { + generatedFiles.push(mavenResult.manifestPath) + } else if (mavenResult.noEcosystemFound) { + logger.info('No supported Bazel Maven ecosystem detected.') } } diff --git a/src/commands/manifest/generate_auto_manifest.test.mts b/src/commands/manifest/generate_auto_manifest.test.mts index 7f803b9fc..07c22f03b 100644 --- a/src/commands/manifest/generate_auto_manifest.test.mts +++ b/src/commands/manifest/generate_auto_manifest.test.mts @@ -127,7 +127,7 @@ describe('generateAutoManifest — bazel branch', () => { ) }) - it('returns generated Bazel sidecar manifests', async () => { + it('returns generated Bazel Maven sidecar manifest by default', async () => { const result = await generateAutoManifest({ cwd: '/tmp/repo', detected: { ...baseDetected, bazel: true, count: 1 }, @@ -140,12 +140,27 @@ describe('generateAutoManifest — bazel branch', () => { ]) }) - it('throws when Bazel extraction fails', async () => { + it('does not run PyPI by default when Maven has no discovery', async () => { vi.mocked(extractBazelToMaven).mockResolvedValueOnce({ artifactCount: 0, + noEcosystemFound: true, ok: false, }) + const result = await generateAutoManifest({ + cwd: '/tmp/repo', + detected: { ...baseDetected, bazel: true, count: 1 }, + outputKind: 'text', + verbose: false, + }) + expect(result.generatedFiles).toEqual([]) + }) + + it('throws when Maven hard-fails', async () => { + vi.mocked(extractBazelToMaven).mockResolvedValueOnce({ + artifactCount: 0, + ok: false, + }) await expect( generateAutoManifest({ cwd: '/tmp/repo', @@ -153,7 +168,25 @@ 
describe('generateAutoManifest — bazel branch', () => { outputKind: 'text', verbose: false, }), - ).rejects.toThrow('Bazel auto-manifest generation failed') + ).rejects.toThrow( + 'Bazel auto-manifest generation failed for ecosystem(s): maven', + ) + }) + + it('does NOT throw when Maven has no discovery', async () => { + vi.mocked(extractBazelToMaven).mockResolvedValueOnce({ + artifactCount: 0, + noEcosystemFound: true, + ok: false, + }) + const result = await generateAutoManifest({ + cwd: '/tmp/repo', + detected: { ...baseDetected, bazel: true, count: 1 }, + outputKind: 'text', + verbose: false, + }) + + expect(result.generatedFiles).toEqual([]) }) it('runs BOTH bazel and gradle branches when both are detected', async () => { diff --git a/test/fixtures/manifest-bazel/python-pypi/requirements.expected.txt b/test/fixtures/manifest-bazel/python-pypi/requirements.expected.txt new file mode 100644 index 000000000..efeb44434 --- /dev/null +++ b/test/fixtures/manifest-bazel/python-pypi/requirements.expected.txt @@ -0,0 +1,35 @@ +annotated-types==0.7.0 +anyio==4.13.0 +blinker==1.9.0 +certifi==2026.4.22 +charset-normalizer==3.4.7 +click==8.3.3 +flask==3.1.3 +h11==0.16.0 +httpcore==1.0.9 +httpx==0.28.1 +idna==3.13 +iniconfig==2.3.0 +itsdangerous==2.2.0 +jinja2==3.1.6 +markdown-it-py==4.0.0 +markupsafe==3.0.3 +mdurl==0.1.2 +numpy==2.4.4 +packaging==26.1 +pandas==2.3.3 +pluggy==1.6.0 +pydantic==2.13.3 +pydantic-core==2.46.3 +pygments==2.20.0 +pytest==8.4.2 +python-dateutil==2.9.0.post0 +pytz==2026.1.post1 +requests==2.33.1 +rich==13.9.4 +six==1.17.0 +typing-extensions==4.15.0 +typing-inspection==0.4.2 +tzdata==2026.1 +urllib3==2.6.3 +werkzeug==3.1.8
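Reviewer note: the normalize, conflict-check, and sort steps in `extractBazelToPypi` (Steps 6 and 7) can be illustrated in isolation. The sketch below is only an approximation under stated assumptions: `normalizeName` and `mergeHubPackages` are hypothetical names, not the helpers in this diff, and the normalization follows the PEP 503 rule (collapse runs of `-`, `_`, `.` into `-`, then lowercase), which may differ in detail from the project's `normalizePypiName`.

```typescript
// Hypothetical helper names for illustration; not the project's implementation.
type PinnedPackage = { name: string; version: string }

// PEP 503: collapse runs of '-', '_' and '.' into a single '-', then lowercase.
function normalizeName(name: string): string {
  return name.replace(/[-_.]+/g, '-').toLowerCase()
}

// Merge packages from several hubs, deduplicate by normalized name, and
// throw when the same package is pinned to two different versions.
function mergeHubPackages(packages: PinnedPackage[]): string[] {
  const versions = new Map<string, string>()
  for (const pkg of packages) {
    const key = normalizeName(pkg.name)
    const existing = versions.get(key)
    if (existing !== undefined && existing !== pkg.version) {
      throw new Error(
        `Conflicting versions for ${key}: ${existing} vs ${pkg.version}`,
      )
    }
    versions.set(key, pkg.version)
  }
  // Sort by normalized name in plain code-unit order so output is deterministic.
  return [...versions.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, version]) => `${name}==${version}`)
}

const lines = mergeHubPackages([
  { name: 'Requests', version: '2.33.1' },
  { name: 'charset_normalizer', version: '3.4.7' },
  { name: 'charset-normalizer', version: '3.4.7' },
])
console.log(lines.join('\n'))
```

Keying the map by the normalized name is what lets `charset_normalizer` and `charset-normalizer` collapse into one line, while a genuine version disagreement surfaces as an error rather than a silent overwrite.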