fix(pipeline): stop creating Route nodes from URLs in config files #646
Open
mvanhorn wants to merge 1 commit into
Infra Route extraction harvested any URL-like string literal from any YAML/TF/TOML file, so a repo of only config files produced spurious Route nodes (terraform registry URL, a JWKS discovery URL, an upstream host, and a healthcheck shell command). These inflated the Route set that get_architecture and cross-repo matching rely on.

Restrict the loose string-ref harvesting to genuine Infrastructure-as-Code files (Terraform / HCL) and require a bare URL value, so generic config, dependabot, compose and k8s/kustomize manifests no longer emit Routes.

Structured topic->endpoint bindings still flow through cbm_pipeline_process_infra_bindings(), so real infra endpoints are kept.

Fixes DeusData#521

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QK73cX8EuqqwQEJUbycu6g
Signed-off-by: mvanhorn <mvanhorn@gmail.com>
What does this PR do?

Indexing a repo of only config files produced spurious Route nodes from arbitrary URL-like strings. The infra-route extractor (cbm_pipeline_extract_infra_routes in src/pipeline/pipeline.c) harvested any CBM_STRREF_URL string literal from any .yaml/.yml/.tf/.hcl/.toml file, regardless of whether the host file actually defines service routes.

This restricts the loose URL string-ref harvesting to genuine Infrastructure-as-Code files (Terraform / HCL) and additionally requires the value to be a bare URL:

- cbm_is_infra_route_source_file(): only .tf, .tf.json, and .hcl files are route sources. Generic config (config.yaml), dependency manifests (dependabot.yaml), container orchestration (compose.yaml), and Kubernetes / Kustomize manifests are excluded.
- cbm_is_bare_endpoint_url(): rejects command strings that merely embed a URL (e.g. a curl ... || exit 1 healthcheck), while still accepting query-string URLs.

Structured topic→endpoint bindings still flow through cbm_pipeline_process_infra_bindings(), so real infrastructure endpoints (Cloud Scheduler / Pub/Sub targets) continue to produce Route nodes.

Why this matters: per #521, a three-file repro (dependabot.yaml + config.yaml + compose.yaml) yielded four bogus routes: a Terraform registry URL, a JWKS discovery URL, an upstream service host, and a healthcheck shell command. None is a route the service serves; they inflate the Route set that get_architecture and cross-repo route matching depend on, making downstream matching noisier.

Checklist
- (git commit -s): required, CI rejects unsigned commits (DCO, see CONTRIBUTING.md)
- (make -f Makefile.cbm test)
- (make -f Makefile.cbm lint-ci)

Testing notes
- Route nodes from "Route nodes created from URL strings in config / non-source files" #521; after the fix it produces zero. A Terraform .tf endpoint URL still produces an infra Route.
- tests/test_pipeline.c:
  - infra_route_source_file_gate: Terraform/HCL accepted; dependabot/config/compose/k8s/kustomize/toml and .tfvars rejected.
  - infra_bare_endpoint_url_gate: bare URLs accepted; healthcheck/command strings rejected.

Fixes #521
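
For reviewers who want a feel for the two gates described above, here is a minimal sketch of how such predicates could look. It is not the code in this PR: the names sketch_is_infra_route_source_file, sketch_is_bare_endpoint_url, and ends_with are hypothetical, and the rules are assumptions inferred from the description (case-sensitive suffix matching, only http:// and https:// schemes, and any whitespace taken as a sign of a command string).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `s` ends with `suffix` (case-sensitive). */
static bool ends_with(const char *s, const char *suffix) {
    size_t n = strlen(s);
    size_t m = strlen(suffix);
    return n >= m && memcmp(s + n - m, suffix, m) == 0;
}

/* Sketch of the file gate: only Terraform / HCL files count as route
 * sources. ".tfvars" is rejected because it does not end in ".tf". */
bool sketch_is_infra_route_source_file(const char *path) {
    if (path == NULL) {
        return false;
    }
    return ends_with(path, ".tf") || ends_with(path, ".tf.json") ||
           ends_with(path, ".hcl");
}

/* Sketch of the value gate: the whole string must be a URL.
 * Assumption: the value must start with http:// or https://, have
 * something after the scheme, and contain no whitespace. */
bool sketch_is_bare_endpoint_url(const char *value) {
    size_t skip;
    if (value == NULL) {
        return false;
    }
    if (strncmp(value, "https://", 8) == 0) {
        skip = 8;
    } else if (strncmp(value, "http://", 7) == 0) {
        skip = 7;
    } else {
        return false; /* e.g. "curl -f http://..." starts with a command */
    }
    if (value[skip] == '\0') {
        return false; /* scheme with nothing after it */
    }
    for (const char *p = value + skip; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            return false; /* whitespace suggests a shell command, not a URL */
        }
    }
    return true;
}
```

Under these assumptions, a query-string URL passes the value gate because '?' and '&' are not whitespace, while a healthcheck such as curl -f http://localhost:8080/health || exit 1 fails at the scheme check. The actual helpers in src/pipeline/pipeline.c may apply different or stricter rules.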