Add system-probe-lite support for discovery in Helm chart#2479

Open
vitkyrka wants to merge 11 commits into main from vitkyrka/disco-lite

Conversation

@vitkyrka vitkyrka commented Mar 13, 2026

What this PR does / why we need it:

Adds support for starting system-probe-lite (SPL) instead of the full system-probe in the Helm chart when only discovery is enabled.

SPL is a privileged Rust binary that implements only the discovery module. This change ensures that SPL is started instead of system-probe whenever discovery is the only enabled system-probe feature.

How it works:

If any system-probe feature other than discovery is also enabled (NPM, USM, etc.), the regular system-probe binary is always used directly.

When only discovery is enabled (no other system-probe features):

  • The system-probe container automatically runs SPL as the entry point
  • If SPL fails to start (either because it doesn't exist in the image or potentially due to some other issue):
    • If discovery.enabled was explicitly set by the user, we fall back to the full system-probe binary (since the user opted in to discovery knowingly)
    • If only discovery.enabledByDefault is set (i.e. discovery was turned on without the user's explicit choice), we fall back to sleep infinity to avoid starting system-probe unexpectedly (or crashing) when using an older agent image without system-probe-lite (or without the discovery feature altogether).
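
For illustration, the selection rules above can be modeled as a small Go function. This is only a sketch under stated assumptions, not the chart's implementation: the real logic lives in the Helm templates, and the discoveryConfig type, its fields, and the entrypoint function are hypothetical names introduced here.

```go
package main

import "fmt"

// discoveryConfig is a hypothetical stand-in for the chart values involved.
type discoveryConfig struct {
	Enabled          *bool // datadog.discovery.enabled; nil means unset
	EnabledByDefault bool  // datadog.discovery.enabledByDefault
	OtherFeatures    bool  // true if NPM, USM, etc. are also enabled
}

// entrypoint returns the binary to start first and the fallback to run if
// the first one fails to start. An empty string means "none".
func entrypoint(c discoveryConfig) (primary, fallback string) {
	if c.OtherFeatures {
		// Any other feature forces the full system-probe, with no fallback.
		return "system-probe", ""
	}
	if c.Enabled != nil && *c.Enabled {
		// Explicit opt-in: fall back to the full binary.
		return "system-probe-lite", "system-probe"
	}
	if c.Enabled == nil && c.EnabledByDefault {
		// Turned on without an explicit choice: never start system-probe.
		return "system-probe-lite", "sleep infinity"
	}
	// Discovery is off and no other feature is enabled.
	return "", ""
}

func main() {
	on := true
	p, f := entrypoint(discoveryConfig{Enabled: &on})
	fmt.Printf("explicit:   %s (fallback: %s)\n", p, f)

	p, f = entrypoint(discoveryConfig{EnabledByDefault: true})
	fmt.Printf("by default: %s (fallback: %s)\n", p, f)
}
```

The two fallbacks differ only in what happens after SPL fails to start, which is why the sketch returns them as a pair rather than a single command.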

Special notes for your reviewer:

The discovery-enabled helper uses kindIs "invalid" to distinguish between enabled: false (explicitly disabled) and enabled being unset/nil. This matters because a plain "or" treats false and nil the same way, which would let enabledByDefault: true override an explicit enabled: false.
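
To make the difference concrete, here is a rough Go analogue of the two approaches. It is a sketch only: a *bool stands in for a value that may be unset, and naiveEnabled and discoveryEnabled are hypothetical names, not the template helpers themselves.

```go
package main

import "fmt"

// naiveEnabled mimics a plain "or": an unset value and an explicit false
// are both falsy, so enabledByDefault wins over an explicit false.
func naiveEnabled(enabled *bool, enabledByDefault bool) bool {
	explicit := enabled != nil && *enabled
	return explicit || enabledByDefault
}

// discoveryEnabled mimics the nil-aware check: an explicit value always
// wins, and enabledByDefault applies only when the value is unset.
func discoveryEnabled(enabled *bool, enabledByDefault bool) bool {
	if enabled != nil {
		return *enabled
	}
	return enabledByDefault
}

func main() {
	off := false
	fmt.Println(naiveEnabled(&off, true))     // true: explicit false is overridden
	fmt.Println(discoveryEnabled(&off, true)) // false: explicit false is respected
	fmt.Println(discoveryEnabled(nil, true))  // true: unset falls back to the default
}
```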

Note that discovery is not turned on by default in this PR; that is a separate, future step.

Checklist

[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]

  • All commits are signed (see: signing commits)
  • Chart Version semver bump label has been added (use <chartName>/minor-version, <chartName>/patch-version, or <chartName>/no-version-bump)
  • For datadog or datadog-operator chart or value changes, update the test baselines (run: make update-test-baselines)

GitHub CI takes care of the items below, but they are still required:

  • Documentation has been updated with helm-docs (run: .github/helm-docs.sh)
  • CHANGELOG.md has been updated
  • Variables are documented in the README.md

@github-actions github-actions bot added the chart/datadog This issue or pull request is related to the datadog chart label Mar 13, 2026
@vitkyrka vitkyrka added the datadog/minor-version Minor version bump for datadog chart label Mar 13, 2026
@vitkyrka vitkyrka force-pushed the vitkyrka/disco-lite branch 2 times, most recently from 20dc494 to 8ed1071 Compare April 6, 2026 14:44
When only discovery is enabled and no other system-probe feature is active,
use the lightweight system-probe-lite binary instead of full system-probe.
Falls back to system-probe if system-probe-lite is not available in the image.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vitkyrka vitkyrka force-pushed the vitkyrka/disco-lite branch from 8ed1071 to f296163 Compare April 6, 2026 15:15
vitkyrka commented Apr 6, 2026

@codex review

@vitkyrka vitkyrka marked this pull request as ready for review April 7, 2026 07:16
@vitkyrka vitkyrka requested review from a team as code owners April 7, 2026 07:16
@vitkyrka vitkyrka requested a review from a team April 7, 2026 07:17
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2961631fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread charts/datadog/templates/_helpers.tpl Outdated
@vitkyrka vitkyrka marked this pull request as draft April 7, 2026 07:57
vitkyrka and others added 4 commits April 7, 2026 08:12
…tainer

Use discovery-enabled helper in _container-system-probe.yaml instead of
directly checking datadog.discovery.enabled, so the cgroups volume mount
is also present when enabledByDefault=true.

Add a render test covering the cgroups mount for both cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vitkyrka commented Apr 7, 2026

@codex review

@chatgpt-codex-connector
Codex Review: Didn't find any major issues. 👍

@vitkyrka vitkyrka marked this pull request as ready for review April 7, 2026 09:21
@fanny-jiang fanny-jiang left a comment

LGTM, but as they stand the changes break GKE Autopilot: a restricted container field, the command value, is modified without a corresponding update to the Datadog GKE Autopilot WorkloadAllowlist.

https://docs.cloud.google.com/kubernetes-engine/docs/reference/crds/workloadallowlist#containers (see containers[].command)

Contributions to update the WorkloadAllowlist are welcome! We have internal docs on how updates can be submitted. Until then, the changes should be gated for both GKE Autopilot and GKE GDC. Thanks!

Comment thread test/datadog/baseline/manifests/gke_autopilot_usm.yaml Outdated
Comment thread test/datadog/gke_autopilot_workloadallowlist_test.go
Comment thread charts/datadog/templates/_helpers.tpl
vitkyrka and others added 2 commits April 8, 2026 07:12
The GKE Autopilot WorkloadAllowlist does not yet support system-probe-lite,
so gate should-use-system-probe-lite to fall back to regular system-probe
when providers.gke.autopilot or providers.gke.gdc is enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
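
The gating described in this commit message can be pictured roughly as follows. This is a hedged sketch rather than the helper itself: should-use-system-probe-lite is a Helm template helper, while the providers type and the shouldUseSystemProbeLite function below are hypothetical Go stand-ins.

```go
package main

import "fmt"

// providers is a hypothetical stand-in for the relevant chart values.
type providers struct {
	GKEAutopilot bool // providers.gke.autopilot
	GKEGDC       bool // providers.gke.gdc
}

// shouldUseSystemProbeLite reports whether SPL may be used. On GKE
// Autopilot and GKE GDC it always returns false, so the container keeps
// the regular system-probe command.
func shouldUseSystemProbeLite(discoveryOnly bool, p providers) bool {
	if p.GKEAutopilot || p.GKEGDC {
		return false
	}
	return discoveryOnly
}

func main() {
	fmt.Println(shouldUseSystemProbeLite(true, providers{}))
	fmt.Println(shouldUseSystemProbeLite(true, providers{GKEAutopilot: true}))
}
```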
…-probe command

Add datadog.discovery.enabled to the "with system-probe" test case and add
a new "with discovery only" test case. Assert in
verifyAutopilotWorkloadAllowlistConstraints that the system-probe container
always uses the regular system-probe command on autopilot (not
system-probe-lite).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vitkyrka vitkyrka requested a review from fanny-jiang April 8, 2026 09:53
@wdhif wdhif self-requested a review April 8, 2026 14:30
@wdhif wdhif left a comment

Blocking until we validate DataDog/datadog-operator#2610

Labels

chart/datadog This issue or pull request is related to the datadog chart datadog/minor-version Minor version bump for datadog chart

5 participants