[CONTP-1503] Add standalone DatadogCSIDriver controller#2856

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 14 commits into main from tbavelier/csi-driver-standalone-v2
Apr 8, 2026
Conversation

@tbavelier
Member

Summary

  • Adds DatadogCSIDriver CRD (v1alpha1) for declarative management of the Datadog CSI Driver via the operator, replacing the standalone Helm chart deployment
  • Implements a reconciler that manages a CSIDriver object (cluster-scoped, label-based ownership) and a DaemonSet (owner-ref based), with full drift reversion via Get+DeepEqual+Update
  • Supports overrides for pod template labels, annotations, tolerations, affinity, node selector, volumes, env vars, per-container resources, probes, and update strategy
  • Gated behind -datadogCSIDriverEnabled flag (default: false)
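
The Get+DeepEqual+Update pattern named in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the operator's code: `Store` is a hypothetical in-memory stand-in for the Kubernetes API client, and `DaemonSetSpec` is a hypothetical trimmed-down object.

```go
package main

import (
	"fmt"
	"reflect"
)

// DaemonSetSpec is a hypothetical, trimmed-down stand-in for the managed object.
type DaemonSetSpec struct {
	Labels map[string]string
	Image  string
}

// Store is a hypothetical in-memory replacement for the Kubernetes API client.
type Store map[string]DaemonSetSpec

// reconcile creates the object when it is missing and overwrites it whenever
// the live state differs from the desired state in any field.
func reconcile(store Store, name string, desired DaemonSetSpec) string {
	live, found := store[name] // Get
	if !found {
		store[name] = desired
		return "created"
	}
	if reflect.DeepEqual(live, desired) { // DeepEqual
		return "unchanged"
	}
	store[name] = desired // Update: revert the drift
	return "updated"
}

func main() {
	store := Store{}
	desired := DaemonSetSpec{
		Labels: map[string]string{"app": "csi"},
		Image:  "gcr.io/datadoghq/csi-driver:1.2.1",
	}
	fmt.Println(reconcile(store, "dd-csi", desired)) // created

	// Simulate a manual edit of a label on the live object.
	store["dd-csi"] = DaemonSetSpec{
		Labels: map[string]string{"app": "edited"},
		Image:  desired.Image,
	}
	fmt.Println(reconcile(store, "dd-csi", desired)) // updated
	fmt.Println(reconcile(store, "dd-csi", desired)) // unchanged
}
```

Comparing the whole object means any manual edit is treated as drift, which matches the "full drift reversion" wording above.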

Commit walkthrough

  1. CRD types + generated manifests: DatadogCSIDriver, DatadogCSIDriverSpec, DatadogCSIDriverOverride, DatadogCSIDriverStatus types + CRD YAML
  2. Outer controller + RBAC: controller-runtime wiring with GenerationChangedPredicate on the CR, Owns(DaemonSet), and Watches(CSIDriver) via label-based enqueue
  3. Reconciler, builders, tests: reconciliation logic with deferred SSA status patch, finalizer-based CSIDriver cleanup, comprehensive unit tests
  4. Wire into operator: feature flag + scheme registration in cmd/main.go and setup.go
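
As an illustration of the label-based enqueue in step 2, here is a minimal sketch of mapping a cluster-scoped object back to the custom resource that manages it. The label keys and the `Request` type are hypothetical; the real keys used by the operator are not shown in this description.

```go
package main

import "fmt"

// Request identifies the custom resource to reconcile (hypothetical type,
// modelled on a namespace/name pair).
type Request struct {
	Namespace string
	Name      string
}

// Hypothetical label keys recording which custom resource manages an object.
const (
	ownerNameLabel      = "example.com/owner-name"
	ownerNamespaceLabel = "example.com/owner-namespace"
)

// mapToOwner turns the labels of a watched object into reconcile requests.
// Objects without both ownership labels are ignored.
func mapToOwner(labels map[string]string) []Request {
	name, hasName := labels[ownerNameLabel]
	namespace, hasNamespace := labels[ownerNamespaceLabel]
	if !hasName || !hasNamespace {
		return nil
	}
	return []Request{{Namespace: namespace, Name: name}}
}

func main() {
	managed := map[string]string{
		ownerNameLabel:      "datadog-csi",
		ownerNamespaceLabel: "datadog",
	}
	fmt.Println(mapToOwner(managed))             // [{datadog datadog-csi}]
	fmt.Println(mapToOwner(map[string]string{})) // []
}
```

A mapping like this is presumably needed because the CSIDriver object is tracked through labels rather than an owner reference, so a change to it has to be translated into a reconcile request explicitly.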

Test plan

  • Unit tests pass (go test ./internal/controller/datadogcsidriver/...)
  • Deploy operator with -datadogCSIDriverEnabled=true and verify CSIDriver + DaemonSet creation
  • Verify drift reversion: kubectl edit a DaemonSet label/spec field → next reconcile reverts it
  • Verify deletion cleanup: delete the DatadogCSIDriver CR → CSIDriver object is removed via finalizer
  • Verify overrides: apply tolerations, node selectors, env vars via spec.override

🤖 Generated with Claude Code

tbavelier and others added 4 commits April 2, 2026 14:03
Define the DatadogCSIDriver, DatadogCSIDriverSpec, DatadogCSIDriverOverride,
and DatadogCSIDriverStatus types in api/datadoghq/v1alpha1. The CRD enables
declarative management of the Datadog CSI Driver via the operator, replacing
the standalone Helm chart deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Controller-runtime wiring for the DatadogCSIDriver reconciler. Watches the
primary CR with GenerationChangedPredicate, owned DaemonSets for all changes
(including status), and CSIDriver objects via label-based enqueue for drift
detection on the cluster-scoped resource.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
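
The difference between the two watch styles in this commit can be sketched as below. This is a simplified model, not controller-runtime code: `ObjectMeta` is a hypothetical stand-in, and the only behaviour assumed is that a generation-based filter lets an update through when `metadata.generation` differs between the old and new object.

```go
package main

import "fmt"

// ObjectMeta is a hypothetical, minimal stand-in for a watched object.
type ObjectMeta struct {
	Generation int64  // changes when the spec changes
	Status     string // stands in for the status subresource
}

// generationChanged models a generation-based filter on the primary CR:
// an update is processed only if the generation differs.
func generationChanged(oldObj, newObj ObjectMeta) bool {
	return oldObj.Generation != newObj.Generation
}

// anyChange models the unfiltered watch on owned DaemonSets, where status
// updates also trigger a reconcile.
func anyChange(oldObj, newObj ObjectMeta) bool {
	return oldObj != newObj
}

func main() {
	before := ObjectMeta{Generation: 1, Status: "Pending"}
	statusOnly := ObjectMeta{Generation: 1, Status: "Ready"}
	specEdit := ObjectMeta{Generation: 2, Status: "Ready"}

	fmt.Println(generationChanged(before, statusOnly)) // false
	fmt.Println(anyChange(before, statusOnly))         // true
	fmt.Println(generationChanged(before, specEdit))   // true
}
```

Filtering the CR by generation avoids re-reconciling on the controller's own status writes, while the unfiltered DaemonSet watch keeps the CR status in step with rollout progress.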
Implements the reconciliation logic for the DatadogCSIDriver controller:
- Deferred SSA status patch with ObservedGeneration tracking
- CSIDriver object management with Get+DeepEqual+Update for full drift reversion
- DaemonSet management with the same pattern, including label enforcement
- Override system with merge-by-name semantics (env vars, volumes, mounts)
- Image resolution via pkg/images (supports tag-only overrides)
- Finalizer-based cleanup of the cluster-scoped CSIDriver on deletion
- Comprehensive unit tests covering creation, updates, drift, deletion, overrides

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
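
The finalizer-based cleanup in this commit can be illustrated with the following sketch. All names are hypothetical (`cleanupFinalizer`, `CustomResource`, `Cluster`), and the in-memory `Cluster` only stands in for the API server. The assumption is that the label-owned CSIDriver is not garbage-collected with the CR, so the controller deletes it before releasing the CR.

```go
package main

import "fmt"

// cleanupFinalizer is a hypothetical finalizer name.
const cleanupFinalizer = "example.com/csidriver-cleanup"

// CustomResource is a hypothetical stand-in for the DatadogCSIDriver CR.
type CustomResource struct {
	Deleting   bool // stands in for a non-zero deletionTimestamp
	Finalizers []string
}

// Cluster is a hypothetical stand-in for cluster state.
type Cluster struct {
	CSIDriverExists bool
}

func hasFinalizer(cr *CustomResource) bool {
	for _, f := range cr.Finalizers {
		if f == cleanupFinalizer {
			return true
		}
	}
	return false
}

func removeFinalizer(cr *CustomResource) {
	kept := cr.Finalizers[:0]
	for _, f := range cr.Finalizers {
		if f != cleanupFinalizer {
			kept = append(kept, f)
		}
	}
	cr.Finalizers = kept
}

// reconcile adds the finalizer on the normal path and, on deletion, removes
// the cluster-scoped object before releasing the finalizer.
func reconcile(cr *CustomResource, cluster *Cluster) string {
	if cr.Deleting {
		if hasFinalizer(cr) {
			cluster.CSIDriverExists = false // delete the CSIDriver first
			removeFinalizer(cr)             // then let the CR be removed
		}
		return "deleted"
	}
	if !hasFinalizer(cr) {
		cr.Finalizers = append(cr.Finalizers, cleanupFinalizer)
	}
	cluster.CSIDriverExists = true
	return "reconciled"
}

func main() {
	cr, cluster := &CustomResource{}, &Cluster{}
	fmt.Println(reconcile(cr, cluster), cluster.CSIDriverExists) // reconciled true
	cr.Deleting = true
	fmt.Println(reconcile(cr, cluster), cluster.CSIDriverExists) // deleted false
}
```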
Register the controller in setup.go with a feature flag and add the
-datadogCSIDriverEnabled flag (default: false) to cmd/main.go. Also
registers the storagev1 scheme required for CSIDriver object management.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
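
A rough sketch of how a boolean flag can gate controller registration, using only the standard `flag` package. The flag name and default come from this commit message; the `options` struct, `parseOptions`, and `enabledControllers` are hypothetical helpers and do not reflect the actual layout of cmd/main.go or setup.go.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical holder for operator flags.
type options struct {
	datadogCSIDriverEnabled bool
}

// parseOptions registers the flag with a default of false and parses args.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.BoolVar(&opts.datadogCSIDriverEnabled, "datadogCSIDriverEnabled", false,
		"Enable the DatadogCSIDriver controller")
	err := fs.Parse(args)
	return opts, err
}

// enabledControllers lists which controllers would be set up; the
// DatadogCSIDriver controller is only included when the flag is on.
func enabledControllers(opts options) []string {
	controllers := []string{"DatadogAgent"}
	if opts.datadogCSIDriverEnabled {
		controllers = append(controllers, "DatadogCSIDriver")
	}
	return controllers
}

func main() {
	off, _ := parseOptions(nil)
	fmt.Println(enabledControllers(off)) // [DatadogAgent]

	on, _ := parseOptions([]string{"-datadogCSIDriverEnabled=true"})
	fmt.Println(enabledControllers(on)) // [DatadogAgent DatadogCSIDriver]
}
```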
@tbavelier tbavelier requested a review from a team April 2, 2026 12:05
@tbavelier tbavelier requested a review from a team as a code owner April 2, 2026 12:05
@tbavelier tbavelier added this to the v1.26.0 milestone Apr 2, 2026
@tbavelier tbavelier added the enhancement New feature or request label Apr 2, 2026
- bases/v1/datadoghq.com_datadogdashboards.yaml
- bases/v1/datadoghq.com_datadoggenericresources.yaml
- bases/v1/datadoghq.com_datadogagentinternals.yaml
- bases/v1/datadoghq.com_datadogcsidrivers.yaml
Member Author

The only change is adding the csidriver CRD; otherwise, it's simply YAML formatting.

@tbavelier tbavelier mentioned this pull request Apr 2, 2026
3 tasks

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fae3f3211d

Comment thread internal/controller/datadogcsidriver/controller.go
@codecov-commenter

codecov-commenter commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 65.71429% with 180 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.05%. Comparing base (60a4b9c) to head (9cd13e2).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/controller/datadogcsidriver/daemonset.go | 70.42% | 87 Missing and 10 partials ⚠️ |
| internal/controller/datadogcsidriver/controller.go | 62.75% | 38 Missing and 16 partials ⚠️ |
| internal/controller/datadogcsidriver_controller.go | 0.00% | 21 Missing ⚠️ |
| internal/controller/setup.go | 44.44% | 5 Missing ⚠️ |
| cmd/main.go | 33.33% | 2 Missing ⚠️ |
| ...adogagent/component/clusterchecksrunner/default.go | 0.00% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (65.71%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main    #2856      +/-   ##
==========================================
+ Coverage   39.57%   40.05%   +0.48%     
==========================================
  Files         315      319       +4     
  Lines       27508    28031     +523     
==========================================
+ Hits        10885    11229     +344     
- Misses      15826    15979     +153     
- Partials      797      823      +26     
Flag Coverage Δ
unittests 40.05% <65.71%> (+0.48%) ⬆️

| Files with missing lines | Coverage Δ |
|---|---|
| ...nal/controller/datadogagent/feature/apm/feature.go | 70.77% <100.00%> (ø) |
| internal/controller/datadogcsidriver/csidriver.go | 100.00% <100.00%> (ø) |
| pkg/images/images.go | 97.34% <ø> (ø) |
| ...adogagent/component/clusterchecksrunner/default.go | 10.69% <0.00%> (ø) |
| cmd/main.go | 6.66% <33.33%> (+0.23%) ⬆️ |
| internal/controller/setup.go | 36.31% <44.44%> (+0.40%) ⬆️ |
| internal/controller/datadogcsidriver_controller.go | 0.00% <0.00%> (ø) |
| internal/controller/datadogcsidriver/controller.go | 62.75% <62.75%> (ø) |
| internal/controller/datadogcsidriver/daemonset.go | 70.42% <70.42%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment thread config/rbac/role.yaml Outdated
Comment thread internal/controller/datadogcsidriver/csidriver.go Outdated
Comment on lines +24 to +48
type DatadogCSIDriverSpec struct {
	// CSIDriverImage is the image configuration for the main CSI node driver container.
	// Default image: gcr.io/datadoghq/csi-driver:1.2.1
	// +optional
	CSIDriverImage *v2alpha1.AgentImageConfig `json:"csiDriverImage,omitempty"`

	// RegistrarImage is the image configuration for the CSI node driver registrar sidecar.
	// Default image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.1
	// +optional
	RegistrarImage *v2alpha1.AgentImageConfig `json:"registrarImage,omitempty"`

	// APMSocketPath is the host path to the APM socket.
	// Default: /var/run/datadog/apm.socket
	// +optional
	APMSocketPath *string `json:"apmSocketPath,omitempty"`

	// DSDSocketPath is the host path to the DogStatsD socket.
	// Default: /var/run/datadog/dsd.socket
	// +optional
	DSDSocketPath *string `json:"dsdSocketPath,omitempty"`

	// Override allows customization of the CSI driver DaemonSet pod template.
	// +optional
	Override *DatadogCSIDriverOverride `json:"override,omitempty"`
}
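
As a small illustration of how optional pointer fields such as `APMSocketPath` and `DSDSocketPath` can fall back to the documented defaults, consider the sketch below. The default values are taken from the field comments above; the `Spec` type and `valueOrDefault` helper are hypothetical and are not the operator's defaulting code.

```go
package main

import "fmt"

// Defaults as documented in the field comments of DatadogCSIDriverSpec.
const (
	defaultAPMSocketPath = "/var/run/datadog/apm.socket"
	defaultDSDSocketPath = "/var/run/datadog/dsd.socket"
)

// Spec is a hypothetical, trimmed-down copy of the spec with only the
// socket path fields.
type Spec struct {
	APMSocketPath *string
	DSDSocketPath *string
}

// valueOrDefault returns the user-provided value when the pointer is set,
// and the default otherwise. A nil pointer means "not specified".
func valueOrDefault(value *string, fallback string) string {
	if value != nil {
		return *value
	}
	return fallback
}

func main() {
	custom := "/custom/apm.socket"
	spec := Spec{APMSocketPath: &custom} // DSDSocketPath left unset

	fmt.Println(valueOrDefault(spec.APMSocketPath, defaultAPMSocketPath)) // /custom/apm.socket
	fmt.Println(valueOrDefault(spec.DSDSocketPath, defaultDSDSocketPath)) // /var/run/datadog/dsd.socket
}
```

Using pointers lets the controller distinguish an unset field from an explicitly provided value, which plain strings with `omitempty` cannot do.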
Contributor

Member Author

@tbavelier tbavelier Apr 7, 2026

Nope, by design: I did not include it in the CRD so that it defaults to true, and users can still disable it with an override if they want. Do you think it should be present in the CRD? Do we expect users to play with it a lot (though we mostly expect people to rely on automatic creation from the DDA)? Do we expect more fields in the APM section in the future, or does it belong in a struct with other options we expect to introduce?
Here's how it can be disabled at the moment (the override takes precedence over any default or already-set variable):

  override:
    containers:
      csi-node-driver:
        env:
          - name: DD_APM_ENABLED
            value: "false"
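
The "override wins" behaviour for env vars can be sketched as a merge by name. This is a minimal illustration, not the code in daemonset.go: `EnvVar` and `mergeEnv` are hypothetical, `EXAMPLE_VAR` is a made-up variable, and the default `DD_APM_ENABLED=true` follows the comment above.

```go
package main

import "fmt"

// EnvVar is a hypothetical, minimal stand-in for a container env var.
type EnvVar struct {
	Name  string
	Value string
}

// mergeEnv returns base with overrides applied by name: an override replaces
// a variable with the same name in place, and is appended otherwise.
func mergeEnv(base, overrides []EnvVar) []EnvVar {
	merged := make([]EnvVar, len(base), len(base)+len(overrides))
	copy(merged, base)

	index := make(map[string]int, len(merged))
	for i, env := range merged {
		index[env.Name] = i
	}
	for _, override := range overrides {
		if i, exists := index[override.Name]; exists {
			merged[i] = override // the override always wins
			continue
		}
		index[override.Name] = len(merged)
		merged = append(merged, override)
	}
	return merged
}

func main() {
	defaults := []EnvVar{
		{Name: "DD_APM_ENABLED", Value: "true"},
		{Name: "EXAMPLE_VAR", Value: "kept"}, // made-up variable
	}
	overrides := []EnvVar{{Name: "DD_APM_ENABLED", Value: "false"}}

	fmt.Println(mergeEnv(defaults, overrides)) // [{DD_APM_ENABLED false} {EXAMPLE_VAR kept}]
}
```

Replacing in place keeps the original ordering of the default variables, so an override does not reshuffle the container spec.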

Comment thread internal/controller/datadogcsidriver/daemonset.go
}
}

func applyContainerOverrides(container *corev1.Container, override *v2alpha1.DatadogAgentGenericContainer) {
Contributor

QQ Not specifically related to CSI:

The container overrides applied here seem to be applicable to any container override in any other controller.

Is there a reason why this utility function lives here rather than in a shared package, which would avoid reimplementing the same logic in every other controller?

Member Author

@tbavelier tbavelier Apr 7, 2026

Yes and no: it applies to any controller using the v2alpha1.DatadogAgentGenericContainer type with a fixed strategy, where the override always wins. In the DatadogAgent controller, this is far more complex: a strategy callback makes the merge depend on that strategy, and the signatures are different.
I tried extracting it outside of DatadogAgent for use in both paths, but that simply adds indirection and makes it less readable. These helpers could definitely be extracted if we start adding more controllers that reuse the same type with the same simple logic as CSIDriver. So I'm leaving it as is, with comments: 9633847

Contributor

@adel121 adel121 left a comment

Left some comments / questions.

Also a general question, I don't see where the CSI Driver Custom Resource is being created by the Datadog Agent controller in case CSI feature is enabled. Or is this deferred to a follow-up PR?

@tbavelier
Member Author

Left some comments / questions.

Also a general question, I don't see where the CSI Driver Custom Resource is being created by the Datadog Agent controller in case CSI feature is enabled. Or is this deferred to a follow-up PR?

That is handled in follow-up PR #2857, as shared offline; happy to discuss further.

Collaborator

@khewonc khewonc left a comment

A few small optional nits, but otherwise lgtm

Comment thread api/datadoghq/v1alpha1/datadogcsidriver_types.go
Comment thread internal/controller/datadogcsidriver/controller.go Outdated
Comment thread internal/controller/datadogcsidriver/const.go Outdated
Comment thread internal/controller/datadogcsidriver/controller.go Outdated
Comment thread internal/controller/datadogcsidriver/defaults.go Outdated
@tbavelier tbavelier requested a review from a team as a code owner April 8, 2026 11:52
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d bot merged commit 61e7c1b into main Apr 8, 2026
60 of 61 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d bot deleted the tbavelier/csi-driver-standalone-v2 branch April 8, 2026 14:29
@tbavelier tbavelier changed the title Add standalone DatadogCSIDriver controller [CONTP-1503] Add standalone DatadogCSIDriver controller Apr 9, 2026