CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Context Optimization: This file is structured for efficient agent usage. The "Agent Routing" section defines what context each agent needs. When spawning subagents, pass only relevant sections—not the entire file. Sections marked  are lookup tables; don't include them in agent prompts unless specifically needed.

Project Overview

Activity is a Kubernetes extension that provides queryable audit logs, events, and human-readable activity summaries. It's built as an aggregated API server, making it work natively with kubectl and Kubernetes clients.

Module: go.miloapis.com/activity

Agent Routing

MANDATORY: All implementation work MUST be performed by subagents. Never directly edit code, configuration, or documentation in the parent conversation. Instead, always delegate to the appropriate specialized agent from the table below. The parent conversation should only coordinate agents, pass context between them, and communicate results to the user.

Do NOT ask the user which agent to use - pick the appropriate one based on what files or features are being modified.

Task Type	Agent	When to Use
UI/Frontend	`datum-platform:frontend-dev`	React, TypeScript, CSS, anything in `ui/` directory
Go Backend	`datum-platform:api-dev`	Go code in `cmd/`, `internal/`, `pkg/` directories
Infrastructure	`datum-platform:sre`	Kustomize, Dockerfile, CI/CD, `config/` directory, `.infra/` for deployment
Tests	`datum-platform:test-engineer`	Writing or fixing Go tests
Code Review	`datum-platform:code-reviewer`	After implementation, before committing
Documentation	`datum-platform:tech-writer`	README, docs/, guides, API documentation
Architecture	`Plan`	Designing new features or significant refactors
Exploration	`Explore`	Understanding codebase structure or finding code

Key principles:

Always use subagents — never write code, edit files, or run build/test commands directly in the parent conversation
Use agents proactively without being asked
For multi-step tasks, use the appropriate agent for each step (launch independent agents in parallel when possible)
After making code changes, always use code-reviewer to validate
For UI changes, run npm run build and npm run test:e2e to verify
Always test infrastructure changes in a test environment before opening a PR - Deploy to the test-infra KIND cluster (task test-infra:cluster-up) and verify resources work correctly before pushing changes to staging/production repos
Use Telepresence for debugging staging issues - When investigating bugs that only reproduce in staging, intercept the service and run it locally with task test-infra:telepresence:intercept SERVICE=<name>. See "Remote Debugging with Telepresence" section.

Agent Context Requirements

Each agent only needs specific context. When spawning agents, pass minimal relevant info in prompts—don't repeat the entire CLAUDE.md:

Agent	Required Context	Skip (don't include in prompt)
`frontend-dev`	UI commands, file paths in `ui/`	Go architecture, ClickHouse, NATS, data pipeline
`api-dev`	Go patterns, API resource types, key directories	UI commands, dev environment setup, migrations
`sre`	Config structure, build commands, deployment	Code architecture details, CEL patterns
`test-engineer`	Test commands, package being tested	Full architecture, deployment, UI
`Explore`	Key directories, architecture overview	Build commands, dev setup, deployment
`code-reviewer`	Architecture, multi-tenancy model, conventions	Dev environment, build commands
`tech-writer`	API resources, architecture overview	Implementation details, build commands

Agent Output Guidelines

Agents should return concise summaries to minimize context bloat in the parent conversation:

Agent	Return	Don't Return
`Explore`	File paths + 1-line descriptions	Full file contents, extensive code quotes
`api-dev`	What was changed + file paths	Full diffs, unchanged code
`frontend-dev`	Components modified + any build errors	Full file contents
`code-reviewer`	Numbered findings list with file:line refs	Full code blocks for context
`test-engineer`	Pass/fail summary + failure messages only	Full test output, passing test details
`sre`	Changed manifests + deployment notes	Full YAML contents

Multi-Step Task Decomposition

For complex tasks, decompose to minimize per-agent context:

Explore first (use model: "haiku"): Find relevant files → return only paths
Plan if needed: Design approach → return bullet points only
Implement (sonnet): Work on specific files identified in step 1
Review: Check only the changed files

Critical: Pass only what's needed between steps. Don't re-explore what's already known.

Build and Development Commands

All development tasks use Task. Run task --list to see all available commands.

Building

task build                    # Build the activity binary to bin/activity
task dev:build                # Build container image (ghcr.io/datum-cloud/activity:dev)

Testing

go test ./...                 # Run all Go tests
go test ./internal/cel/...    # Run tests in a specific package
go test -run TestName ./...   # Run a specific test

Code Generation

task generate                 # Run all code generation (OpenAPI, RBAC, migrations, docs)
task generate:openapi         # Generate Kubernetes OpenAPI definitions
task generate:rbac            # Generate RBAC manifests from kubebuilder annotations
task generate:docs            # Generate API reference documentation

UI Component Library

The UI is a React component library in /ui:

cd ui && npm install && npm run build    # Build the library
cd ui && npm run lint                    # Lint TypeScript
cd ui && npm run type-check              # Type check

Or via Task:

task ui:build                            # Build component library
task ui:example:dev                      # Run example app in dev mode

Architecture

Components

activity-apiserver: Kubernetes aggregated API server that handles queries and Watch streams
activity-processor: Processes audit logs/events through ActivityPolicy rules to generate Activities
activity-controller-manager: Manages ActivityPolicy lifecycle and status
kubectl-activity: CLI plugin for command-line querying
activity-ui: React component library for web interfaces

Data Pipeline

Audit logs/Events → Published to NATS JetStream by control plane
Vector → Receives from NATS, routes to ClickHouse and back to NATS for processing
activity-processor → Consumes audit events, applies ActivityPolicy rules, produces Activities
ClickHouse → Stores audit logs, events, and activities for long-term querying
etcd → Stores ActivityPolicy resources with Watch support

API Resources (`activity.miloapis.com/v1alpha1`)

Resource	Type	Purpose
AuditLogQuery	Ephemeral	Execute audit log searches
AuditLogFacetsQuery	Ephemeral	Get distinct values for autocomplete
Activity	Read-only	Query translated activity records
ActivityFacetQuery	Ephemeral	Get distinct activity field values
ActivityPolicy	Persistent	Define translation rules (CEL-based)
PolicyPreview	Ephemeral	Test policies against sample inputs
EventQuery	Ephemeral	Query cluster events
EventFacetQuery	Ephemeral	Get distinct event field values

Key Directories

cmd/activity/ - Main binary entrypoint (subcommands: apiserver, processor, controller-manager)
internal/apiserver/ - Aggregated API server implementation
internal/storage/ - ClickHouse storage backend
internal/cel/ - CEL expression engine for ActivityPolicy rules
internal/processor/ - Activity translation processor
internal/controller/ - Kubernetes controller for ActivityPolicy
internal/watch/ - Watch API implementation via NATS consumers
pkg/apis/activity/v1alpha1/ - API type definitions
pkg/mcp/ - MCP (Model Context Protocol) server implementation
config/ - Kustomize deployment manifests
migrations/ - ClickHouse schema migrations
.infra/ - Cloned infra repo for deployment configuration (see Infrastructure Management below)

Infrastructure Management

The .infra/ directory contains a clone of the datum-cloud/infra repository. Always use this folder for managing Activity's deployment infrastructure, including:

Flux Kustomizations
Environment-specific patches (staging/production)
Secret configurations
Dependencies and deployment ordering

Key paths in .infra/:

Path	Purpose
`.infra/apps/activity-system/base/`	Base Flux Kustomizations for all Activity components
`.infra/apps/activity-system/overlays/staging/`	Staging-specific patches and resources
`.infra/apps/activity-system/overlays/production/`	Production-specific patches
`.infra/clusters/staging/apps/activity-system.yaml`	Staging cluster entry point
`.infra/clusters/production/apps/activity-system.yaml`	Production cluster entry point

Workflow for infrastructure changes:

Make changes in .infra/apps/activity-system/
Commit and push to datum-cloud/infra repo (not this repo)
FluxCD will reconcile changes to the cluster

Important: The .infra/ folder is gitignored from this repo. Changes must be committed to the infra repo separately.

Development Environment

Lightweight Dev Setup (single-replica, minimal resources)

task dev:setup               # Full setup: cluster + dependencies + deploy
task dev:redeploy            # Quick rebuild and redeploy after code changes

Full Test Environment (HA with S3 storage)

task test:setup              # Full HA setup with 3-replica ClickHouse
task test:redeploy           # Quick rebuild and redeploy

Cluster Access

task test-infra:kubectl -- <args>        # Run kubectl against dev cluster
task test-infra:kubectl -- get pods -n activity-system
task test-infra:kubectl -- logs -l app=activity-apiserver -n activity-system -f

Database Migrations

task migrations:new NAME=description     # Create new migration file
task migrations:generate                 # Generate ConfigMap from migrations
task migrations:cluster:verify           # Verify schema in cluster

Important: After adding or modifying any SQL file in migrations/, always run task migrations:generate to regenerate the ConfigMap at config/components/clickhouse-migrations/configmap.yaml. The raw SQL files are not deployed directly — the ConfigMap is what gets deployed to the cluster.

Remote Debugging with Telepresence

When debugging issues in staging, use Telepresence to intercept a service and run it locally with full access to cluster resources (NATS, ClickHouse, etcd, milo).

# Install Telepresence CLI (one-time)
task test-infra:telepresence:install

# Connect to staging cluster
KUBECONFIG=~/.kube/gke-staging task test-infra:telepresence:connect

# Intercept a service to run locally
task test-infra:telepresence:intercept SERVICE=activity-apiserver NAMESPACE=activity-system PORT=6443

# Load environment variables from intercepted service
source /tmp/telepresence-activity-apiserver.env

# Run the service locally with debugger
go run ./cmd/activity apiserver --secure-port=6443

# When done, release the intercept
telepresence leave activity-apiserver
telepresence quit

Available services to intercept:

activity-apiserver (port 6443) - API server handling queries
activity-processor (port 8080) - Event processing pipeline
activity-controller-manager (port 8080) - ActivityPolicy controller

When to use Telepresence:

Debugging issues that only reproduce in staging with real data
Testing changes against production-like NATS streams and ClickHouse data
Investigating connectivity or configuration issues

Multi-Tenancy Model

Data is scoped by tenant:

Platform: All data across all tenants
Organization: Data within a specific organization
Project: Data within a specific project
User: Actions performed by a specific user

Scopes are NOT hierarchically inclusive - query each scope directly.

ActivityPolicy Translation

ActivityPolicy resources define how audit logs/events are translated into human-readable Activities using CEL expressions:

apiVersion: activity.miloapis.com/v1alpha1
kind: ActivityPolicy
spec:
  resource:
    apiGroup: networking.datumapis.com
    kind: HTTPProxy
  auditRules:
    - match: "audit.verb == 'create'"
      summary: "{{ actor }} created {{ link(kind + ' ' + audit.objectRef.name, audit.responseObject) }}"

Stable Codebase Facts

These locations rarely change—agents should use these directly without re-exploring:

What	Location	Notes
API type definitions	`pkg/apis/activity/v1alpha1/`	All CRD types live here
Storage implementations	`internal/storage/`	ClickHouse queries
CEL expression logic	`internal/cel/`	Policy matching engine
UI components	`ui/src/components/`	React component library
API server handlers	`internal/apiserver/`	REST handlers for each resource
Kustomize base	`config/base/`	Core deployment manifests
Kustomize components	`config/components/`	Optional features (ui, vector, etc.)
Database migrations	`migrations/`	ClickHouse schema files

Technology Stack

Go 1.25+ - Backend implementation
NATS JetStream - Durable event streaming
Vector - Data pipeline routing
ClickHouse - Analytics storage
etcd - Policy persistence
React/TypeScript - UI components

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Project Overview

Agent Routing

Agent Context Requirements

Agent Output Guidelines

Multi-Step Task Decomposition

Build and Development Commands

Building

Testing

Code Generation

UI Component Library

Architecture

Components

Data Pipeline

API Resources (`activity.miloapis.com/v1alpha1`)

Key Directories

Infrastructure Management

Development Environment

Lightweight Dev Setup (single-replica, minimal resources)

Full Test Environment (HA with S3 storage)

Cluster Access

Database Migrations

Remote Debugging with Telepresence

Multi-Tenancy Model

ActivityPolicy Translation

Stable Codebase Facts

Technology Stack

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Project Overview

Agent Routing

Agent Context Requirements

Agent Output Guidelines

Multi-Step Task Decomposition

Build and Development Commands

Building

Testing

Code Generation

UI Component Library

Architecture

Components

Data Pipeline

API Resources (activity.miloapis.com/v1alpha1)

Key Directories

Infrastructure Management

Development Environment

Lightweight Dev Setup (single-replica, minimal resources)

Full Test Environment (HA with S3 storage)

Cluster Access

Database Migrations

Remote Debugging with Telepresence

Multi-Tenancy Model

ActivityPolicy Translation

Stable Codebase Facts

Technology Stack

API Resources (`activity.miloapis.com/v1alpha1`)