CLAUDE.md

Estuary is a real-time data platform with:

  • Control plane: user-facing catalog management APIs
  • Data planes: distributed runtime execution
  • Connectors: OCI images integrating external systems

This repo lives at https://github.com/estuary/flow

Repository Overview

Estuary is built with:

  • Rust (primary language)
    • Third-party sources under ~/.cargo/registry/src/
  • Go - integration glue with the Gazette consumer framework
    • Third-party sources under ~/go/pkg/mod/
  • Protobuf - communication between control plane, data planes, and connectors
  • Supabase - migrations are under supabase/migrations/
    • pgTAP tests under supabase/tests/
  • Docs - external user-facing product documentation under site/ (Docusaurus)

Essential Commands

Build & Test

Use the standard cargo and go tools to build and test crates and packages.

# libsqlite3 tag is required for `bindings` and `flowctl-go` packages.
go build -tags libsqlite3 ./go/bindings

# Regenerate checked-in protobuf (required after .proto changes)
mise run build:go-protobufs
mise run build:rust-protobufs

# Run pgTAP SQL Tests
mise run ci:sql-tap

# E2E tests over derivation examples (SLOW)
mise run ci:catalog-test

Development

A development Supabase instance is available:

# Reset with current migrations as needed
supabase db reset

# Interact directly with dev DB
psql postgresql://postgres:postgres@localhost:5432/postgres -c 'SELECT 1;'

Architecture Overview

Core Concepts

Users interact with the control plane to manage a catalog of:

  • Captures: tasks which capture from a user endpoint into target collections
  • Collections: datasets with an enforced JSON Schema
  • Derivations: both a collection and a task - the task builds its collection through transformation of other collections
  • Materializations: tasks which maintain materialized views of source collections in an endpoint
  • Tests: fixtures of source collection inputs and expected derivation outputs

Collections and tasks have a declarative (JSON/YAML) model. Users refine model changes in drafts, which are published to the control plane for verification and testing. The control plane compiles the user's catalog model into built specs that have extra specifics required by the runtime, and activates specs into their associated data plane.

Control-plane components

  • Supabase: catalog and platform config DB
  • Agent: APIs and background automation
  • Data-plane controller: provisions data planes

Data-plane components

  • Gazette: brokers serve the journals that back collections
  • Reactors: runtime built on the Gazette consumer framework; executes tasks and runs connectors as sidecars over gRPC
  • Etcd: config for Gazette and reactors

Protocols

  • go/protocols/flow/flow.proto - core types and built specs
  • go/protocols/capture/capture.proto - protocol for capture tasks
  • go/protocols/derive/derive.proto - for derivation tasks
  • go/protocols/materialize/materialize.proto - for materialization tasks

README.md

Every crate/module should have a README.md with essential context:

  • Purpose and fit within the project
  • Key types and entry points
  • Brief architecture and non-obvious details

A README.md is ONLY a roadmap for expert developers, orienting them where to look next.

Keep READMEs current - update with code changes.

Development Guidelines

Implementation

  • Use var myVar = ... in Go. Do NOT use myVar := ... (unless required due to shadowing)
  • Write comments that document "why" - rationale, broader context, and non-obvious detail
  • Do NOT write comments which describe the obvious behavior of code. Don't write // Get credentials before a call to getCredentials()
  • Prefer functional approaches. Try to avoid mutation.
  • Use early-return over nested conditionals
  • Use at least one level of name qualification for third-party types and functions. For example, axum::Router::new() instead of use axum::Router; Router::new(). Types / functions should be unqualified ONLY if they're in the current module.
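
The Go-facing guidelines above can be sketched roughly as follows. This is a minimal illustration, not code from this repository: normalizeNames, describeTask, and the sample task names are hypothetical. It shows var declarations in place of :=, early returns, building a new value instead of rewriting the input, and comments that explain "why". Go already qualifies imported identifiers with their package name (strings.TrimSpace), which matches the spirit of the name-qualification rule.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNames returns a new slice of trimmed, lower-cased, non-empty names.
// It builds a fresh slice rather than rewriting the input, so callers can keep
// using their original slice without surprises.
// (Hypothetical helper, for illustration only.)
func normalizeNames(names []string) []string {
	var out = make([]string, 0, len(names))
	for _, name := range names {
		var trimmed = strings.TrimSpace(name)
		// Blank entries carry no information, so skip them up front
		// and keep the main path un-nested.
		if trimmed == "" {
			continue
		}
		out = append(out, strings.ToLower(trimmed))
	}
	return out
}

// describeTask summarizes a task, rejecting invalid inputs with early returns.
// (Hypothetical helper, for illustration only.)
func describeTask(name string, shards int) (string, error) {
	if name == "" {
		return "", fmt.Errorf("task name is empty")
	}
	if shards < 1 {
		return "", fmt.Errorf("task %q has %d shards, expected at least one", name, shards)
	}
	var summary = fmt.Sprintf("%s (%d shards)", name, shards)
	return summary, nil
}

func main() {
	var names = normalizeNames([]string{" Acme/Orders ", "", "acme/users"})
	fmt.Println(names)

	var summary, err = describeTask("acme/orders", 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(summary)
}
```

The range clause still uses := because Go's syntax requires it there; every other declaration uses the var form.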

Testing

  • Prefer snapshots over fine-grained assertions (insta / cupaloy)

Errors

  • Wrap errors with context (anyhow::Context / fmt.Errorf)
  • Return errors up the stack rather than logging
  • Panic on impossible states (do NOT add spurious error handling)
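
A minimal Go sketch of the three error rules above, using only the standard library. All names here (errNotFound, taskKind, lookupShardCount, describe) are hypothetical and exist only for illustration. Each layer wraps the underlying error with context via fmt.Errorf and %w, errors are returned to the caller rather than logged, and a state that cannot occur panics instead of gaining an error path.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// errNotFound is a hypothetical sentinel error used by this sketch.
var errNotFound = errors.New("not found")

// taskKind is a hypothetical closed set of task kinds.
type taskKind int

const (
	kindCapture taskKind = iota
	kindDerivation
	kindMaterialization
)

// kindLabel panics on an unknown kind: every valid kind is handled, so any
// other value is a programming bug rather than a runtime condition.
func kindLabel(kind taskKind) string {
	switch kind {
	case kindCapture:
		return "capture"
	case kindDerivation:
		return "derivation"
	case kindMaterialization:
		return "materialization"
	}
	panic(fmt.Sprintf("impossible task kind %d", kind))
}

// lookupShardCount wraps each failure with the context a caller lacks.
func lookupShardCount(config map[string]string, task string) (int, error) {
	var raw, ok = config[task]
	if !ok {
		return 0, fmt.Errorf("looking up task %q: %w", task, errNotFound)
	}
	var count, err = strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("parsing shard count of task %q: %w", task, err)
	}
	return count, nil
}

// describe returns the error up the stack instead of logging it, so the
// outermost caller decides how to report it.
func describe(config map[string]string, task string, kind taskKind) (string, error) {
	var count, err = lookupShardCount(config, task)
	if err != nil {
		return "", fmt.Errorf("describing %s: %w", kindLabel(kind), err)
	}
	return fmt.Sprintf("%s %s with %d shards", kindLabel(kind), task, count), nil
}

func main() {
	var config = map[string]string{"acme/orders": "2"}

	var summary, err = describe(config, "acme/orders", kindCapture)
	fmt.Println(summary, err)

	_, err = describe(config, "acme/users", kindDerivation)
	// Wrapping with %w keeps the sentinel reachable through errors.Is.
	fmt.Println(err, errors.Is(err, errNotFound))
}
```

Because each wrap uses %w, the full chain of context reads outermost-first in the message, while callers can still match the root cause.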

Logging

  • Structured logging with context (tracing / logrus)
  • Avoid verbose logging in hot paths