Estuary is a real-time data platform with:
- Control plane: user-facing catalog management APIs
- Data planes: distributed runtime execution
- Connectors: OCI images integrating external systems
This repo lives at https://github.com/estuary/flow
Estuary is built with:
- Rust (primary language)
  - Third-party sources under `~/.cargo/registry/src/`
- Go - integration glue with the Gazette consumer framework
  - Third-party sources under `~/go/pkg/mod/`
- Protobuf - communication between control plane, data planes, and connectors
- Supabase - migrations are under `supabase/migrations/`
  - pgTAP tests under `supabase/tests/`
- Docs - external user-facing product documentation under `site/` (Docusaurus)
Use regular cargo and go tools to build and test crates.
# libsqlite3 tag is required for `bindings` and `flowctl-go` packages.
go build -tags libsqlite3 ./go/bindings
# Regenerate checked-in protobuf (required after .proto changes)
mise run build:go-protobufs
mise run build:rust-protobufs
# Run pgTAP SQL Tests
mise run ci:sql-tap
# E2E tests over derivation examples (SLOW)
mise run ci:catalog-test

A development Supabase instance is available:
# Reset with current migrations as needed
supabase db reset
# Interact directly with dev DB
psql postgresql://postgres:postgres@localhost:5432/postgres -c 'SELECT 1;'

Users interact with the control plane to manage a catalog of:
- Captures: tasks which capture from a user endpoint into target collections
- Collections: collections of data with enforced JSON Schema
- Derivations: both a collection and a task - the task builds its collection through transformation of other collections
- Materializations: tasks which maintain materialized views of source collections in an endpoint
- Tests: fixtures of source collection inputs and expected derivation outputs
Collections and tasks have a declarative (JSON/YAML) model. Users refine model changes in drafts, which are published to the control plane for verification and testing. The control plane compiles the user's catalog model into built specs that have extra specifics required by the runtime, and activates specs into their associated data plane.
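
The draft, publish, build, and activate flow described above can be pictured with a minimal Go sketch. Every name here (`draftSpec`, `builtSpec`, `build`, `publish`, the `acmeCo/...` catalog names, and `plane-a`) is hypothetical and chosen only for illustration; the real control plane does considerably more verification and testing, and its actual types live in the protobuf definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// draftSpec is a hypothetical stand-in for a user's declarative model of one
// catalog entity. Real models are richer JSON/YAML documents.
type draftSpec struct {
	Name      string
	DataPlane string
}

// builtSpec is a hypothetical stand-in for a compiled spec, which carries
// extra detail the runtime needs beyond what the user declared.
type builtSpec struct {
	draftSpec
	// RuntimeDetail is a placeholder for those extra specifics.
	RuntimeDetail string
}

// build verifies one draft and compiles it into a built spec.
func build(draft draftSpec) (builtSpec, error) {
	if draft.Name == "" {
		return builtSpec{}, errors.New("spec must have a name")
	}
	return builtSpec{draftSpec: draft, RuntimeDetail: "compiled:" + draft.Name}, nil
}

// publish builds every draft and groups the results by target data plane.
// Nothing is returned for activation unless every draft builds, mirroring
// the idea that a publication is verified as a whole.
func publish(drafts []draftSpec) (map[string][]builtSpec, error) {
	var activated = make(map[string][]builtSpec)
	for _, draft := range drafts {
		var built, err = build(draft)
		if err != nil {
			return nil, fmt.Errorf("building %q: %w", draft.Name, err)
		}
		activated[built.DataPlane] = append(activated[built.DataPlane], built)
	}
	return activated, nil
}

func main() {
	var drafts = []draftSpec{
		{Name: "acmeCo/orders", DataPlane: "plane-a"},
		{Name: "acmeCo/orders-by-day", DataPlane: "plane-a"},
	}
	var activated, err = publish(drafts)
	if err != nil {
		fmt.Println("publish failed:", err)
		return
	}
	fmt.Println(len(activated["plane-a"]), "specs activated into plane-a")
}
```

The sketch keeps the user-authored model and the built spec as separate types to reflect that built specs are a compiled output of the control plane, not something users edit directly.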
- Supabase: catalog and platform config DB
- Agent: APIs and background automation
- Data-plane controller: provisions data planes
- Gazette: brokers serve the journals that back collections
- Reactors: runtime written to Gazette consumer framework; executes tasks and runs connectors as sidecars over gRPC
- Etcd: config for gazette and reactors
- `go/protocols/flow/flow.proto` - core types and built specs
- `go/protocols/capture/capture.proto` - protocol for capture tasks
- `go/protocols/derive/derive.proto` - for derivation tasks
- `go/protocols/materialize/materialize.proto` - for materialization tasks
Every crate/module should have a README.md with essential context:
- Purpose and fit within the project
- Key types and entry points
- Brief architecture and non-obvious details
A README.md is ONLY a roadmap for expert developers, orienting them where to look next.
Keep READMEs current - update with code changes.
- Use `var myVar = ...` in Go. Do NOT use `myVar := ...` (unless required due to shadowing)
- Write comments that document "why" - rationale, broader context, and non-obvious detail
- Do NOT write comments which describe the obvious behavior of code.
  Don't write `// Get credentials` before a call to `getCredentials()`
- Prefer functional approaches. Try to avoid mutation.
- Use early-return over nested conditionals
- Use at least one level of name qualification for third-party types and functions.
  For example, `axum::Router::new()` instead of `use axum::Router; Router::new()`. Types / functions should be unqualified ONLY if they're in the current module.
- Prefer snapshots over fine-grained assertions (`insta` / `cupaloy`)
- Wrap errors with context (`anyhow::Context` / `fmt.Errorf`)
- Return errors up the stack rather than logging
- Panic on impossible states (do NOT add spurious error handling)
- Structured logging with context (`tracing` / `logrus`)
- Avoid verbose logging in hot paths