Basin

Bucket-native, multi-tenant Postgres alternative.
RAM-per-connection is ~47× cheaper than Postgres. Projects are S3 prefixes, not databases — operator cost is O(bytes active), not O(projects provisioned). One binary, pgwire on the front, Vortex-compressed columnar files on any S3-compatible bucket on the back.

Pre-alpha — public eval. Basin is being built in the open. Use it today to evaluate cost economics, prototype multi-tenant patterns, or contribute. The hot-tier UPDATE/DELETE fast paths are on by default as of Phase 5.14 closure (bed431c); the C2 acceptance gate (Single-row UPDATE p50 <5ms) is met at 10k (0.3ms) and 100k (2.1ms), and the same shape lands at ~9ms at 1M — within 7× of PG instead of the pre-fix 1550×. See the per-shape table in RESULTS_localfs.md and the kill-switch notes in ADR 0016 if you need to roll the fastpaths back without a redeploy (BASIN_HOTTIER_FASTPATH_DISABLE=1).

See docs/V0_1_SCOPE.md for the v0.1 cut-off — required items, what's shipped, and what's parked.

Why Basin

Two things are structurally true regardless of workload, and they're the load-bearing reason Basin exists:

Pure-Rust async server, ~298 KiB RAM per held-open connection vs Postgres's ~7.7 MiB. That's ~26× per the LocalFS lifecycle card and ~47× when you count the long-tail noise floor. Under a 1,000-connection flood, Basin holds 1,000 / refuses 0; Postgres holds 99 / refuses 901. This isn't a tuning result — it's the difference between a from-scratch tokio server and a forking daemon. (server_lifecycle card)
A new project is an S3 bucket prefix, not a provisioned database. Idle projects cost only their bytes — measured at ~2 KiB of RAM and $0.10/month/project at typical SaaS-tail sizes. Spin up one project for a side app, or ten thousand for a SaaS — same architecture, same binary. (idle-project cost curve)

Everything else — Vortex columnar storage, native vector search, basin-auth + basin-rest, time travel via Iceberg snapshots, pg_cron/pg_net/pg_trgm/PostGIS subsets as native crates, real pgwire v3 with TLS + COPY + extended-query + native JSONB/UUID/BYTEA binding — exists to make those two structural wins addressable from real applications. Decision log lives in docs/decisions/.

The numbers (honest)

Basin publishes all of its head-to-head numbers, wins and losses, regenerated from integration tests on every push. Some shapes Basin beats Postgres; many it does not. The picture is workload-dependent — the table below is the 1M-row SaaS+OLAP suite, LocalFS, no index either side (the most apples-to-apples wedge). For 10k / 100k / real-S3 numbers, see the benchmark results.

Numbers below are the live 1M-row Vortex card, regenerated by benchmark/run/run_all.sh (Postgres-compare cards run with the HTAP fastpaths default-on + BASIN_FAST_BULK_INSERT=1, as disclosed on the dashboard). Full per-row table + the 10k/100k cards in RESULTS_localfs.md.

Workload (1M, no index either side)	Basin (Vortex)	Postgres 18	Verdict
Structural — wins
RAM per held-open connection	298 KiB	7,723 KiB	~26× less (47× at the long-tail floor)
Connections held under 1,000-conn flood	1,000 held / 0 refused	100 held / 900 refused	structural
On-disk bytes (users + events, 1M rows)	152 MB	306 MB	2.0× smaller (Vortex columnar)
Cold-start first query	3.9 ms	13.2 ms	3.4× faster
Analytics / read — wins
LATERAL JOIN (correlated derived table)	15 ms	3,307 ms	226× faster
Correlated subquery in SELECT p50	50 ms	7,289 ms	147× faster
2-table JOIN GROUP BY p50	37 ms	1,352 ms	36× faster
Analytics JOIN+WHERE p50	38 ms	851 ms	22× faster
COUNT(*) full table p50	85 ms	1,190 ms	14× faster
Composite range (created_at AND amount)	4.3 ms	45 ms	10.5× faster
ILIKE '%@gmail.com' p50	6.2 ms	44 ms	7.2× faster
COUNT(DISTINCT user_id) p50	48 ms	234 ms	4.9× faster
Aggregate GROUP BY user_id p50	53 ms	224 ms	4.2× faster
Range scan p50 (~1k rows)	17 ms	43 ms	2.5× faster
Pagination ORDER BY LIMIT/OFFSET p50	73 ms	117 ms	1.6× faster
DELETE WHERE id IN (10 rows)	0.40 ms	4.3 ms	10.8× faster (hot-tier tombstone fast path)
Selective single-row / OLTP — losses (PG's PK btree wins)
Point query p50 (unindexed PK)	0.098 ms	0.003 ms	~33× slower — both sub-ms; PG hits its PK btree
JSONB `->>` get-text p50 (steady, promoted)	1.9 ms	0.068 ms	~28× slower — ~2 ms after the filter-pushdown fix (was ~45 ms); PG's btree still wins selective reads
Large IN-list (~100 values)	1.1 ms	0.029 ms	~37× slower — both low-ms
Recursive CTE Fibonacci(30)	1.4 ms	0.06 ms	~22× slower (DataFusion upstream)
JSONB `@>` contains (no GIN prune)	1,086 ms	62 ms	~18× slower — `CREATE INDEX … USING gin` is accepted but probe→prune wiring is in flight (#105)
Write — losses
Single-row UPDATE p50	8.6 ms	0.71 ms	~12× slower (hot-tier overlay; was ~1550× before the fastpath)
Bulk UPDATE (~1/3 rows)	6.9 s	3.7 s	1.9× slower
Bulk INSERT 1,000,000 rows	50 s	9 s	~5.6× slower — WAL→Vortex flush pipeline, not write-amp
Concurrent SELECT (16 sessions, mixed)	174 ms	3.2 ms	~54× slower

At 1M rows: 38 wins / 40 losses. Basin's analytical wins compound — LATERAL JOIN 226×, correlated subquery 147×, 2-table JOIN 36×, analytics JOIN 22×, COUNT(*) 14× — and it now wins the aggregate (4.2×) and pagination (1.6×) shapes that earlier benchmark generations lost. The losses cluster on PG's home turf: selective single-row lookups (point query, JSONB ->>, IN-list) where PG's PK btree is sub-millisecond and Basin is low-single-digit-ms, and bulk/concurrent writes. The hot-tier fastpaths shipped, so single-row UPDATE is ~12× (down from ~1550×) and DELETE-WHERE-IN now wins — the write gaps are constant-ratio, not scaling-divergent. At 10k PG's heap fits in shared_buffers and wins most OLTP latency; at 100k Basin matches/beats on reads and pays the per-INSERT tax. On real S3 round-trip cost dominates the unindexed cold path; cached/warm paths stay competitive. Full per-scale matrix:

benchmark/RESULTS_localfs.md — 10k / 100k / 1M cards, Vortex + Parquet, every shape
benchmark/RESULTS_seaweedfs.md — local S3-compatible (SeaweedFS) results, same shape battery
benchmark/RESULTS_real.md — real-cloud S3 results
benchmark/BENCHMARKS.md — methodology, caveats, reproduction
Live dashboards: benchmark/index_localfs.html, benchmark/index_real.html

Think a benchmark is unfair? Tell us.

Every card is generated by an integration test in tests/integration/tests/ — read the source and tell us where we're wrong. File a Benchmark Methodology issue and we'll either fix the methodology, soften the headline claim, or explain why we think the test is fair. Decisions are logged in docs/decisions/.

Where Basin IS the answer

Append-heavy multi-tenant SaaS — many isolated projects, mostly-reads, occasional point UPDATE. The RAM-per-conn + per-project-prefix economics dominate; the OLTP point-write tax stays bounded as long as writes are append-shaped.
Audit logs, event streams, IoT, activity feeds — write-once-read-many, Vortex columnar compression shrinks bytes-at-rest 29× vs PG heap on the 1M SaaS shape and ~43× at 100k. Object-storage $/GB compounds.
AI / RAG with mixed tabular + vector data — native vector(N) + HNSW alongside transactional rows in the same database, no pg_vector install. Useful when "store the document + the embedding + the audit row in one place" is the requirement.
Cheap-idle multi-environment apps — dev / staging / per-region / per-customer all live as cheap project prefixes on one cluster. Per-project cost is O(bytes), so 10k mostly-idle projects stays cheap. See docs/multi-project.md.

Where Basin is NOT the answer (yet)

Drop-in OLTP replacement at 1M+ scale with sub-millisecond point-mutation latency requirements. Phase 5.14 closure (bed431c) flipped the hot-tier UPDATE/DELETE fast paths on by default; the per-shape gap dropped from 1550× (single-row UPDATE) to 7× at 1M, but PG's WHERE pk = ? UPDATE is still 1-2ms vs Basin's ~9ms at 1M because PG's index+heap is structurally faster on that one shape. For most multi-tenant SaaS workloads this gap doesn't matter (point mutations are <1% of traffic); for hot-write OLTP front ends it might. The kill-switch BASIN_HOTTIER_FASTPATH_DISABLE=1 rolls the fastpaths back without a redeploy if needed.
Index-heavy workloads — no btree on non-PK columns yet. WHERE non_pk_col = X does a full-file scan. Bloom filters on basin.sort_by and per-file column_stats prune at the file level, but a btree-class secondary index (Phase 5.7.B1, ~8 weeks) is the biggest single perf win still open.
Pure analytical workhorse — Snowflake / DuckDB / ClickHouse will out-run Basin on heavy GROUP BY / window / recursive-CTE shapes. Basin trades some bench wins for PG-compat, and inherits DataFusion's upstream limits on a handful of shapes (recursive CTE, exact COUNT(DISTINCT)).
High-frequency single-DB OLTP → Postgres / Aurora / Neon. Edge / local-first → Turso / libSQL. Geospatial primary store → PostGIS. Embedded SQLite-class → SQLite. Globally strongly-consistent multi-region writes → Spanner / Cockroach.

Batteries included

Basin is one binary that ships the pieces you'd otherwise wire up across five vendors. The wedge isn't "a cheaper Postgres" — it's "the same stack, but every project is a bucket prefix and the marginal cost is near zero."

Piece	Status	Honest caveat
Auth (signup, signin, magic-link, JWT, refresh, OAuth)	✅ Shipped	Per-project schema; `auth.uid()` / `auth.role()` work in RLS policies.
REST API (PostgREST-compatible)	✅ Shipped	`GET`/`POST`/`PATCH`/`DELETE` on `/rest/v1/<table>`; RPC mount for SQL + Wasm functions.
Realtime (SSE + WebSocket + presence)	✅ Shipped, harness-gated	Implementation complete; some integration harness slices `#[ignore]`-gated pending un-gate.
Blob storage (`basin-blob`)	🛠 v0.1 partial	Engine seam shipped; full SDK + signed-URL surface ongoing. See `CAPABILITIES.md`.
Vector search (native `vector(N)`, HNSW, `<->`/`<#>`/`<=>`)	✅ Shipped	Planner auto-routes `ORDER BY x <-> $1 LIMIT k`. No `pg_vector` install needed.
WASM UDFs (`CREATE FUNCTION … LANGUAGE wasm`)	🛠 v0.1 scalar	`i32`/`i64`/`f64` only today; string / JSONB support deferred to next phase. Suitable for math / validation UDFs, not bulk transforms.
Cron, HTTP-from-SQL, trgm, geo, continuous matviews	✅ via Basin-flavored crates	See "Postgres-extension equivalents". No `CREATE EXTENSION` required.

What we don't ship — and on purpose:

Edge functions (Cloudflare Workers / Supabase Edge Functions shape) — see ADR 0019: in-engine WASM UDFs plus declarative inbound webhooks + RPC mount solve the "compute close to data" need for Basin's wedge. Geographically distributed V8 isolates are a different concept and a maintenance burden the wedge doesn't justify.
Triggers / PL/pgSQL — replaced by declarative lifecycle columns + SQL-bodied reactors + LANGUAGE sql functions per ADR 0012.
Postgres extensions (.so) — see ADR 0002.

Long-form companion with code samples per piece: docs/batteries.md. Fine-grained matrix: CAPABILITIES.md.

This is the and-the-rest-of-the-stack-is-here line. The structural primitive (project = bucket prefix, 298 KiB/conn) is the wedge; this is what closes the sale.

Postgres compatibility

Surface	Status
sqllogictest (curated PG-style suite)	100% (50/50) as of `b7114e8`
ORM corpus (Drizzle / Prisma / sqlx / Diesel / TypeORM, 99 representative shapes)	95% (94/99) — Drizzle 100%, Prisma 90%, sqlx 95%, Diesel 95%, TypeORM 94%
Per-fragment SQL matrix (`docs/sql-support.md`, 697 fragments tested)	~91% Default config / ~94% non-excluded (629/667)
Wasm UDFs	`i32 / i64 / f64` shipped; `text / bytea / JSONB` in flight
Wire protocol	pgwire v3, simple + extended query, TLS (rustls), `COPY FROM STDIN / TO STDOUT`, prepared statements with binary parameters (native JSONB / UUID / BYTEA / NUMERIC varlena / ARRAY binary wire formats)
Differential PG-oracle harness	`tests/integration/tests/differential_pg.rs` — every release runs identical SQL against Basin and a real PostgreSQL; build fails on any cell-level divergence

Per-statement breakdown with every red row linked to its planner / parser / executor owner: docs/sql-support.md. Public capability matrix: CAPABILITIES.md.

Intentionally out of scope (per ADR 0002): LISTEN/NOTIFY, CREATE TRIGGER, CREATE OPERATOR, composite CREATE TYPE, multirange / OID / REGCLASS / BIT / PG_LSN.

Quickstart

Want to skip the build and start querying right now? See the 5-Minute Docker Quickstart — one docker run command, no Rust toolchain required.

Ready to go deeper? The Getting Started / Tutorial walks you through CRUD, auth, RLS policies, the REST API, a React/Vite frontend snippet, and the first-deployment path — about 15 minutes end-to-end.

Want to see a complete app? Two reference apps are in examples/:

examples/saas-starter/ — multi-tenant SaaS app: Drizzle ORM, basin-auth, RLS policies, basin-rest auto-generated REST surface.
examples/ai-rag-app/ — AI/RAG app: document chunking + embedding pipeline, basin-vector similarity retrieval, Wasm function calling an inference endpoint.

Install basin, point it at a data dir, run. No external object store is needed for local development.

BASIN_DATA_DIR=/tmp/basin cargo run -p basin-server

That gives you pgwire on 127.0.0.1:5433, durable WAL + Vortex columnar files (the catalog default; Parquet is opt-in per-table) under /tmp/basin/, and a volatile in-memory catalog for fast local iteration. Set BASIN_CATALOG=postgres://... for restart-safe metadata.

The full production-shaped boot layers WAL, shard owner, connection pool, JWT auth, and REST in one process:

BASIN_BIND=127.0.0.1:5433 \
BASIN_CATALOG=postgres://postgres@127.0.0.1:5432/postgres \
BASIN_DATA_DIR=/tmp/basin \
BASIN_WAL_DIR=/tmp/basin/wal \
BASIN_PROJECTS='alice=*,bob=*' \
BASIN_SHARD_ENABLED=1 \
BASIN_POOL_ENABLED=1 \
BASIN_AUTH_ENABLED=1 \
  BASIN_AUTH_JWT_SECRET=$(openssl rand -hex 32) \
  BASIN_AUTH_SMTP_HOST=smtp.example.com BASIN_AUTH_SMTP_PORT=587 \
  BASIN_AUTH_SMTP_USERNAME=u BASIN_AUTH_SMTP_PASSWORD=p \
  BASIN_AUTH_SMTP_FROM=noreply@example.com BASIN_AUTH_SMTP_TLS=starttls \
BASIN_REST_ENABLED=1 BASIN_REST_BIND=127.0.0.1:5434 \
cargo run -p basin-server

BASIN_PROJECTS is the project-list env var — name is historical, projects in the public API. Required vars for production-shaped durability: BASIN_BIND, BASIN_CATALOG=postgres://..., BASIN_DATA_DIR or BASIN_STORAGE_BACKEND, BASIN_WAL_DIR, BASIN_PROJECTS, and BASIN_AUTH_ENABLED (if you want auth). Everything else is optional.

To run the same binary against object storage, set BASIN_STORAGE_BACKEND=s3|tigris plus the S3-compatible endpoint, bucket, region, and credentials documented by basin-storage.

Connect with any Postgres driver:

psql -h 127.0.0.1 -p 5433 -U alice

Run real SQL:

-- Standard tables, standard SQL.
CREATE TABLE events (id BIGINT NOT NULL, body TEXT NOT NULL);
INSERT INTO events VALUES (1, 'hello'), (2, 'world');
SELECT * FROM events WHERE id = 2;

-- Native vector search, no pg_vector required.
CREATE TABLE docs (id BIGINT, embedding vector(384));
INSERT INTO docs VALUES (1, '[0.01, 0.02, ...]');
SELECT id FROM docs ORDER BY embedding <-> '[...]' LIMIT 10;

Confirm the data hit object storage under the project prefix:

find /tmp/basin/projects -name '*.vortex'   # default format
# /tmp/basin/projects/01HABCD…/tables/events/data/2026/05/01/01HEFG….vortex
# Tables created with WITH (basin.file_format='parquet') write *.parquet instead.

That's a real bucket-native database. The prefix is the IAM boundary; one bucket policy revokes all access to a project's data even if every other layer is bypassed.

Architecture

Four layers, each with one job:

   pgwire clients  (any Postgres driver — psql, tokio-postgres, asyncpg, JDBC)
          │
          ▼
   Routers (stateless)        parses SQL, applies RLS, routes by project
          │
          ▼
   Shard owners (stateful)    in-memory state for many projects per process,
          │                   eviction on idle, lazy load from WAL + Parquet
          ▼
   WAL                        durable append path; flushes to object storage
          │
          ▼
   Object storage + catalog   /projects/{id}/... Vortex (default) or Parquet + Iceberg-style metadata
                              local FS, S3, Tigris (S3-compatible) — same binary, different bucket

The full architecture document is in docs/architecture.md. Every "no" we've recorded is in docs/decisions/.

Built on: Apache Arrow · Apache Iceberg (table format) · Vortex (default columnar format, LFAI incubation) · Apache Parquet (opt-in, interchange) · Apache DataFusion (SQL planner) · Tokio · pgwire-rs · openraft (single-process Raft WAL simulation today; cross-process distributed WAL is v0.2). Pure Rust, #![forbid(unsafe_code)] across every crate.

Basin's query engine is built on Apache DataFusion, the open-source SQL query engine from the Apache Software Foundation. Basin does not fork DataFusion — every query plan runs through upstream operators with Basin-shaped rules layered on top (RLS injection, project isolation, partition pruning).

What you can do today

Postgres-compatible SQL — pgwire v3, simple + extended query protocol, TLS (rustls), COPY FROM STDIN/COPY TO STDOUT (CSV). Works with psql, tokio-postgres, asyncpg, JDBC, Diesel, SeaORM, any Postgres ORM. A curated ORM/driver-compat suite plus a PG-oracle differential harness (differential_pg.rs) gate every release against a real PostgreSQL.
CRUD + DDL — CREATE TABLE (incl. CREATE TABLE AS … WITH NO DATA), multi-row INSERT, SELECT, UPDATE, DELETE (Iceberg copy-on-write today; hot-tier tombstone fast path env-gated in 87ef24b), ON CONFLICT DO NOTHING / DO UPDATE upsert, ALTER TABLE … CLUSTER BY (…) / SET BLOOM FILTERS ON / SET row_group_rows / SET cold_after / ENABLE ROW LEVEL SECURITY / CREATE POLICY, SHOW TABLES. Prepared statements with parameter bind (text + binary, including native JSONB / UUID and correct NUMERIC / ARRAY binary wire formats).
Honest enforcement, not silent no-ops — CREATE UNIQUE INDEX actually enforces uniqueness, VARCHAR(n)/CHAR(n) length is enforced, RLS WITH CHECK is enforced on write, TABLESAMPLE actually samples, advisory locks are real, and unsupported CREATE TRIGGER / MERGE honest-reject with a SQLSTATE instead of silently doing nothing. A wave of silent-corruption CRITICALs surfaced by the differential harness were fixed.
Expanded SQL surface — JSONPath (jsonb_path_query, @?, @@, jsonb_path_query_array); JSONB mutators (jsonb_set/insert/strip_nulls/pretty/typeof); json_build_object/json_build_array; INET/CIDR containment; regexp_match/matches/split_to_array/split_to_table, format, encode/decode; datetime age/to_char/to_date/date_bin; window IGNORE NULLS; SAVEPOINT / ROLLBACK TO; data-modifying CTEs; correlated + LATERAL joins (incl. CROSS JOIN LATERAL generate_series); bounded full-text search (tsvector/tsquery/@@); ordered-set aggregates (percentile_disc, mode() WITHIN GROUP); range/multirange arithmetic; real transaction semantics (deferred commits, ROLLBACK undo, SAVEPOINT stack, aborted state).
Time travel — Iceberg-style snapshots. Catalog::rollback_to_snapshot(project, table, snapshot_id) rewinds; Catalog::fork_table(project, src, dst) clones a table's metadata + snapshot history into a new sibling that diverges on next commit. Zero data copy until divergence.
Native vector search — vector(N) + <-> / <#> / <=> operators, HNSW per file segment. No pg_vector.
Postgres-extension equivalents — pg_cron (basin-cron), pg_net + http (basin-net), pg_trgm (basin-trgm), PostGIS subset (basin-geo), TimescaleDB continuous aggregates (basin-cv), pgcrypto + uuid-ossp UDFs.
Auth + REST in the OSS bundle — basin-auth (signup, JWT, refresh-token rotation, email-link login, per-project API keys) + basin-rest (PostgREST-shape CRUD, cursor pagination + NDJSON streaming, OpenAPI 3.0 schema generation at GET /rest/v1/_openapi.json). auth.uid(), auth.role(), auth.jwt() SQL session functions let you write Supabase-style RLS policies.
Per-project connection URLs — POST /admin/v1/projects returns postgres://<user>:<password>@host:5433/<db>. Password bcrypt-validated on every pgwire startup; mismatch → SQLSTATE 28P01. Rotate via POST /admin/v1/projects/{user}/rotate.
Durable catalog — Iceberg-style catalog backed by Postgres when BASIN_CATALOG=postgres://...; tables, snapshots, project credentials, and basin-auth's identity tables survive process restart.
Cheap retention — Vortex (default, ~1.95× smaller than ZSTD Parquet on audit-log) or Parquet, ~29× smaller than Postgres heap on the 1M SaaS+OLAP shape, ~43× at 100k; per-file catalog column_stats + per-file bloom filters on basin.sort_by columns skip footer fetches and file opens when the predicate prunes the file.
Analytical path — a single DataFusion engine with Vortex/Parquet projection + predicate pushdown, catalog-statistics file pruning, per-file blooms, and incremental continuous materialized views. Approximate-cardinality and approximate-quantile UDFs (APPROX_COUNT_DISTINCT, APPROX_PERCENTILE) sit alongside exact counterparts for dashboard workloads. Heavy scans use stateless pooled compute over shared object storage.
Multi-schema isolation (phase A) — SchemaName / QualifiedTableName types, a schema-aware in-memory and Postgres-backed catalog, a basin_schemas table, and CREATE/DROP SCHEMA + cross-schema queries with differential coverage. Phases B–E (full name resolution / search_path semantics / wider DDL) are still in progress — see Status.
Operations — connection pooling, per-project pgwire rate limiting (token-bucket via governor), cost-based query rejection (BASIN_QUERY_COST_LIMIT_ROWS), per-project counters (ops / bytes_read / bytes_written / errors / p99), OpenTelemetry traces wired through router → engine → shard → storage → WAL.

The full capability matrix (with what's planned and what's deferred): CAPABILITIES.md. The fine-grained per-syntax matrix derived from automated tests: docs/sql-support.md.

Status

Phase	Description	Status
0	Validate the wedge — customer interviews, design partners	open (the gate; engineering is mature enough to need customer signal next)
1	Storage substrate — Vortex (default) / Parquet on object_store, Iceberg-style catalog	shipped
2	WAL service — sub-5 ms write acks	v0.1 shipped (single-node; Raft is v0.2)
3	Shard owners — per-project state, eviction, compactor	v0.1 shipped (in-process; placement service is v0.2)
4	Routers + SQL — pgwire v3, extended query, TLS, COPY, native JSONB / UUID binding	shipped — real single-shard transaction semantics (deferred commits, `ROLLBACK` undo, `SAVEPOINT` stack, aborted state) landed; cross-shard 2PC remains v0.2 (ADR 0011)
4.5	PostgreSQL SQL-compatibility push — silent-corruption CRITICAL fixes, JSONPath / JSONB-mutating / INET-CIDR / regexp / datetime function families, correct NUMERIC + ARRAY binary wire formats, PG-oracle differential harness (`differential_pg.rs`)	shipped — Default config at ~91% / ~94% non-excluded (629/667); long-tail exotic-DDL parser gaps remain v0.2
5	Analytical path — single DataFusion engine, Vortex/Parquet pushdown + per-file bloom + catalog pruning, continuous pre-aggregation, `APPROX_COUNT_DISTINCT`/`APPROX_PERCENTILE` UDFs	v0.1 shipped
5.0a	Vortex storage format — ~1.95× smaller than ZSTD Parquet; `aggregate_full` ~15–40× via catalog-stats metadata path; per-file blooms flip `point_eq` from a loss to a win at every scale	shipped as the DEFAULT (ADR 0015), zero-regression vs Parquet baseline. Parquet first-class per-table via `WITH (basin.file_format='parquet')`. HTAP hot-tier (ADR 0016) is Phase 5.14.C — closes the residual OLTP point-read and the UPDATE/DELETE write floor.
5.14	Durable Basin moat — per-file catalog blooms (shipped), `APPROX_COUNT_DISTINCT` + `APPROX_PERCENTILE` UDFs (shipped), catalog-aware `WindowExec` sort-elision (shipped), HTAP hot tier on by default (`bed431c`) — DELETE + UPDATE fast paths default-ON, kill-switch `BASIN_HOTTIER_FASTPATH_DISABLE=1`, merge-on-read via TombstoneFilterExec + UpdateOverlayExec wired in both DataFusion (`HtapUnionTable::scan`) and fast_select paths, gate-matrix locked by 16 tests, C6 differential harness extended with Mode D fastpath-on, TxCommit WAL marker (ADR 0020 §6) emitted explicitly with backward-compat replay. The 3-month investment that is not subsumed by upstream Vortex / DataFusion improvements.	shipped
5.15	Unified docs platform — OSS-repo markdown with YAML frontmatter (spec), `basin-cloud` webapp consumes via `npm run dev:docs` build-time fetch	OSS side shipped (5.15.A/B/C, frontmatter spec + 24-doc migration + top-level index + CI gate); `basin-cloud` webapp side (5.15.E–I) deferred to that repo
5.5	Sharding axes — partitioning, compute sharding, tiered storage	shipped
5.6	RLS with `CREATE POLICY` (UNION / CTE coverage)	shipped
5.7	Caches + bloom + A4 catalog stats + B2 cluster-by + B3 row-group sizing	shipped; B1 secondary indexes is the biggest open perf win (~8 weeks)
5.8	`pg_cron` + `pg_net` SQL surfaces	shipped
5.9	Postgres-extension equivalents (basin-geo / -trgm / -cv, JSONB, UUID, pgcrypto)	shipped
5.10	Identity + REST (basin-auth, basin-rest, OpenAPI, pagination, streaming, API keys, refresh rotation, per-project connection URLs, `auth.uid()` / `auth.role()` / `auth.jwt()` session functions)	shipped
5.11	Multi-schema isolation	phase A shipped — `SchemaName`/`QualifiedTableName` types, schema-aware in-memory + Postgres catalog, `basin_schemas` table, `CREATE/DROP SCHEMA` + cross-schema queries with differential coverage. Phases B–E in progress — full qualified-name resolution, `search_path` semantics, wider schema-scoped DDL
6	Production hardening	partial — telemetry / pooling / rate-limit / cost-rejection / catalog-PITR / fork shipped; multi-region (ADR 0009), catalog replication (ADR 0010), cross-shard 2PC (ADR 0011) all locked architecturally and gated on customer demand
6.x	SQL long-tail (still pending)	planned — `COPY FROM STDIN` ergonomics, server-side `PREPARE/EXECUTE` over text protocol edge cases, `LISTEN/NOTIFY`, plpgsql `DO` blocks, full `MERGE`, exotic types (multirange / `BIT` / `OID` / `REGCLASS`), and the parser-refused exotic DDL forms
7	Launch	gated on Phase 0

Six-month wedge slice: WEDGE.md. Full plan: TASK.md. Decision log: docs/decisions/.

How Basin compares

vs Postgres / Aurora / RDS

Postgres is the right answer for single-project, high-frequency OLTP and for workloads that need sub-millisecond point-mutation latency at 1M+ rows. Basin's hot-tier UPDATE/DELETE fast paths are on by default as of Phase 5.14 closure — point UPDATE at 1M lands at ~9ms (vs PG ~1.3ms), within 7× rather than the pre-fix 1550×, but PG's index+heap is still structurally faster on that one shape. Basin is not trying to be Postgres on those shapes. Where Basin wins: many-isolated-projects (per-environment / per-customer / per-region), append-shaped workloads where the columnar bytes-at-rest savings compound, the analytical shapes that grow Basin's lead with scale (122× faster correlated subquery at 1M, 61× LATERAL JOIN, 25× JOIN GROUP BY), and the structural RAM-per-connection economics for connection-heavy front ends.

vs Neon

Neon is serverless Postgres with branching — terrific for single-DB workloads that want copy-on-write forks. Basin matches the branching story (Iceberg forks are zero-copy too) but stores data on plain S3 rather than a managed page server. Neon's per-project minimum scales with project count (O(provisioned pool)); Basin's per-project cost is O(bytes), so many isolated projects stay cheap.

vs Supabase

Supabase is "BaaS in a box" — Postgres + Auth + Edge Functions + Storage + Realtime. Basin covers the SQL + Auth + REST surface in one binary, with auth.uid() / auth.role() / auth.jwt() working identically. Where Basin differs is the data-layer economics: Vortex/Parquet on S3 instead of Postgres heap on block storage. Multi-project SaaS that has outgrown Supabase's per-project pricing can migrate the database to Basin via pgwire and keep Supabase Auth / Edge Functions / Realtime for the parts of the stack they handle well. Edge Functions / Realtime / Storage are out of scope per ADRs 0005/0006.

vs Nile

Nile is "Postgres for multi-tenant SaaS" — same problem space as Basin, but built directly on real PostgreSQL with per-tenant virtual databases. That choice gives Nile real PG semantics, real OLTP, real JSONB, real extensions, and real PL/pgSQL for free, which is exactly where Basin still trails today. Basin's structural answer is the substrate economics: Nile's per-tenant cost is bounded by the underlying PG instance's per-tenant cost (heap pages, connection slots, autovacuum overhead); Basin's per-tenant cost is O(bytes-on-S3) with shared compute, so cold or low-traffic tenants stay near-zero. The right card to read is cost per tenant per month at p99 latency, at scale — single-instance PG-based multi-tenancy gets expensive at 10k+ tenants in a way object-storage-native multi-tenancy doesn't. If your workload is point-mutation-heavy and JSONB-heavy and you have <1k tenants, Nile is probably the easier answer today. If you have many idle or low-traffic tenants, append-shaped data, or want columnar bytes-at-rest economics, Basin is the cheaper substrate.

vs Turso / libSQL

Turso is the right answer for edge-distributed apps with many tiny SQLite-class databases. Basin is for centralized apps that want Postgres SQL on cheap object storage with a real wire-protocol surface that ORMs already speak.

vs ClickHouse / DuckDB / data-warehouse

ClickHouse and DuckDB are analytical engines — phenomenal at OLAP scans, not designed for transactional point reads or per-row inserts. Basin will lose to a dedicated warehouse on heavy GROUP BY / window / recursive-CTE shapes (see the perf table above). Basin's pitch is the unified path: one engine, one binary, pgwire on the wire, columnar substrate underneath — useful when "Snowflake plus a Postgres" is overkill for the workload size. The HTAP hot-tier (ADR 0016) closes the OLTP point-read and point-write floor on TB-scale tables — the missing piece that lets one engine span OLAP and OLTP without a second system.

Where Basin is not the answer

Per the ADRs:

Single-project high-frequency OLTP → Postgres / Aurora / Neon / Nile
Edge / local-first → Turso / libSQL / Cloudflare D1
Geospatial primary store → PostGIS
Embedding-only workload → dedicated vector DB (Qdrant, Pinecone) — but Basin's native vector(N) works fine alongside tabular data
Embedded SQLite-class library → SQLite
Globally strongly-consistent writes across regions → Spanner / CockroachDB
Heavy OLAP workhorse → Snowflake / ClickHouse / DuckDB

References

benchmark/BENCHMARKS.md — methodology, caveats, reproduction steps
benchmark/RESULTS_localfs.md — 10k / 100k / 1M SaaS+OLAP cards (Vortex + Parquet), every shape
benchmark/RESULTS_seaweedfs.md — local S3-compatible (SeaweedFS) battery
benchmark/RESULTS_real.md — real-cloud S3
benchmark/index_localfs.html / benchmark/index_real.html — live dashboards (no server needed)
CAPABILITIES.md — public capability matrix
docs/sql-support.md — per-syntax SQL matrix (auto-generated)
docs/decisions/ — ADRs (every "no" with the trigger that would change our mind)
docs/architecture.md — the four-layer stack in detail

Ecosystem

Basin (this repo) is the data plane. Three sibling repos sit around it:

bas-in/basin-cloud — control plane and dashboard (Go + Vite/JSX SPA, Apache-2.0). Manages orgs, projects, billing; runs Basin engines on Fly Machines per project. Operators who want a managed UI use it. Operators running a single self-hosted engine do not — basin-server alone is sufficient.
bas-in/basin-cli — operator daily-driver (Go, Apache-2.0, stdlib-only). basin login, basin projects list, basin sql run, release artefacts are Sigstore-signed. Talks to basin-cloud's /v1/* API.
bas-in/basin-js — TypeScript SDK (MIT). Supabase-shaped createClient(url, anonKey) that talks directly to a Basin engine (pgwire + REST), not through basin-cloud. Browser, Node, Deno, Bun, Cloudflare Workers. jsr:@bas-in/basin-js and npm:@bas-in/basin-js.
Planned client SDKs — basin-py, basin-rs, basin-go, basin-dart, basin-swift, basin-kotlin. All will follow the same engine-direct shape as basin-js.

Licensing rationale. Server-side projects (basin engine, basin-cloud, basin-cli) are Apache-2.0 to carry the patent grant operators expect from infrastructure. Client SDKs (basin-js and future siblings) are MIT to match the norm of the SDK ecosystems they sit in.

Project layout

crates/
  basin-common      shared types, errors, telemetry
  basin-storage     Vortex (default) / Parquet + object_store under project prefixes
  basin-catalog     Iceberg-style catalog (in-memory + Postgres-backed durable)
  basin-wal         file-backed WAL (Raft-backed in v0.2)
  basin-shard       in-process shard owner with WAL → Vortex/Parquet compactor
  basin-engine      single DataFusion engine — point reads + analytical pool, per-project sessions
  basin-router      pgwire v3 (simple + extended query)
  basin-vector      native HNSW vector search
  basin-hottier     in-memory hot tier (read-merge + tombstone fast path; on-by-default rollout in 5.14.C5/C6)
  basin-placement   (Phase 3 v0.2) (project, partition) → owner mapping
services/
  basin-server      single-process binary
benchmark/          dashboard + auto-regenerated RESULTS_*.md
docs/
  architecture.md   the four-layer stack, in detail
  multi-project.md  the multi-project SaaS story (per-project isolation, scheduler, cost math)
  decisions/        ADRs — every "no" with the trigger that would change our mind
  sql-compatibility.md  hand-written compatibility narrative (planner / catalog scope)
  sql-support.md    auto-generated per-syntax matrix (sql_support_matrix.rs)
tests/integration/  cross-crate viability + scaling + Postgres comparisons

Build and test

# Workspace build:
cargo build --workspace
cargo test  --workspace

# Run the benchmark suite + regenerate dashboard / RESULTS_localfs.md:
cargo test -p basin-integration-tests --tests -- --nocapture
python3 benchmark/bundle.py

# Then open the dashboard (no server required):
open benchmark/index_localfs.html

License

Apache-2.0 — see LICENSE.

Contributions welcome. The project is opinionated about scope (docs/decisions/) — open an issue before writing a PR that adds new surface area. The OSS code is the database; commercial cloud orchestration lives in a separate private repo and never affects what OSS users get.

Keywords for search

Basin is a Postgres-compatible, bucket-native, multi-tenant database on object storage, with Vortex (default columnar, LFAI) and Apache Parquet (opt-in, interchange) storage, an Apache Iceberg catalog, a file-backed WAL with a Raft WAL simulation toward distributed v0.2, native vector search (HNSW), per-file catalog bloom filters for point-query pruning, HTAP hot tier on the roadmap (ADR 0016), and pgwire protocol support that works with psql, tokio-postgres, asyncpg, JDBC, Diesel, SeaORM, and any other Postgres driver. Basin compares to Postgres, Neon, Supabase, Nile, Turso, PlanetScale, Aurora, ClickHouse, SingleStore, DuckDB, and CockroachDB for cheap-storage SaaS, audit-log, RAG / vector, HTAP, and multi-project use cases. Self-hostable, Apache-2.0 licensed, written in Rust.

Name		Name	Last commit message	Last commit date
Latest commit History 827 Commits
.cargo		.cargo
.github		.github
benchmark		benchmark
crates		crates
dev		dev
docs		docs
examples		examples
scripts		scripts
services		services
tests/integration		tests/integration
.basin-seaweedfs-s3.json		.basin-seaweedfs-s3.json
.basin-test.example.toml		.basin-test.example.toml
.basin-test.https.toml		.basin-test.https.toml
.basin-test.seaweedfs.toml		.basin-test.seaweedfs.toml
.basin-test.tigris-realistic.toml		.basin-test.tigris-realistic.toml
.basin-three-way.example.toml		.basin-three-way.example.toml
.dockerignore		.dockerignore
.gitignore		.gitignore
CAPABILITIES.md		CAPABILITIES.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
NEXT_SESSION.md		NEXT_SESSION.md
NEXT_WAVES.md		NEXT_WAVES.md
README.md		README.md
RELEASING.md		RELEASING.md
SECURITY.md		SECURITY.md
TASK.md		TASK.md
WEDGE.md		WEDGE.md
basin.png		basin.png
basin.svg		basin.svg
decisions.md		decisions.md
rust-toolchain.toml		rust-toolchain.toml
todo		todo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Basin

Why Basin

The numbers (honest)

Think a benchmark is unfair? Tell us.

Where Basin IS the answer

Where Basin is NOT the answer (yet)

Batteries included

Postgres compatibility

Quickstart

Architecture

What you can do today

Status

How Basin compares

vs Postgres / Aurora / RDS

vs Neon

vs Supabase

vs Nile

vs Turso / libSQL

vs ClickHouse / DuckDB / data-warehouse

Where Basin is not the answer

References

Ecosystem

Project layout

Build and test

License

Keywords for search

About

Uh oh!

Releases 1

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Basin

Why Basin

The numbers (honest)

Think a benchmark is unfair? Tell us.

Where Basin IS the answer

Where Basin is NOT the answer (yet)

Batteries included

Postgres compatibility

Quickstart

Architecture

What you can do today

Status

How Basin compares

vs Postgres / Aurora / RDS

vs Neon

vs Supabase

vs Nile

vs Turso / libSQL

vs ClickHouse / DuckDB / data-warehouse

Where Basin is not the answer

References

Ecosystem

Project layout

Build and test

License

Keywords for search

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Contributors

Uh oh!

Languages