Basin

Bucket-native, multi-tenant Postgres alternative.
About 26× less RAM per held-open connection than Postgres (~47× at the long-tail floor). Projects are S3 prefixes, not databases, so operator cost is O(bytes active), not O(projects provisioned). One binary: pgwire on the front, Vortex-compressed columnar files on any S3-compatible bucket on the back.


Pre-alpha, public eval. Basin is being built in the open. Use it today to evaluate cost economics, prototype multi-tenant patterns, or contribute. The hot-tier UPDATE/DELETE fast paths are on by default as of the Phase 5.14 closure (bed431c). The C2 acceptance gate (single-row UPDATE p50 < 5 ms) is met at 10k (0.3 ms) and 100k (2.1 ms); at 1M the same shape lands at ~9 ms, roughly 7–12× slower than PG depending on the run instead of the pre-fix 1550×. See the per-shape table in RESULTS_localfs.md, and the kill-switch notes in ADR 0016 if you need to roll the fast paths back without a redeploy (BASIN_HOTTIER_FASTPATH_DISABLE=1).

See docs/V0_1_SCOPE.md for the v0.1 cut-off — required items, what's shipped, and what's parked.


Why Basin

Two things are structurally true regardless of workload, and they're the load-bearing reason Basin exists:

  1. Pure-Rust async server, ~298 KiB RAM per held-open connection vs Postgres's ~7.7 MiB. That's ~26× per the LocalFS lifecycle card and ~47× when you count the long-tail noise floor. Under a 1,000-connection flood, Basin holds 1,000 / refuses 0; Postgres holds 99 / refuses 901. This isn't a tuning result — it's the difference between a from-scratch tokio server and a forking daemon. (server_lifecycle card)
  2. A new project is an S3 bucket prefix, not a provisioned database. Idle projects cost only their bytes — measured at ~2 KiB of RAM and $0.10/month/project at typical SaaS-tail sizes. Spin up one project for a side app, or ten thousand for a SaaS — same architecture, same binary. (idle-project cost curve)

Everything else — Vortex columnar storage, native vector search, basin-auth + basin-rest, time travel via Iceberg snapshots, pg_cron/pg_net/pg_trgm/PostGIS subsets as native crates, real pgwire v3 with TLS + COPY + extended-query + native JSONB/UUID/BYTEA binding — exists to make those two structural wins addressable from real applications. Decision log lives in docs/decisions/.


The numbers (honest)

Basin publishes all of its head-to-head numbers, wins and losses, regenerated from integration tests on every push. On some shapes Basin beats Postgres; on many it does not. The picture is workload-dependent: the table below is the 1M-row SaaS+OLAP suite on LocalFS with no index on either side (the most apples-to-apples comparison). For 10k / 100k / real-S3 numbers, see the benchmark results.

Numbers below are the live 1M-row Vortex card, regenerated by benchmark/run/run_all.sh (Postgres-compare cards run with the HTAP fastpaths default-on + BASIN_FAST_BULK_INSERT=1, as disclosed on the dashboard). Full per-row table + the 10k/100k cards in RESULTS_localfs.md.

Workload (1M, no index either side) Basin (Vortex) Postgres 18 Verdict
Structural — wins
RAM per held-open connection 298 KiB 7,723 KiB ~26× less (47× at the long-tail floor)
Connections held under 1,000-conn flood 1,000 held / 0 refused 100 held / 900 refused structural
On-disk bytes (users + events, 1M rows) 152 MB 306 MB 2.0× smaller (Vortex columnar)
Cold-start first query 3.9 ms 13.2 ms 3.4× faster
Analytics / read — wins
LATERAL JOIN (correlated derived table) 15 ms 3,307 ms 226× faster
Correlated subquery in SELECT p50 50 ms 7,289 ms 147× faster
2-table JOIN GROUP BY p50 37 ms 1,352 ms 36× faster
Analytics JOIN+WHERE p50 38 ms 851 ms 22× faster
COUNT(*) full table p50 85 ms 1,190 ms 14× faster
Composite range (created_at AND amount) 4.3 ms 45 ms 10.5× faster
ILIKE '%@gmail.com' p50 6.2 ms 44 ms 7.2× faster
COUNT(DISTINCT user_id) p50 48 ms 234 ms 4.9× faster
Aggregate GROUP BY user_id p50 53 ms 224 ms 4.2× faster
Range scan p50 (~1k rows) 17 ms 43 ms 2.5× faster
Pagination ORDER BY LIMIT/OFFSET p50 73 ms 117 ms 1.6× faster
DELETE WHERE id IN (10 rows) 0.40 ms 4.3 ms 10.8× faster (hot-tier tombstone fast path)
Selective single-row / OLTP — losses (PG's PK btree wins)
Point query p50 (unindexed PK) 0.098 ms 0.003 ms ~33× slower — both sub-ms; PG hits its PK btree
JSONB ->> get-text p50 (steady, promoted) 1.9 ms 0.068 ms ~28× slower — ~2 ms after the filter-pushdown fix (was ~45 ms); PG's btree still wins selective reads
Large IN-list (~100 values) 1.1 ms 0.029 ms ~37× slower — both low-ms
Recursive CTE Fibonacci(30) 1.4 ms 0.06 ms ~22× slower (DataFusion upstream)
JSONB @> contains (no GIN prune) 1,086 ms 62 ms ~18× slower; CREATE INDEX … USING gin is accepted but probe→prune wiring is in flight (#105)
Write — losses
Single-row UPDATE p50 8.6 ms 0.71 ms ~12× slower (hot-tier overlay; was ~1550× before the fastpath)
Bulk UPDATE (~1/3 rows) 6.9 s 3.7 s 1.9× slower
Bulk INSERT 1,000,000 rows 50 s 9 s ~5.6× slower — WAL→Vortex flush pipeline, not write-amp
Concurrent SELECT (16 sessions, mixed) 174 ms 3.2 ms ~54× slower

At 1M rows: 38 wins / 40 losses. Basin's analytical wins compound (LATERAL JOIN 226×, correlated subquery 147×, 2-table JOIN 36×, analytics JOIN 22×, COUNT(*) 14×), and it now wins the aggregate (4.2×) and pagination (1.6×) shapes that earlier benchmark generations lost. The losses cluster on PG's home turf: selective single-row lookups (point query, JSONB ->>, IN-list), where PG's PK btree answers in well under 0.1 ms and Basin takes roughly 0.1–2 ms, plus bulk and concurrent writes. With the hot-tier fast paths shipped, single-row UPDATE is ~12× slower (down from ~1550×) and DELETE-WHERE-IN now wins; the write gaps are constant-ratio, not scaling-divergent. At 10k, PG's heap fits in shared_buffers and PG wins most OLTP latency; at 100k, Basin matches or beats PG on reads and pays the per-INSERT tax. On real S3, round-trip cost dominates the unindexed cold path; cached/warm paths stay competitive. Full per-scale cards are in RESULTS_localfs.md.

Think a benchmark is unfair? Tell us.

Every card is generated by an integration test in tests/integration/tests/ — read the source and tell us where we're wrong. File a Benchmark Methodology issue and we'll either fix the methodology, soften the headline claim, or explain why we think the test is fair. Decisions are logged in docs/decisions/.


Where Basin IS the answer

  • Append-heavy multi-tenant SaaS — many isolated projects, mostly-reads, occasional point UPDATE. The RAM-per-conn + per-project-prefix economics dominate; the OLTP point-write tax stays bounded as long as writes are append-shaped.
  • Audit logs, event streams, IoT, activity feeds — write-once-read-many, Vortex columnar compression shrinks bytes-at-rest 29× vs PG heap on the 1M SaaS shape and ~43× at 100k. Object-storage $/GB compounds.
  • AI / RAG with mixed tabular + vector data: native vector(N) + HNSW alongside transactional rows in the same database, no pgvector install. Useful when "store the document + the embedding + the audit row in one place" is the requirement.
  • Cheap-idle multi-environment apps — dev / staging / per-region / per-customer all live as cheap project prefixes on one cluster. Per-project cost is O(bytes), so 10k mostly-idle projects stays cheap. See docs/multi-project.md.

Where Basin is NOT the answer (yet)

  • Drop-in OLTP replacement at 1M+ scale with sub-millisecond point-mutation latency requirements. The Phase 5.14 closure (bed431c) turned the hot-tier UPDATE/DELETE fast paths on by default, cutting the single-row UPDATE gap at 1M from ~1550× to roughly 7–12× depending on the run. PG's UPDATE … WHERE pk = ? still completes in about 0.7–2 ms versus Basin's ~9 ms at 1M, because PG's index+heap is structurally faster on that one shape. For most multi-tenant SaaS workloads, where point mutations are <1% of traffic, this gap doesn't matter; for hot-write OLTP front ends it might. The kill-switch BASIN_HOTTIER_FASTPATH_DISABLE=1 rolls the fast paths back without a redeploy if needed.
  • Index-heavy workloads — no btree on non-PK columns yet. WHERE non_pk_col = X does a full-file scan. Bloom filters on basin.sort_by and per-file column_stats prune at the file level, but a btree-class secondary index (Phase 5.7.B1, ~8 weeks) is the biggest single perf win still open.
  • Pure analytical workhorse — Snowflake / DuckDB / ClickHouse will out-run Basin on heavy GROUP BY / window / recursive-CTE shapes. Basin trades some bench wins for PG-compat, and inherits DataFusion's upstream limits on a handful of shapes (recursive CTE, exact COUNT(DISTINCT)).
  • High-frequency single-DB OLTP → Postgres / Aurora / Neon. Edge / local-first → Turso / libSQL. Geospatial primary store → PostGIS. Embedded SQLite-class → SQLite. Globally strongly-consistent multi-region writes → Spanner / Cockroach.
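The file-level pruning mentioned above works by ruling files out before opening them. A minimal sketch, assuming hypothetical per-file metadata (a min/max pair for the filtered column plus a small Bloom filter); Basin's actual column_stats and bloom formats are not described here. A file that survives both checks still has to be scanned in full, which is the gap a secondary btree would close.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Tiny Bloom filter: `k` hash probes into a bit array.
struct Bloom {
    bits: Vec<bool>,
    k: u64,
}

impl Bloom {
    fn new(m: usize, k: u64) -> Self {
        Bloom { bits: vec![false; m], k }
    }

    fn slot(&self, v: i64, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        (v, seed).hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, v: i64) {
        for seed in 0..self.k {
            let i = self.slot(v, seed);
            self.bits[i] = true;
        }
    }

    /// false = definitely absent; true = maybe present.
    fn may_contain(&self, v: i64) -> bool {
        (0..self.k).all(|seed| self.bits[self.slot(v, seed)])
    }
}

/// Hypothetical per-file stats for one column.
struct FileStats {
    path: String,
    min: i64,
    max: i64,
    bloom: Bloom,
}

fn make_file(path: &str, values: &[i64]) -> FileStats {
    let mut bloom = Bloom::new(1024, 3);
    for &v in values {
        bloom.insert(v);
    }
    FileStats {
        path: path.to_string(),
        min: *values.iter().min().unwrap(),
        max: *values.iter().max().unwrap(),
        bloom,
    }
}

/// Files that must still be opened for `WHERE col = v`.
fn files_to_scan(files: &[FileStats], v: i64) -> Vec<&str> {
    files
        .iter()
        .filter(|f| f.min <= v && v <= f.max) // min/max pruning
        .filter(|f| f.bloom.may_contain(v)) // Bloom pruning
        .map(|f| f.path.as_str())
        .collect()
}
```

A real Bloom filter sizes its bit array and probe count from the expected row count and target false-positive rate; 1,024 bits and 3 probes are arbitrary here.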

Batteries included

Basin is one binary that ships the pieces you'd otherwise wire up across five vendors. The wedge isn't "a cheaper Postgres" — it's "the same stack, but every project is a bucket prefix and the marginal cost is near zero."

Piece Status Honest caveat
Auth (signup, signin, magic-link, JWT, refresh, OAuth) ✅ Shipped Per-project schema; auth.uid() / auth.role() work in RLS policies.
REST API (PostgREST-compatible) ✅ Shipped GET/POST/PATCH/DELETE on /rest/v1/<table>; RPC mount for SQL + Wasm functions.
Realtime (SSE + WebSocket + presence) ✅ Shipped, harness-gated Implementation complete; some integration harness slices #[ignore]-gated pending un-gate.
Blob storage (basin-blob) 🛠 v0.1 partial Engine seam shipped; full SDK + signed-URL surface ongoing. See CAPABILITIES.md.
Vector search (native vector(N), HNSW, <->/<#>/<=>) ✅ Shipped Planner auto-routes ORDER BY x <-> $1 LIMIT k. No pgvector install needed.
WASM UDFs (CREATE FUNCTION … LANGUAGE wasm) 🛠 v0.1 scalar i32/i64/f64 only today; string / JSONB support deferred to next phase. Suitable for math / validation UDFs, not bulk transforms.
Cron, HTTP-from-SQL, trgm, geo, continuous matviews ✅ via Basin-flavored crates See "Postgres-extension equivalents". No CREATE EXTENSION required.

What we don't ship — and on purpose:

  • Edge functions (Cloudflare Workers / Supabase Edge Functions shape) — see ADR 0019: in-engine WASM UDFs plus declarative inbound webhooks + RPC mount solve the "compute close to data" need for Basin's wedge. Geographically distributed V8 isolates are a different concept and a maintenance burden the wedge doesn't justify.
  • Triggers / PL/pgSQL — replaced by declarative lifecycle columns + SQL-bodied reactors + LANGUAGE sql functions per ADR 0012.
  • Postgres extensions (.so) — see ADR 0002.

Long-form companion with code samples per piece: docs/batteries.md. Fine-grained matrix: CAPABILITIES.md.

This is the rest-of-the-stack-is-here part of the pitch. The structural primitive (project = bucket prefix, 298 KiB/conn) is the wedge; the batteries are what close the sale.


Postgres compatibility

Surface Status
sqllogictest (curated PG-style suite) 100% (50/50) as of b7114e8
ORM corpus (Drizzle / Prisma / sqlx / Diesel / TypeORM, 99 representative shapes) 95% (94/99) — Drizzle 100%, Prisma 90%, sqlx 95%, Diesel 95%, TypeORM 94%
Per-fragment SQL matrix (docs/sql-support.md, 697 fragments tested) ~91% Default config / ~94% non-excluded (629/667)
Wasm UDFs i32 / i64 / f64 shipped; text / bytea / JSONB in flight
Wire protocol pgwire v3, simple + extended query, TLS (rustls), COPY FROM STDIN / TO STDOUT, prepared statements with binary parameters (native JSONB / UUID / BYTEA / NUMERIC varlena / ARRAY binary wire formats)
Differential PG-oracle harness tests/integration/tests/differential_pg.rs — every release runs identical SQL against Basin and a real PostgreSQL; build fails on any cell-level divergence

Per-statement breakdown with every red row linked to its planner / parser / executor owner: docs/sql-support.md. Public capability matrix: CAPABILITIES.md.

Intentionally out of scope (per ADR 0002): LISTEN/NOTIFY, CREATE TRIGGER, CREATE OPERATOR, composite CREATE TYPE, multirange / OID / REGCLASS / BIT / PG_LSN.


Quickstart

Want to skip the build and start querying right now? See the 5-Minute Docker Quickstart — one docker run command, no Rust toolchain required.

Ready to go deeper? The Getting Started / Tutorial walks you through CRUD, auth, RLS policies, the REST API, a React/Vite frontend snippet, and the first-deployment path — about 15 minutes end-to-end.

Want to see a complete app? Two reference apps are in examples/:

  • examples/saas-starter/ — multi-tenant SaaS app: Drizzle ORM, basin-auth, RLS policies, basin-rest auto-generated REST surface.
  • examples/ai-rag-app/ — AI/RAG app: document chunking + embedding pipeline, basin-vector similarity retrieval, Wasm function calling an inference endpoint.

Install basin, point it at a data dir, run. No external object store is needed for local development.

BASIN_DATA_DIR=/tmp/basin cargo run -p basin-server

That gives you pgwire on 127.0.0.1:5433, durable WAL + Vortex columnar files (the catalog default; Parquet is opt-in per-table) under /tmp/basin/, and a volatile in-memory catalog for fast local iteration. Set BASIN_CATALOG=postgres://... for restart-safe metadata.

The full production-shaped boot layers WAL, shard owner, connection pool, JWT auth, and REST in one process:

BASIN_BIND=127.0.0.1:5433 \
BASIN_CATALOG=postgres://postgres@127.0.0.1:5432/postgres \
BASIN_DATA_DIR=/tmp/basin \
BASIN_WAL_DIR=/tmp/basin/wal \
BASIN_PROJECTS='alice=*,bob=*' \
BASIN_SHARD_ENABLED=1 \
BASIN_POOL_ENABLED=1 \
BASIN_AUTH_ENABLED=1 \
  BASIN_AUTH_JWT_SECRET=$(openssl rand -hex 32) \
  BASIN_AUTH_SMTP_HOST=smtp.example.com BASIN_AUTH_SMTP_PORT=587 \
  BASIN_AUTH_SMTP_USERNAME=u BASIN_AUTH_SMTP_PASSWORD=p \
  BASIN_AUTH_SMTP_FROM=noreply@example.com BASIN_AUTH_SMTP_TLS=starttls \
BASIN_REST_ENABLED=1 BASIN_REST_BIND=127.0.0.1:5434 \
cargo run -p basin-server

BASIN_PROJECTS is the project-list env var — name is historical, projects in the public API. Required vars for production-shaped durability: BASIN_BIND, BASIN_CATALOG=postgres://..., BASIN_DATA_DIR or BASIN_STORAGE_BACKEND, BASIN_WAL_DIR, BASIN_PROJECTS, and BASIN_AUTH_ENABLED (if you want auth). Everything else is optional.
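If you script deployments, a preflight check over those required variables can catch a half-configured boot early. The sketch below encodes the list above; the function is hypothetical and makes no claim about how basin-server itself validates its environment.

```rust
use std::collections::HashMap;

/// Returns the production-shaped requirements that are not met.
/// BASIN_AUTH_ENABLED is only required when `want_auth` is true.
fn missing_required(env: &HashMap<String, String>, want_auth: bool) -> Vec<String> {
    let has = |k: &str| env.get(k).map_or(false, |v| !v.is_empty());
    let mut missing = Vec::new();

    for k in ["BASIN_BIND", "BASIN_WAL_DIR", "BASIN_PROJECTS"] {
        if !has(k) {
            missing.push(k.to_string());
        }
    }
    // Restart-safe metadata needs a Postgres-backed catalog.
    if !env.get("BASIN_CATALOG").map_or(false, |v| v.starts_with("postgres://")) {
        missing.push("BASIN_CATALOG (postgres://)".to_string());
    }
    // Either a local data dir or an object-storage backend.
    if !has("BASIN_DATA_DIR") && !has("BASIN_STORAGE_BACKEND") {
        missing.push("BASIN_DATA_DIR or BASIN_STORAGE_BACKEND".to_string());
    }
    if want_auth && !has("BASIN_AUTH_ENABLED") {
        missing.push("BASIN_AUTH_ENABLED".to_string());
    }
    missing
}
```

In practice you would build the map with std::env::vars().collect().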

To run the same binary against object storage, set BASIN_STORAGE_BACKEND=s3|tigris plus the S3-compatible endpoint, bucket, region, and credentials documented by basin-storage.

Connect with any Postgres driver:

psql -h 127.0.0.1 -p 5433 -U alice

Run real SQL:

-- Standard tables, standard SQL.
CREATE TABLE events (id BIGINT NOT NULL, body TEXT NOT NULL);
INSERT INTO events VALUES (1, 'hello'), (2, 'world');
SELECT * FROM events WHERE id = 2;

-- Native vector search, no pgvector required.
CREATE TABLE docs (id BIGINT, embedding vector(384));
INSERT INTO docs VALUES (1, '[0.01, 0.02, ...]');
SELECT id FROM docs ORDER BY embedding <-> '[...]' LIMIT 10;

Confirm the data hit object storage under the project prefix:

find /tmp/basin/projects -name '*.vortex'   # default format
# /tmp/basin/projects/01HABCD…/tables/events/data/2026/05/01/01HEFG….vortex
# Tables created with WITH (basin.file_format='parquet') write *.parquet instead.

That's a real bucket-native database. The prefix is the IAM boundary; one bucket policy revokes all access to a project's data even if every other layer is bypassed.
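The layout in that find output is what makes the prefix an IAM boundary: every object for a project shares one key prefix. A minimal sketch of building and checking such keys, with the date partitioning inferred from the single example path above (the real layout may differ) and hypothetical IDs standing in for the elided ones.

```rust
/// Builds a key shaped like the example path:
/// projects/{project}/tables/{table}/data/{YYYY}/{MM}/{DD}/{file_id}.{ext}
fn data_file_key(project: &str, table: &str, ymd: (u32, u32, u32), file_id: &str, parquet: bool) -> String {
    let (y, m, d) = ymd;
    let ext = if parquet { "parquet" } else { "vortex" }; // Vortex is the default
    format!("projects/{project}/tables/{table}/data/{y:04}/{m:02}/{d:02}/{file_id}.{ext}")
}

/// The per-project prefix a single bucket policy can deny.
fn project_prefix(project: &str) -> String {
    format!("projects/{project}/")
}

/// True if `key` lives under `project`'s prefix.
fn belongs_to(key: &str, project: &str) -> bool {
    key.starts_with(&project_prefix(project))
}
```

Note the trailing slash in the prefix: without it, project p would also match keys under p1.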


Architecture

Four layers, each with one job:

   pgwire clients  (any Postgres driver — psql, tokio-postgres, asyncpg, JDBC)
          │
          ▼
   Routers (stateless)        parses SQL, applies RLS, routes by project
          │
          ▼
   Shard owners (stateful)    in-memory state for many projects per process,
          │                   eviction on idle, lazy load from WAL + Vortex/Parquet
          ▼
   WAL                        durable append path; flushes to object storage
          │
          ▼
   Object storage + catalog   /projects/{id}/... Vortex (default) or Parquet + Iceberg-style metadata
                              local FS, S3, Tigris (S3-compatible) — same binary, different bucket

The full architecture document is in docs/architecture.md. Every "no" we've recorded is in docs/decisions/.
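The shard-owner layer's "eviction on idle, lazy load" behaviour is what lets many projects share one process. A hypothetical sketch of that bookkeeping: it only tracks last-access times and counts reloads; a real owner would also rebuild state from the WAL and data files on load and flush before evicting.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical shard owner: tracks which projects are resident and when
/// each was last touched. The per-project state itself is omitted.
struct ShardOwner {
    idle_ttl: Duration,
    last_access: HashMap<String, Instant>,
    loads: u32, // lazy loads so far (stand-in for WAL + data-file replay)
}

impl ShardOwner {
    fn new(idle_ttl: Duration) -> Self {
        ShardOwner { idle_ttl, last_access: HashMap::new(), loads: 0 }
    }

    /// Record an access; returns true if the project had to be (re)loaded.
    fn touch(&mut self, project: &str, now: Instant) -> bool {
        let loaded = !self.last_access.contains_key(project);
        if loaded {
            self.loads += 1; // a real owner would rebuild state here
        }
        self.last_access.insert(project.to_string(), now);
        loaded
    }

    /// Evict every project idle longer than the TTL; returns the count.
    fn evict_idle(&mut self, now: Instant) -> usize {
        let before = self.last_access.len();
        let ttl = self.idle_ttl;
        self.last_access.retain(|_, last| now.duration_since(*last) <= ttl);
        before - self.last_access.len()
    }
}
```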

Built on: Apache Arrow · Apache Iceberg (table format) · Vortex (default columnar format, LFAI incubation) · Apache Parquet (opt-in, interchange) · Apache DataFusion (SQL planner) · Tokio · pgwire-rs · openraft (single-process Raft WAL simulation today; cross-process distributed WAL is v0.2). Pure Rust, #![forbid(unsafe_code)] across every crate.

Basin's query engine is built on Apache DataFusion, the open-source SQL query engine from the Apache Software Foundation. Basin does not fork DataFusion — every query plan runs through upstream operators with Basin-shaped rules layered on top (RLS injection, project isolation, partition pruning).


What you can do today

  • Postgres-compatible SQL — pgwire v3, simple + extended query protocol, TLS (rustls), COPY FROM STDIN/COPY TO STDOUT (CSV). Works with psql, tokio-postgres, asyncpg, JDBC, Diesel, SeaORM, any Postgres ORM. A curated ORM/driver-compat suite plus a PG-oracle differential harness (differential_pg.rs) gate every release against a real PostgreSQL.
  • CRUD + DDL: CREATE TABLE (incl. CREATE TABLE AS … WITH NO DATA), multi-row INSERT, SELECT, UPDATE, DELETE (Iceberg copy-on-write, plus the hot-tier tombstone fast path that landed env-gated in 87ef24b and is on by default since bed431c), ON CONFLICT DO NOTHING / DO UPDATE upsert, ALTER TABLE … CLUSTER BY (…) / SET BLOOM FILTERS ON / SET row_group_rows / SET cold_after / ENABLE ROW LEVEL SECURITY / CREATE POLICY, SHOW TABLES. Prepared statements with parameter bind (text + binary, including native JSONB / UUID and correct NUMERIC / ARRAY binary wire formats).
  • Honest enforcement, not silent no-ops: CREATE UNIQUE INDEX actually enforces uniqueness, VARCHAR(n)/CHAR(n) length is enforced, RLS WITH CHECK is enforced on write, TABLESAMPLE actually samples, advisory locks are real, and unsupported CREATE TRIGGER / MERGE are rejected with a SQLSTATE instead of silently doing nothing. A wave of silent-corruption CRITICAL bugs surfaced by the differential harness has been fixed.
  • Expanded SQL surface — JSONPath (jsonb_path_query, @?, @@, jsonb_path_query_array); JSONB mutators (jsonb_set/insert/strip_nulls/pretty/typeof); json_build_object/json_build_array; INET/CIDR containment; regexp_match/matches/split_to_array/split_to_table, format, encode/decode; datetime age/to_char/to_date/date_bin; window IGNORE NULLS; SAVEPOINT / ROLLBACK TO; data-modifying CTEs; correlated + LATERAL joins (incl. CROSS JOIN LATERAL generate_series); bounded full-text search (tsvector/tsquery/@@); ordered-set aggregates (percentile_disc, mode() WITHIN GROUP); range/multirange arithmetic; real transaction semantics (deferred commits, ROLLBACK undo, SAVEPOINT stack, aborted state).
  • Time travel — Iceberg-style snapshots. Catalog::rollback_to_snapshot(project, table, snapshot_id) rewinds; Catalog::fork_table(project, src, dst) clones a table's metadata + snapshot history into a new sibling that diverges on next commit. Zero data copy until divergence.
  • Native vector search: vector(N) + <-> / <#> / <=> operators, HNSW per file segment. No pgvector.
  • Postgres-extension equivalents: pg_cron (basin-cron), pg_net + http (basin-net), pg_trgm (basin-trgm), PostGIS subset (basin-geo), TimescaleDB continuous aggregates (basin-cv), pgcrypto + uuid-ossp UDFs.
  • Auth + REST in the OSS bundle — basin-auth (signup, JWT, refresh-token rotation, email-link login, per-project API keys) + basin-rest (PostgREST-shape CRUD, cursor pagination + NDJSON streaming, OpenAPI 3.0 schema generation at GET /rest/v1/_openapi.json). auth.uid(), auth.role(), auth.jwt() SQL session functions let you write Supabase-style RLS policies.
  • Per-project connection URLs: POST /admin/v1/projects returns postgres://<user>:<password>@host:5433/<db>. Password bcrypt-validated on every pgwire startup; mismatch → SQLSTATE 28P01. Rotate via POST /admin/v1/projects/{user}/rotate.
  • Durable catalog — Iceberg-style catalog backed by Postgres when BASIN_CATALOG=postgres://...; tables, snapshots, project credentials, and basin-auth's identity tables survive process restart.
  • Cheap retention — Vortex (default, ~1.95× smaller than ZSTD Parquet on audit-log) or Parquet, ~29× smaller than Postgres heap on the 1M SaaS+OLAP shape, ~43× at 100k; per-file catalog column_stats + per-file bloom filters on basin.sort_by columns skip footer fetches and file opens when the predicate prunes the file.
  • Analytical path — a single DataFusion engine with Vortex/Parquet projection + predicate pushdown, catalog-statistics file pruning, per-file blooms, and incremental continuous materialized views. Approximate-cardinality and approximate-quantile UDFs (APPROX_COUNT_DISTINCT, APPROX_PERCENTILE) sit alongside exact counterparts for dashboard workloads. Heavy scans use stateless pooled compute over shared object storage.
  • Multi-schema isolation (phase A): SchemaName / QualifiedTableName types, a schema-aware in-memory and Postgres-backed catalog, a basin_schemas table, and CREATE/DROP SCHEMA + cross-schema queries with differential coverage. Phases B–E (full name resolution / search_path semantics / wider DDL) are still in progress; see Status.
  • Operations — connection pooling, per-project pgwire rate limiting (token-bucket via governor), cost-based query rejection (BASIN_QUERY_COST_LIMIT_ROWS), per-project counters (ops / bytes_read / bytes_written / errors / p99), OpenTelemetry traces wired through router → engine → shard → storage → WAL.
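Snapshot-based time travel and zero-copy forks follow from one property: a snapshot is just a list of immutable files. The toy catalog below uses hypothetical types and simpler signatures than Basin's Catalog::rollback_to_snapshot / Catalog::fork_table (which also take a project); it shows rollback as moving the current-snapshot pointer and fork as cloning metadata while sharing the file lists.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone)]
struct Snapshot {
    id: u64,
    files: Arc<Vec<String>>, // shared on fork, never copied
}

#[derive(Clone, Default)]
struct Table {
    history: Vec<Snapshot>,
    current: Option<usize>, // index into history
}

#[derive(Default)]
struct ToyCatalog {
    tables: HashMap<String, Table>,
}

impl ToyCatalog {
    fn commit(&mut self, table: &str, id: u64, files: Vec<String>) {
        let t = self.tables.entry(table.to_string()).or_default();
        t.history.push(Snapshot { id, files: Arc::new(files) });
        t.current = Some(t.history.len() - 1);
    }

    fn current_files(&self, table: &str) -> Option<Arc<Vec<String>>> {
        let t = self.tables.get(table)?;
        t.current.map(|i| Arc::clone(&t.history[i].files))
    }

    /// Point "current" back at an earlier snapshot; history is kept.
    fn rollback_to_snapshot(&mut self, table: &str, snapshot_id: u64) -> bool {
        match self.tables.get_mut(table) {
            Some(t) => match t.history.iter().position(|s| s.id == snapshot_id) {
                Some(i) => {
                    t.current = Some(i);
                    true
                }
                None => false,
            },
            None => false,
        }
    }

    /// Clone metadata + history into `dst`; data files are shared via Arc.
    fn fork_table(&mut self, src: &str, dst: &str) -> bool {
        match self.tables.get(src).cloned() {
            Some(t) => {
                self.tables.insert(dst.to_string(), t);
                true
            }
            None => false,
        }
    }
}
```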

The full capability matrix (with what's planned and what's deferred): CAPABILITIES.md. The fine-grained per-syntax matrix derived from automated tests: docs/sql-support.md.
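Basin's per-project pgwire rate limiting uses the governor crate. The hand-rolled token bucket below only illustrates the idea (one bucket per project, a burst capacity refilled at a fixed rate) and is not governor's API; the capacity and rate are arbitrary.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal token bucket: `capacity` burst, refilled at `rate_per_sec`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, rate_per_sec: f64, now: Instant) -> Self {
        TokenBucket { capacity, tokens: capacity, rate_per_sec, last: now }
    }

    /// Refill for elapsed time, then spend one token if available.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// One bucket per project, so a noisy tenant cannot drain another's budget.
struct ProjectLimiter {
    capacity: f64,
    rate_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl ProjectLimiter {
    fn allow(&mut self, project: &str, now: Instant) -> bool {
        let (cap, rate) = (self.capacity, self.rate_per_sec);
        self.buckets
            .entry(project.to_string())
            .or_insert_with(|| TokenBucket::new(cap, rate, now))
            .try_acquire(now)
    }
}
```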


Status

Phase Description Status
0 Validate the wedge — customer interviews, design partners open (the gate; engineering is mature enough to need customer signal next)
1 Storage substrate — Vortex (default) / Parquet on object_store, Iceberg-style catalog shipped
2 WAL service — sub-5 ms write acks v0.1 shipped (single-node; Raft is v0.2)
3 Shard owners — per-project state, eviction, compactor v0.1 shipped (in-process; placement service is v0.2)
4 Routers + SQL — pgwire v3, extended query, TLS, COPY, native JSONB / UUID binding shipped — real single-shard transaction semantics (deferred commits, ROLLBACK undo, SAVEPOINT stack, aborted state) landed; cross-shard 2PC remains v0.2 (ADR 0011)
4.5 PostgreSQL SQL-compatibility push — silent-corruption CRITICAL fixes, JSONPath / JSONB-mutating / INET-CIDR / regexp / datetime function families, correct NUMERIC + ARRAY binary wire formats, PG-oracle differential harness (differential_pg.rs) shipped — Default config at ~91% / ~94% non-excluded (629/667); long-tail exotic-DDL parser gaps remain v0.2
5 Analytical path — single DataFusion engine, Vortex/Parquet pushdown + per-file bloom + catalog pruning, continuous pre-aggregation, APPROX_COUNT_DISTINCT/APPROX_PERCENTILE UDFs v0.1 shipped
5.0a Vortex storage format — ~1.95× smaller than ZSTD Parquet; aggregate_full ~15–40× via catalog-stats metadata path; per-file blooms flip point_eq from a loss to a win at every scale shipped as the DEFAULT (ADR 0015), zero-regression vs Parquet baseline. Parquet first-class per-table via WITH (basin.file_format='parquet'). HTAP hot-tier (ADR 0016) is Phase 5.14.C — closes the residual OLTP point-read and the UPDATE/DELETE write floor.
5.14 Durable Basin moat — per-file catalog blooms (shipped), APPROX_COUNT_DISTINCT + APPROX_PERCENTILE UDFs (shipped), catalog-aware WindowExec sort-elision (shipped), HTAP hot tier on by default (bed431c) — DELETE + UPDATE fast paths default-ON, kill-switch BASIN_HOTTIER_FASTPATH_DISABLE=1, merge-on-read via TombstoneFilterExec + UpdateOverlayExec wired in both DataFusion (HtapUnionTable::scan) and fast_select paths, gate-matrix locked by 16 tests, C6 differential harness extended with Mode D fastpath-on, TxCommit WAL marker (ADR 0020 §6) emitted explicitly with backward-compat replay. The 3-month investment that is not subsumed by upstream Vortex / DataFusion improvements. shipped
5.15 Unified docs platform — OSS-repo markdown with YAML frontmatter (spec), basin-cloud webapp consumes via npm run dev:docs build-time fetch OSS side shipped (5.15.A/B/C, frontmatter spec + 24-doc migration + top-level index + CI gate); basin-cloud webapp side (5.15.E–I) deferred to that repo
5.5 Sharding axes — partitioning, compute sharding, tiered storage shipped
5.6 RLS with CREATE POLICY (UNION / CTE coverage) shipped
5.7 Caches + bloom + A4 catalog stats + B2 cluster-by + B3 row-group sizing shipped; B1 secondary indexes is the biggest open perf win (~8 weeks)
5.8 pg_cron + pg_net SQL surfaces shipped
5.9 Postgres-extension equivalents (basin-geo / -trgm / -cv, JSONB, UUID, pgcrypto) shipped
5.10 Identity + REST (basin-auth, basin-rest, OpenAPI, pagination, streaming, API keys, refresh rotation, per-project connection URLs, auth.uid() / auth.role() / auth.jwt() session functions) shipped
5.11 Multi-schema isolation phase A shipped: SchemaName/QualifiedTableName types, schema-aware in-memory + Postgres catalog, basin_schemas table, CREATE/DROP SCHEMA + cross-schema queries with differential coverage. Phases B–E in progress: full qualified-name resolution, search_path semantics, wider schema-scoped DDL
6 Production hardening partial — telemetry / pooling / rate-limit / cost-rejection / catalog-PITR / fork shipped; multi-region (ADR 0009), catalog replication (ADR 0010), cross-shard 2PC (ADR 0011) all locked architecturally and gated on customer demand
6.x SQL long-tail (still pending) planned: COPY FROM STDIN ergonomics, server-side PREPARE/EXECUTE over text protocol edge cases, LISTEN/NOTIFY, plpgsql DO blocks, full MERGE, exotic types (multirange / BIT / OID / REGCLASS), and the parser-refused exotic DDL forms
7 Launch gated on Phase 0

Six-month wedge slice: WEDGE.md. Full plan: TASK.md. Decision log: docs/decisions/.
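Merge-on-read is the core of the Phase 5.14 hot-tier fast paths: in outline, DELETE records a tombstone and UPDATE records an overlay row, and readers apply both over the columnar base files instead of rewriting them. Basin wires this through DataFusion operators (TombstoneFilterExec, UpdateOverlayExec); the plain-Rust function below is only a sketch of the semantics, assuming a single integer key and ignoring snapshots and WAL ordering.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Debug, PartialEq)]
struct Row {
    id: i64,
    body: String,
}

/// Drop tombstoned ids, then substitute the overlaid version where one exists.
fn merge_on_read(base: &[Row], tombstones: &HashSet<i64>, overlay: &HashMap<i64, Row>) -> Vec<Row> {
    base.iter()
        .filter(|r| !tombstones.contains(&r.id)) // DELETE fast path
        .map(|r| overlay.get(&r.id).cloned().unwrap_or_else(|| r.clone())) // UPDATE fast path
        .collect()
}
```

Applying tombstones before the overlay means a row that was updated and then deleted stays deleted.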


How Basin compares

vs Postgres / Aurora / RDS

Postgres is the right answer for single-project, high-frequency OLTP and for workloads that need sub-millisecond point-mutation latency at 1M+ rows. Basin's hot-tier UPDATE/DELETE fast paths are on by default as of the Phase 5.14 closure: point UPDATE at 1M lands at ~9 ms versus PG's ~0.7–1.3 ms across the published runs, roughly 7–12× rather than the pre-fix 1550×, but PG's index+heap is still structurally faster on that one shape. Basin is not trying to be Postgres on those shapes. Where Basin wins: many isolated projects (per-environment / per-customer / per-region), append-shaped workloads where the columnar bytes-at-rest savings compound, the analytical shapes where Basin's lead grows with scale (at 1M: 226× faster LATERAL JOIN, 147× correlated subquery, 36× 2-table JOIN GROUP BY), and the structural RAM-per-connection economics for connection-heavy front ends.

vs Neon

Neon is serverless Postgres with branching — terrific for single-DB workloads that want copy-on-write forks. Basin matches the branching story (Iceberg forks are zero-copy too) but stores data on plain S3 rather than a managed page server. Neon's per-project minimum scales with project count (O(provisioned pool)); Basin's per-project cost is O(bytes), so many isolated projects stay cheap.

vs Supabase

Supabase is "BaaS in a box": Postgres + Auth + Edge Functions + Storage + Realtime. Basin covers the SQL + Auth + REST surface in one binary, with auth.uid() / auth.role() / auth.jwt() working identically. Where Basin differs is the data-layer economics: Vortex/Parquet on S3 instead of Postgres heap on block storage. Multi-project SaaS that has outgrown Supabase's per-project pricing can migrate the database to Basin via pgwire and keep Supabase Auth / Edge Functions / Realtime for the parts of the stack they handle well. Edge Functions are out of scope (ADR 0019; see also ADRs 0005/0006); Realtime ships in Basin and blob storage is partial in v0.1, as listed under Batteries included.

vs Nile

Nile is "Postgres for multi-tenant SaaS" — same problem space as Basin, but built directly on real PostgreSQL with per-tenant virtual databases. That choice gives Nile real PG semantics, real OLTP, real JSONB, real extensions, and real PL/pgSQL for free, which is exactly where Basin still trails today. Basin's structural answer is the substrate economics: Nile's per-tenant cost is bounded by the underlying PG instance's per-tenant cost (heap pages, connection slots, autovacuum overhead); Basin's per-tenant cost is O(bytes-on-S3) with shared compute, so cold or low-traffic tenants stay near-zero. The right card to read is cost per tenant per month at p99 latency, at scale — single-instance PG-based multi-tenancy gets expensive at 10k+ tenants in a way object-storage-native multi-tenancy doesn't. If your workload is point-mutation-heavy and JSONB-heavy and you have <1k tenants, Nile is probably the easier answer today. If you have many idle or low-traffic tenants, append-shaped data, or want columnar bytes-at-rest economics, Basin is the cheaper substrate.

vs Turso / libSQL

Turso is the right answer for edge-distributed apps with many tiny SQLite-class databases. Basin is for centralized apps that want Postgres SQL on cheap object storage with a real wire-protocol surface that ORMs already speak.

vs ClickHouse / DuckDB / data-warehouse

ClickHouse and DuckDB are analytical engines — phenomenal at OLAP scans, not designed for transactional point reads or per-row inserts. Basin will lose to a dedicated warehouse on heavy GROUP BY / window / recursive-CTE shapes (see the perf table above). Basin's pitch is the unified path: one engine, one binary, pgwire on the wire, columnar substrate underneath — useful when "Snowflake plus a Postgres" is overkill for the workload size. The HTAP hot-tier (ADR 0016) closes the OLTP point-read and point-write floor on TB-scale tables — the missing piece that lets one engine span OLAP and OLTP without a second system.

Where Basin is not the answer

Per the ADRs:

  • Single-project high-frequency OLTP → Postgres / Aurora / Neon / Nile
  • Edge / local-first → Turso / libSQL / Cloudflare D1
  • Geospatial primary store → PostGIS
  • Embedding-only workload → dedicated vector DB (Qdrant, Pinecone) — but Basin's native vector(N) works fine alongside tabular data
  • Embedded SQLite-class library → SQLite
  • Globally strongly-consistent writes across regions → Spanner / CockroachDB
  • Heavy OLAP workhorse → Snowflake / ClickHouse / DuckDB


Ecosystem

Basin (this repo) is the data plane. Three sibling repos sit around it:

  • bas-in/basin-cloud — control plane and dashboard (Go + Vite/JSX SPA, Apache-2.0). Manages orgs, projects, billing; runs Basin engines on Fly Machines per project. Operators who want a managed UI use it. Operators running a single self-hosted engine do not — basin-server alone is sufficient.
  • bas-in/basin-cli — operator daily-driver (Go, Apache-2.0, stdlib-only). basin login, basin projects list, basin sql run, release artefacts are Sigstore-signed. Talks to basin-cloud's /v1/* API.
  • bas-in/basin-js — TypeScript SDK (MIT). Supabase-shaped createClient(url, anonKey) that talks directly to a Basin engine (pgwire + REST), not through basin-cloud. Browser, Node, Deno, Bun, Cloudflare Workers. jsr:@bas-in/basin-js and npm:@bas-in/basin-js.
  • Planned client SDKs — basin-py, basin-rs, basin-go, basin-dart, basin-swift, basin-kotlin. All will follow the same engine-direct shape as basin-js.

Licensing rationale. Server-side projects (basin engine, basin-cloud, basin-cli) are Apache-2.0 to carry the patent grant operators expect from infrastructure. Client SDKs (basin-js and future siblings) are MIT to match the norm of the SDK ecosystems they sit in.


Project layout

crates/
  basin-common      shared types, errors, telemetry
  basin-storage     Vortex (default) / Parquet + object_store under project prefixes
  basin-catalog     Iceberg-style catalog (in-memory + Postgres-backed durable)
  basin-wal         file-backed WAL (Raft-backed in v0.2)
  basin-shard       in-process shard owner with WAL → Vortex/Parquet compactor
  basin-engine      single DataFusion engine — point reads + analytical pool, per-project sessions
  basin-router      pgwire v3 (simple + extended query)
  basin-vector      native HNSW vector search
  basin-hottier     in-memory hot tier (read-merge + tombstone fast path; on-by-default rollout in 5.14.C5/C6)
  basin-placement   (Phase 3 v0.2) (project, partition) → owner mapping
services/
  basin-server      single-process binary
benchmark/          dashboard + auto-regenerated RESULTS_*.md
docs/
  architecture.md   the four-layer stack, in detail
  multi-project.md  the multi-project SaaS story (per-project isolation, scheduler, cost math)
  decisions/        ADRs — every "no" with the trigger that would change our mind
  sql-compatibility.md  hand-written compatibility narrative (planner / catalog scope)
  sql-support.md    auto-generated per-syntax matrix (sql_support_matrix.rs)
tests/integration/  cross-crate viability + scaling + Postgres comparisons

Build and test

# Workspace build:
cargo build --workspace
cargo test  --workspace

# Run the benchmark suite + regenerate dashboard / RESULTS_localfs.md:
cargo test -p basin-integration-tests --tests -- --nocapture
python3 benchmark/bundle.py

# Then open the dashboard (no server required):
open benchmark/index_localfs.html

License

Apache-2.0 — see LICENSE.

Contributions welcome. The project is opinionated about scope (docs/decisions/) — open an issue before writing a PR that adds new surface area. The OSS code is the database; commercial cloud orchestration lives in a separate private repo and never affects what OSS users get.


Keywords for search

Basin is a Postgres-compatible, bucket-native, multi-tenant database on object storage, with Vortex (default columnar, LFAI) and Apache Parquet (opt-in, interchange) storage, an Apache Iceberg catalog, a file-backed WAL with a Raft WAL simulation toward distributed v0.2, native vector search (HNSW), per-file catalog bloom filters for point-query pruning, an HTAP hot tier (ADR 0016, on by default since Phase 5.14), and pgwire protocol support that works with psql, tokio-postgres, asyncpg, JDBC, Diesel, SeaORM, and any other Postgres driver. Basin compares to Postgres, Neon, Supabase, Nile, Turso, PlanetScale, Aurora, ClickHouse, SingleStore, DuckDB, and CockroachDB for cheap-storage SaaS, audit-log, RAG / vector, HTAP, and multi-project use cases. Self-hostable, Apache-2.0 licensed, written in Rust.