# The Stack — Putting It All Together

Twenty chapters later, here's what we know: Postgres is a relational database that can also store documents, search text, queue jobs, serve as a key-value store, index vectors, handle time-series data, and do most of what five specialized databases would otherwise do. It's ACID-compliant, well-understood, battle-tested, and backed by thirty years of engineering. And it doesn't require you to run six separate systems to do it.

This final chapter ties everything together into a reference architecture — the stack that a well-run, Postgres-centric team actually runs.

## The Core Principle

Design your stack around your actual requirements, not theoretical scale. Start simple. Add components only when you have evidence that the simpler approach is insufficient. Every component you add is a component you operate, monitor, debug, and eventually upgrade forever.

The reference architecture in this chapter is for a typical production web application or API — something serving thousands to millions of users. The vast majority of products fall into this category. Scale up from here when evidence demands it.
## The Reference Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│                 (your web app / API servers)                │
└──────────────────────────┬──────────────────────────────────┘
                           │ connection pool
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                          PgBouncer                          │
│                  (transaction mode pooler)                  │
└──────────────────────────┬──────────────────────────────────┘
                           │ 20–50 Postgres connections
             ┌─────────────┴──────────────┐
             ▼                            ▼
┌───────────────────┐          ┌───────────────────┐
│  Postgres Primary │◄─────────│  Postgres Standby │
│   (read/write)    │  stream  │    (read-only)    │
└─────────┬─────────┘  repl.   └───────────────────┘
          │
          │ WAL archiving
          ▼
┌───────────────────┐
│     S3 / GCS      │
│  (WAL archive +   │
│   base backups)   │
└───────────────────┘
```

That's the core. One primary, one standby, PgBouncer in front, WAL archived to object storage.

## What Postgres Handles

In this architecture, Postgres handles everything by default:

**Core relational data:** Users, orders, products, accounts — your domain model, normalized, with proper foreign keys and constraints.

**Document storage:** Product attributes, configuration, event payloads — JSONB columns alongside relational columns. GIN indexes for queries.
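
A minimal sketch of the pattern (table and column names are illustrative):

```sql
CREATE TABLE products (
    id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name  TEXT NOT NULL,
    attrs JSONB NOT NULL DEFAULT '{}'
);

-- jsonb_path_ops indexes are smaller and faster for @> containment queries
CREATE INDEX idx_products_attrs ON products USING gin (attrs jsonb_path_ops);

-- Find rows whose attributes contain a given key/value pair:
SELECT id, name FROM products WHERE attrs @> '{"color": "red"}';
```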

**Full-text search:** Article content, product descriptions, support tickets — tsvector columns with GIN indexes. `websearch_to_tsquery` for user-facing search.
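
A sketch of that setup, assuming an `articles` table (names illustrative):

```sql
CREATE TABLE articles (
    id     BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title  TEXT NOT NULL,
    body   TEXT NOT NULL,
    search tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', title), 'A') ||
        setweight(to_tsvector('english', body), 'B')
    ) STORED
);

CREATE INDEX idx_articles_search ON articles USING gin (search);

-- websearch_to_tsquery accepts web-style syntax: quoted phrases, OR, -exclusion
SELECT id, title, ts_rank(search, q) AS rank
FROM articles, websearch_to_tsquery('english', 'replication -logical') q
WHERE search @@ q
ORDER BY rank DESC
LIMIT 20;
```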

**Background jobs:** Order processing, email sending, report generation — a `jobs` table with `SKIP LOCKED`, or River/pg-boss. Transactional enqueue: jobs created atomically with the business data they operate on.
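
The dequeue pattern, sketched against a deliberately minimal `jobs` table (River and pg-boss use richer variants of the same shape):

```sql
CREATE TABLE jobs (
    id      BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    kind    TEXT NOT NULL,
    payload JSONB NOT NULL DEFAULT '{}',
    run_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    done_at TIMESTAMPTZ
);

-- Each worker claims one job. FOR UPDATE SKIP LOCKED means concurrent
-- workers skip rows another worker has locked: no blocking, no double-claims.
WITH next_job AS (
    SELECT id FROM jobs
    WHERE done_at IS NULL AND run_at <= now()
    ORDER BY run_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE jobs SET done_at = now()
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.kind, jobs.payload;
```

Because enqueue is a plain `INSERT`, a job and the business row it operates on can be created in the same transaction — either both commit or neither does.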

**Sessions and ephemeral data:** User sessions, rate limiting counters, temporary tokens — unlogged tables for performance with acceptable durability trade-offs.
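
A sketch (schema illustrative):

```sql
-- UNLOGGED skips WAL: much faster writes, but the table is emptied after a
-- crash and is not replicated — acceptable for re-creatable session data.
CREATE UNLOGGED TABLE sessions (
    token      TEXT PRIMARY KEY,
    user_id    BIGINT NOT NULL,
    data       JSONB NOT NULL DEFAULT '{}',
    expires_at TIMESTAMPTZ NOT NULL
);

CREATE INDEX idx_sessions_expires ON sessions (expires_at);

-- Periodic cleanup, e.g. scheduled via pg_cron:
DELETE FROM sessions WHERE expires_at < now();
```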

**Caching:** Computed summaries, denormalized read models — materialized views or application-managed cache tables. Refreshed on schedule via pg_cron.
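
A sketch, assuming an `orders` table with `order_date` and `total` columns:

```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date, sum(total) AS revenue, count(*) AS order_count
FROM orders
GROUP BY order_date;

-- A unique index is required for REFRESH ... CONCURRENTLY,
-- which rebuilds the view without blocking readers.
CREATE UNIQUE INDEX idx_daily_sales ON daily_sales (order_date);

-- Refresh every 10 minutes via pg_cron:
SELECT cron.schedule('refresh-daily-sales', '*/10 * * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales');
```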

**Feature flags:** Configuration, A/B test assignments — a `feature_flags` table, read at application startup or cached with a short TTL.

**Vector search:** Embeddings for semantic search, RAG pipelines, recommendation systems — pgvector with HNSW indexes. Hybrid search combining vectors with full-text and relational filters.
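
A sketch (the table shape is illustrative, and the dimension depends on your embedding model):

```sql
CREATE TABLE documents (
    id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content   TEXT NOT NULL,
    category  TEXT NOT NULL,
    embedding vector(1536)
);

-- HNSW: approximate nearest-neighbor search with no training step
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops);

-- Nearest neighbors by cosine distance, combined with a relational filter:
SELECT id, content
FROM documents
WHERE category = 'support'
ORDER BY embedding <=> $1   -- $1: the query embedding, passed as a parameter
LIMIT 10;
```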

**Time-series data:** Application metrics, user events, audit logs — partitioned tables or TimescaleDB, with BRIN indexes and continuous aggregates.
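
A sketch using native range partitioning (pg_partman can automate creating the partitions):

```sql
CREATE TABLE events (
    id          BIGINT GENERATED ALWAYS AS IDENTITY,
    occurred_at TIMESTAMPTZ NOT NULL,
    kind        TEXT NOT NULL,
    payload     JSONB NOT NULL DEFAULT '{}',
    PRIMARY KEY (id, occurred_at)   -- the partition key must be in the PK
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- BRIN indexes are tiny and effective when occurred_at correlates
-- with physical insert order, as it does for append-only event data.
CREATE INDEX idx_events_occurred_brin ON events USING brin (occurred_at);
```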

**Audit trails:** Every data change recorded with before/after values — trigger-based audit tables, partitioned by month.
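
A generic sketch of the trigger-based approach:

```sql
CREATE TABLE audit_log (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    table_name TEXT NOT NULL,
    operation  TEXT NOT NULL,
    old_row    JSONB,
    new_row    JSONB,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION audit_changes() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, operation, old_row, new_row)
    VALUES (TG_TABLE_NAME, TG_OP,
            CASE WHEN TG_OP <> 'INSERT' THEN to_jsonb(OLD) END,
            CASE WHEN TG_OP <> 'DELETE' THEN to_jsonb(NEW) END);
    RETURN NULL;   -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

-- Attach to each audited table, e.g. orders:
CREATE TRIGGER orders_audit
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE FUNCTION audit_changes();
```

Partitioning `audit_log` by month keeps retention cheap: dropping an old partition is instant.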

**Scheduled tasks:** Partition maintenance, expired session cleanup, materialized view refresh — pg_cron, all visible and manageable from SQL.
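
For example (job names and statements illustrative):

```sql
-- Everything on a schedule, visible and editable from SQL:
SELECT cron.schedule('purge-sessions', '*/15 * * * *',
    $$DELETE FROM sessions WHERE expires_at < now()$$);

SELECT cron.schedule('nightly-analyze', '0 3 * * *', $$ANALYZE$$);

-- Inspect registered jobs and their run history:
SELECT jobid, jobname, schedule, command FROM cron.job;
SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 20;
```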

## The Configuration Baseline

A Postgres configuration for a 32GB production server:

```ini
# Memory
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 64MB
maintenance_work_mem = 2GB

# Checkpoints
checkpoint_completion_target = 0.9
max_wal_size = 4GB
checkpoint_timeout = 15min

# WAL
wal_level = replica
wal_compression = on
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'

# Connections
max_connections = 200        # PgBouncer handles fan-out

# Autovacuum
autovacuum_max_workers = 6
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 2000

# Logging
log_min_duration_statement = 1000
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '

# Stats
shared_preload_libraries = 'pg_stat_statements, pg_cron'
pg_stat_statements.track = all
```

Tune from this baseline based on your workload profile and actual performance data.

## The Extension Set

The extensions to install on day one:

```sql
-- Observability (MUST HAVE)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Fuzzy search and LIKE acceleration
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Partition management
CREATE EXTENSION IF NOT EXISTS pg_partman;

-- Scheduled tasks (needs pg_cron in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Cryptography
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Vector search (when you need it)
CREATE EXTENSION IF NOT EXISTS vector;

-- Geospatial (when you need it)
CREATE EXTENSION IF NOT EXISTS postgis;

-- TimescaleDB (when time-series volume warrants it;
-- must also be added to shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS timescaledb;
```

These extensions are mature, widely deployed, and add significant capability with minimal operational complexity.

## The Schema Foundation

Every application table should have:

```sql
-- The trigger function (create it once per schema, before any triggers use it):
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- The template:
CREATE TABLE <table_name> (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    -- ... business columns ...
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Auto-update updated_at:
CREATE TRIGGER trigger_<table_name>_updated_at
    BEFORE UPDATE ON <table_name>
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```

For multi-tenant applications, every table additionally has:

```sql
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
```

with RLS policies enforcing tenant isolation and indexes on `tenant_id` to keep those policies fast.
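
With those policies in place, the application scopes each transaction to a tenant (sketch; the `app.tenant_id` setting name is a convention, not built in):

```sql
BEGIN;
-- SET LOCAL reverts at COMMIT/ROLLBACK, so a pooled connection
-- can never leak one tenant's scope into the next transaction.
SET LOCAL app.tenant_id = '42';
SELECT * FROM orders;   -- RLS filters to tenant 42's rows only
COMMIT;
```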

## The Migration Workflow

Every schema change goes through:

1. **Write the SQL migration** with `CONCURRENTLY` for indexes, `NOT VALID` + `VALIDATE` for constraints
2. **Test against production-sized data** on a clone
3. **Deploy in phases** if old and new code can't coexist (expand, then contract)
4. **`SET lock_timeout = '5s'`** before any DDL that takes `ACCESS EXCLUSIVE`
5. **Monitor** during the migration: `pg_stat_activity`, `pg_locks`, the slow query log

Use the expand/contract pattern for column renames and type changes. Never drop a column in the same deployment cycle as the one that stops using it.
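
Steps 1 and 4 in practice (constraint and index names illustrative):

```sql
-- Build the index without blocking writes
-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block):
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

-- Add a constraint without a long exclusive lock: NOT VALID skips the
-- full-table scan at ADD time...
SET lock_timeout = '5s';
ALTER TABLE orders
    ADD CONSTRAINT orders_total_positive CHECK (total >= 0) NOT VALID;

-- ...then validate separately; VALIDATE takes only a
-- SHARE UPDATE EXCLUSIVE lock, so writes continue.
ALTER TABLE orders VALIDATE CONSTRAINT orders_total_positive;
```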

## The Backup Setup

```
Daily:       Full backup via pgBackRest → S3
Hourly:      Differential backup via pgBackRest → S3
Continuous:  WAL archiving via pgBackRest → S3

Retention:
  Full backups: 4 weeks
  Differential: 7 days
  WAL:          7 days (enables PITR within that window)

Testing:
  Monthly: Full restore test to a staging environment
  After config changes: Verify archiving still works
```

The goal: restore any database state from the past 7 days in under 2 hours (RTO), with data loss no greater than the last WAL segment (RPO measured in seconds).
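
A point-in-time restore with pgBackRest looks roughly like this (a sketch — the stanza name and timestamp are placeholders):

```
# With Postgres stopped, restore the data directory to a specific moment:
pgbackrest --stanza=main --delta \
    --type=time --target='2025-06-01 14:30:00+00' \
    --target-action=promote restore

# On startup, Postgres replays WAL up to the target time, then promotes.
```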

## The Observability Stack

```
pg_stat_statements  → query performance dashboard
postgres_exporter   → Prometheus → Grafana
pgBadger            → daily slow query report (from logs)
pg_stat_activity    → real-time connection monitoring
```

Alerts on:
- Cache hit ratio < 95%
- Replication lag > 60 seconds
- Dead tuple percentage > 20%
- `idle in transaction` connections > 5 for > 30 seconds
- Connections > 150 (75% of max_connections = 200)
- `checkpoints_req` > 20% of total checkpoints (WAL pressure)
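
The first few alerts map directly onto catalog queries (sketches; thresholds as above):

```sql
-- Cache hit ratio across all databases (alert below 0.95):
SELECT sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0)
FROM pg_stat_database;

-- Replication lag in seconds, run on the standby:
SELECT extract(epoch FROM now() - pg_last_xact_replay_timestamp());

-- Connections stuck idle in transaction for more than 30 seconds:
SELECT pid, now() - state_change AS idle_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '30 seconds';

-- Requested vs. timed checkpoints
-- (pg_stat_bgwriter here; pg_stat_checkpointer on Postgres 17+):
SELECT checkpoints_req, checkpoints_timed FROM pg_stat_bgwriter;
```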

## The Security Baseline

```sql
-- Application credentials (not superuser):
CREATE ROLE myapp WITH LOGIN PASSWORD '...';
GRANT readwrite TO myapp;

-- Separate migration credentials:
CREATE ROLE migrations WITH LOGIN PASSWORD '...';
GRANT readwrite TO migrations;
GRANT CREATE ON SCHEMA public TO migrations;

-- Read replica credentials:
CREATE ROLE myapp_reader WITH LOGIN PASSWORD '...';
GRANT readonly TO myapp_reader;

-- RLS on multi-tenant tables:
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::bigint);

-- Force SSL:
-- pg_hba.conf: hostssl all all 0.0.0.0/0 scram-sha-256
--              hostnossl all all 0.0.0.0/0 reject
```

## The High-Availability Topology

For production workloads:

```
Primary (read/write)
    ↓ streaming replication (async)
Standby (read-only, hot standby)
    ↓ HAProxy or DNS
Application (routes writes to primary, reads to standby or primary)
```

For critical workloads where data loss is unacceptable:
- Synchronous replication to one standby
- Any additional standbys async (for read scaling without commit latency impact)
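
On the primary, the synchronous variant is a two-line change (sketch; `standby1` stands in for whatever `application_name` the standby uses in its `primary_conninfo`):

```ini
# Commit returns only after the named standby confirms the WAL flush
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (standby1)'
```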

For managed environments: use the cloud provider's multi-AZ HA. AWS RDS Multi-AZ, Cloud SQL HA, Aurora — these handle failover automatically.

## When to Add Something

The reference architecture above has no Redis, no Elasticsearch, no Kafka. When does that change?

**Add a caching layer (Redis or Memcached) when:**
- You've profiled and found that Postgres is the actual bottleneck for read-heavy workloads
- Your application has data that is fetched very frequently but changes rarely, and the cost of cache invalidation logic is less than the cost of the database load
- You need sub-millisecond latency at a scale that Postgres unlogged tables can't provide

**Add Elasticsearch when:**
- You have hundreds of millions of documents and search relevance is a core product feature that requires sophisticated tuning
- Your analytics team needs faceted navigation with complex aggregations at a scale that Postgres can't support

**Add Kafka when:**
- You need durable, ordered event streaming between multiple independent systems
- Your outbox table pattern is generating replication lag that's affecting your primary's performance
- You're integrating with external systems that speak Kafka

**Add a data warehouse (ClickHouse, BigQuery, Snowflake) when:**
- Your analytics queries are scanning billions of rows and impacting primary/replica performance
- Your data team needs complex ad-hoc analytical queries over your full dataset

Notice the pattern: these additions are made in response to evidence, not in anticipation of theoretical scale. The starting point is Postgres. The additions happen when you've hit the actual limit.

## A Final Word

The engineers who run sophisticated systems on a single Postgres instance are not lazy. They're not unaware of the alternatives. They've made a deliberate choice: to invest deeply in one tool and use it well, rather than spread their knowledge thin across many.

Postgres rewards that investment. The more you know about it — MVCC, the query planner, VACUUM, WAL, extensions — the more valuable it becomes. The knowledge compounds. The expertise transfers to every team and system you work with. The operational habits you build (proper indexing, safe migrations, autovacuum monitoring, backup testing) make you a better engineer on any system.

The thesis of this book is simple: most teams reach for specialized databases before they need to, and pay a complexity tax they don't have to pay. Postgres, properly understood and configured, handles a remarkable range of workloads — relational, document, search, queuing, key-value, vector, time-series — with a level of reliability and operational simplicity that no polyglot stack can match.

Use it well. Use all of it. And when you eventually do hit its limits — and you might, if you build something large enough — you'll know exactly where the limits are, and you'll have the expertise to make the right call about what to add next.

Postgres is enough. Now go build something.

---

## Quick Reference: The Postgres-First Decision Tree

```
Is this about storing and querying data?
├── Yes
│   ├── Does it fit on one machine at current scale? → Postgres
│   ├── Is it time-series with high ingest? → TimescaleDB (it's still Postgres)
│   ├── Is it geospatial? → PostGIS (it's still Postgres)
│   ├── Is it vectors? → pgvector (it's still Postgres)
│   └── Is it truly global with write latency requirements? → Evaluate Spanner/CockroachDB
│
├── Do you need a job queue?
│   └── Is it < 100k jobs/sec? → River or pg-boss (Postgres)
│
├── Do you need full-text search?
│   └── Is it < 100M docs, not a search product? → Postgres FTS
│
├── Do you need caching?
│   └── Have you measured and proven Postgres is too slow? → Maybe Redis
│
└── Do you need pub/sub?
    ├── For worker notification? → LISTEN/NOTIFY (Postgres)
    └── For distributed event streaming? → Kafka
```

The answer to "do you need X?" is almost always "try Postgres first."