zeroae · sodre · Feb 15, 2026 · Jun 12, 2026 · Jun 12, 2026
diff --git a/docs/adr/121-policy-rename-clarity.md b/docs/adr/124-policy-rename-clarity.md
@@ -1,4 +1,4 @@
-# ADR-121: Rename IAM Policies for Clarity
+# ADR-124: Rename IAM Policies for Clarity
 
 **Status:** Accepted
 **Date:** 2026-02-02

diff --git a/docs/adr/126-multi-region-independent-tables.md b/docs/adr/126-multi-region-independent-tables.md
@@ -0,0 +1,58 @@
+# ADR-126: Multi-Region via Independent Regional Tables
+
+**Status:** Proposed
+**Date:** 2026-02-14
+
+## Context
+
+Users need to enforce rate limits across multiple AWS regions (e.g., an organization's
+global RPM limit must be shared by clients in us-east-1 and eu-west-1). DynamoDB Global
+Tables is the obvious candidate, but it has fundamental conflicts with zae-limiter's
+write patterns:
+
+- **ADD counter loss:** Global Tables uses last-writer-wins at the item level. Concurrent
+  `ADD tk -consumed` from two regions results in one write overwriting the other, silently
+  losing consumption data and causing over-admission.
+- **Transaction non-atomicity:** `TransactWriteItems` is ACID only in the originating
+  region. Cascade child+parent writes appear as partial updates in other regions.
+- **Double refill:** Each region's aggregator processes its own stream. Replicated writes
+  appear in all streams, requiring filtering to avoid double-counting and double-refilling.
+
+The namespace feature (issue #376) already provides write isolation: each namespace has
+its own partition key prefix, so records in different namespaces never collide.
+
+## Decision
+
+Multi-region must use **independent DynamoDB tables per region**, one per deployed stack.
+Each region must use a dedicated namespace for its rate-limiting data. Cross-region
+coordination must be handled by a periodic sync mechanism (see ADR-127, ADR-130), not by
+DynamoDB replication. Global Tables must not be used for the rate-limiting table.
+
+## Consequences
+
+**Positive:**
+- Write cost stays at 1x (no replicated WCU tax)
+- All existing write patterns (speculative, optimistic lock, transactions) work unchanged
+- Aggregator Lambda processes only local events, no stream filtering needed
+- Each region is fully independent; one region's failure does not affect others
+
+**Negative:**
+- No automatic data replication; regional data is lost if a region fails permanently
+- Cross-region coordination requires a new sync component (ADR-127, ADR-130)
+- Rate limiter state is ephemeral; region loss causes temporary over-admission until
+  sync catches up, bounded by one sync window
+
+## Alternatives Considered
+
+### DynamoDB Global Tables with namespace-per-region isolation
+Rejected because: replicated WCUs double write cost, ADD operations lose data under
+concurrent cross-region writes, and transactions are not atomic across regions.
+
+### DynamoDB Global Tables with counter sharding (per-region SET attributes)
+Rejected because: requires reworking the composite bucket schema (ADR-114), breaks
+speculative writes, and the 2x write cost is not justified when a sync mechanism is
+needed regardless.
+
+### Centralized single-region table with cross-region API calls
+Rejected because: adds 50-150ms latency to every acquire() call for remote-region
+clients and creates a single point of failure with no local fallback.
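Note on ADR-126: the "ADD counter loss" argument can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `Replica`, `replicate_lww`, and the timestamps are hypothetical stand-ins for two regional copies of one item and for item-level last-writer-wins reconciliation; nothing here calls DynamoDB.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One region's copy of a bucket item (hypothetical, in-memory only)."""
    tokens: int
    last_write_ts: float = 0.0

    def local_add(self, delta: int, ts: float) -> None:
        # Within a single region, ADD is applied atomically to the local copy.
        self.tokens += delta
        self.last_write_ts = ts


def replicate_lww(a: Replica, b: Replica) -> None:
    """Item-level last-writer-wins: the newer item replaces the older one whole."""
    winner = a if a.last_write_ts >= b.last_write_ts else b
    a.tokens = b.tokens = winner.tokens
    a.last_write_ts = b.last_write_ts = winner.last_write_ts


def simulate_concurrent_consumption() -> tuple[int, int]:
    """Return (converged tokens under LWW, tokens a true counter merge would give)."""
    start = 100
    east = Replica(tokens=start)
    west = Replica(tokens=start)
    east.local_add(-30, ts=1.0)  # us-east-1 consumes 30
    west.local_add(-20, ts=1.1)  # eu-west-1 consumes 20, slightly later
    replicate_lww(east, west)
    return east.tokens, start - 30 - 20


converged, expected = simulate_concurrent_consumption()
print(f"LWW result: {converged}, correct merge: {expected}")
```

In this sketch the later write wins outright, so the 30 tokens consumed in the other region vanish from the converged state, which is the over-admission the ADR describes.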
diff --git a/docs/adr/127-s3-sync-exchange.md b/docs/adr/127-s3-sync-exchange.md
@@ -0,0 +1,59 @@
+# ADR-127: S3-Based Cross-Region Sync Exchange
+
+**Status:** Proposed
+**Date:** 2026-02-14
+
+## Context
+
+With independent DynamoDB tables per region (ADR-126), a sync mechanism must exchange
+consumption data between regions. The exchange payload is a snapshot of all active
+entities' bucket states: `total_consumed_milli`, `tokens_milli`, and `capacity_milli`
+per entity, resource, and limit.
+
+Using DynamoDB as the exchange medium means writing one item per active (entity, resource)
+pair per sync cycle. At 2,000 active entities with 10 resources and a 10-second sync
+window, this costs ~$3,900/month in WCU alone — roughly 3x the entire acquire budget.
+
+The sync payload is a batch snapshot: all active entities' state at a point in time.
+This is a bulk data transfer problem, not an item-level access problem.
+
+## Decision
+
+Each region's sync Lambda must write its consumption snapshot as a single S3 object
+(JSON) to a shared sync bucket, keyed by `{region}/snapshot.json`. Remote regions must
+read these objects via cross-region S3 GET. DynamoDB must not be used for publishing
+sync reports.
+
+The snapshot must include, per active (entity, resource) pair: the per-limit
+`total_consumed_milli` counter, the current `tokens_milli`, and the configured
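+`capacity_milli`. Snapshots older than 5 minutes must be treated as expired by readers;
+S3 lifecycle rules expire objects at day granularity, so they provide only coarse cleanup.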
+
+## Consequences
+
+**Positive:**
+- Publishing cost drops to ~$1.30/month regardless of entity count (1 S3 PUT per cycle)
+- Reading cost is ~$0.10/month per remote region (1 S3 GET per cycle)
+- Snapshot size is bounded: 2,000 entities x 10 resources x 60 bytes = ~1.2 MB per PUT
+- S3 is highly available and durable; no capacity planning needed
+
+**Negative:**
+- Introduces S3 dependency for cross-region coordination (new failure mode)
+- Snapshots can be roughly one sync window stale (S3 reads themselves are strongly consistent)
+- Requires a shared S3 bucket accessible from all regions (cross-region GET latency
+  ~100ms, acceptable for background sync)
+- Snapshot format must be versioned to handle schema evolution
+
+## Alternatives Considered
+
+### DynamoDB items for sync reports (one per entity per resource)
+Rejected because: WCU cost scales linearly with entity count, reaching $3,900/month
+at 2,000 active entities with 10-second sync — 3x the acquire budget.
+
+### DynamoDB items for sync reports (one batch item per resource)
+Rejected because: 400KB item size limit caps at ~4,000 entities per item, and large
+item writes consume proportionally more WCUs, offering no cost advantage over
+individual items.
+
+### SQS/SNS for event-driven sync
+Rejected because: requires per-event cross-region message delivery, adding complexity
+and cost proportional to acquire volume rather than sync frequency.
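Note on ADR-127: the S3 figures in the Consequences section can be reproduced with a few lines of arithmetic. The request prices below are assumptions (S3 Standard list prices per 1,000 requests, which vary by region and over time), as is the 30-day month; data transfer charges are not modelled. The function names are illustrative only.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month

# Assumed S3 Standard request prices in USD per 1,000 requests; verify per region.
S3_PUT_PER_1000 = 0.005
S3_GET_PER_1000 = 0.0004


def cycles_per_month(sync_window_s: int) -> int:
    """Number of sync cycles in a month for a fixed sync window."""
    return SECONDS_PER_MONTH // sync_window_s


def snapshot_bytes(entities: int, resources: int, bytes_per_pair: int = 60) -> int:
    """Snapshot size, using the ADR's 60 bytes per (entity, resource) pair."""
    return entities * resources * bytes_per_pair


def monthly_put_cost(sync_window_s: int) -> float:
    """One PUT per cycle, independent of entity count."""
    return cycles_per_month(sync_window_s) / 1000 * S3_PUT_PER_1000


def monthly_get_cost(sync_window_s: int, remote_regions: int = 1) -> float:
    """One GET per cycle per remote region."""
    return cycles_per_month(sync_window_s) / 1000 * S3_GET_PER_1000 * remote_regions


print(cycles_per_month(10), snapshot_bytes(2_000, 10))
print(round(monthly_put_cost(10), 2), round(monthly_get_cost(10), 2))
```

Under these assumed prices a 10-second window gives 259,200 cycles per month, which lines up with the ADR's ~$1.30 publishing and ~$0.10 reading estimates.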
diff --git a/docs/adr/128-quota-enforcement-via-config.md b/docs/adr/128-quota-enforcement-via-config.md
@@ -0,0 +1,63 @@
+# ADR-128: Quota Enforcement via Entity Config Overrides
+
+**Status:** Proposed
+**Date:** 2026-02-14
+
+## Context
+
+With independent tables per region (ADR-126) and S3-based sync (ADR-127), each region's
+sync Lambda computes a regional quota for each entity. This quota must be enforced by
+the rate limiter's hot path without modifying the acquire flow.
+
+Three enforcement mechanisms were evaluated:
+
+1. **Entity config overrides:** Write adjusted limits via `set_limits()`, picked up by
+   the existing config cache on the next resolve.
+2. **Shadow counter on bucket item:** Write a `remote_tc` attribute on the bucket and
+   add a condition to speculative writes.
+3. **Direct token deduction:** `ADD b_rpm_tk -remote_delta` on the bucket item.
+
+The shadow counter approach has a semantic mismatch: `total_consumed_milli` is a lifetime
+monotonic counter incompatible with the per-window token bucket model. Direct token
+deduction does not adjust the refill ceiling — each region's bucket refills at the full
+global rate, causing N regions to provide Nx the intended refill.
+
+## Decision
+
+Regional quotas must be enforced by writing entity-level config overrides via the
+existing `set_limits(entity_id, resource, limits)` API. The sync Lambda must compute
+the allocated capacity per entity and write it as an entity config record. The rate
+limiter's existing config resolution hierarchy (Entity > Resource > System) must be
+the sole mechanism for quota enforcement.
+
+## Consequences
+
+**Positive:**
+- Zero changes to the acquire hot path (speculative writes, optimistic lock, bucket math)
+- Uses the existing config hierarchy; no new DynamoDB schema or access patterns
+- Capacity adjustment naturally controls refill ceiling via token bucket math
+- Config cache TTL provides built-in staleness tolerance (already accepted in ADR-105)
+
+**Negative:**
+- Token drain lag: if current tokens exceed the new reduced capacity, the entity can
+  consume excess tokens until they drain naturally (bounded by consumption rate)
+- Config writes are the dominant sync cost (~$40/month at 2,000 active entities with
+  trigger-based filtering per ADR-129)
+- Config cache TTL (default 60s) delays quota enforcement after a config write; the
+  sync Lambda and application use separate Repository instances
+
+## Alternatives Considered
+
+### Shadow counter attribute on bucket item (remote_tc)
+Rejected because: `total_consumed_milli` is a monotonic lifetime counter that cannot
+be compared against a per-window capacity limit, and modifying the speculative write
+condition changes the hot path for all users.
+
+### Direct token deduction (ADD tk -remote_delta)
+Rejected because: each region's bucket still refills at the full global rate, so N
+regions produce Nx total refill — the deduction fights the bucket math without
+correcting the underlying refill ceiling.
+
+### In-memory client-side consumption map (no DynamoDB writes)
+Rejected because: requires background polling threads and in-memory state, which works
+for long-running services but not for Lambda-based rate limiting.
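Note on ADR-128: the "Nx refill" argument can be checked with a toy token bucket. This sketch assumes the common model where refill rate is `capacity / window` and the bucket is clamped at capacity; `refill_milli` and `aggregate_refill` are hypothetical helpers, not zae-limiter functions, and the 60/40 split is an arbitrary example allocation.

```python
def refill_milli(tokens_milli: int, capacity_milli: int, window_s: int, elapsed_s: int) -> int:
    """Token-bucket refill: rate is capacity/window, clamped at capacity (assumed model)."""
    gained = capacity_milli * elapsed_s // window_s
    return min(capacity_milli, tokens_milli + gained)


def aggregate_refill(capacities_milli: list[int], window_s: int, elapsed_s: int) -> int:
    """Total tokens refilled across regions, each starting from an empty bucket."""
    return sum(refill_milli(0, c, window_s, elapsed_s) for c in capacities_milli)


GLOBAL_LIMIT_MILLI = 100_000  # 100 requests per minute, in milli-tokens

# Without overrides, each of two regions refills at the full global rate.
unadjusted = aggregate_refill([GLOBAL_LIMIT_MILLI, GLOBAL_LIMIT_MILLI], window_s=60, elapsed_s=60)

# With entity config overrides, regional capacities sum to the global limit.
overridden = aggregate_refill([60_000, 40_000], window_s=60, elapsed_s=60)

print(unadjusted, overridden)
```

Because refill is derived from capacity, lowering capacity through the config override also lowers the refill ceiling, which direct token deduction would not do.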
diff --git a/docs/adr/129-trigger-based-sync-writes.md b/docs/adr/129-trigger-based-sync-writes.md
@@ -0,0 +1,61 @@
+# ADR-129: Trigger-Based Sync Config Writes
+
+**Status:** Proposed
+**Date:** 2026-02-14
+
+## Context
+
+With quota enforcement via entity config overrides (ADR-128), the sync Lambda writes
+a `set_limits()` call for every active (entity, resource) pair each cycle. At 2,000
+active entities with 10 resources and a 10-second sync window, this produces 20,000
+config writes per cycle (~$486/month). Most of these writes are wasted: 80% of entities
+are well under their limits with stable allocations.
+
+The sync window already defines the worst-case over-admission bound (entities can
+double-consume for one sync window regardless of write frequency). Config writes do
+not improve the worst case; they only tighten steady-state accuracy.
+
+## Decision
+
+The sync Lambda must only write entity config overrides when one of two triggers fires:
+
+1. **Exhaustion trigger:** The entity's projected time-to-exhaustion (remaining tokens
+   divided by recent consumption rate) is less than twice the sync window. This prevents
+   the entity from running out of tokens before the next sync cycle can react.
+
+2. **Drift trigger:** The computed allocation differs from the currently configured
+   capacity by more than 15%. This corrects stale quotas for entities whose traffic
+   pattern has shifted significantly.
+
+All other entities must be skipped (no config write). Trigger evaluation must be
+computed from S3 snapshot data (ADR-127) without additional DynamoDB reads.
+
+## Consequences
+
+**Positive:**
+- Config writes drop from ~20,000 to ~600 per cycle at steady state (~97% reduction)
+- Monthly sync cost drops from ~$486 to ~$40 at 2,000 active entities
+- DynamoDB write throughput spikes are smoothed (fewer concurrent writes)
+- Worst-case over-admission is unchanged (bounded by sync window, not write frequency)
+
+**Negative:**
+- Entities with slowly drifting traffic (<15% per cycle) may have stale quotas for
+  multiple sync cycles before the drift threshold triggers
+- Exhaustion prediction depends on consumption rate estimation, which may be noisy for
+  bursty workloads
+- Two tunable parameters (exhaustion horizon = 2x sync window, drift threshold = 15%)
+  require validation under production traffic patterns
+
+## Alternatives Considered
+
+### Write every entity every cycle (no filtering)
+Rejected because: 97% of writes are redundant, costing ~$450/month in unnecessary WCU
+without improving the over-admission bound set by the sync window.
+
+### Write only on exhaustion (drop drift trigger)
+Rejected because: entities with shifting traffic patterns would keep stale allocations
+indefinitely, wasting regional quota until they approach exhaustion.
+
+### Event-driven writes via DynamoDB Streams (write on every bucket change)
+Rejected because: couples sync frequency to acquire volume rather than a fixed window,
+producing more writes than periodic polling for high-throughput entities.
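Note on ADR-129: the two triggers in the Decision section can be sketched as a single pure function. The function name, parameter names, and the handling of idle entities and zero configured capacity are assumptions for illustration; only the 2x exhaustion horizon and 15% drift threshold come from the ADR.

```python
from typing import Optional


def sync_write_trigger(
    tokens_milli: int,
    consumption_rate_milli_per_s: float,
    allocated_capacity_milli: int,
    configured_capacity_milli: int,
    sync_window_s: float,
    exhaustion_horizon: float = 2.0,  # ADR-129: twice the sync window
    drift_threshold: float = 0.15,    # ADR-129: 15% allocation drift
) -> Optional[str]:
    """Return the trigger that fired ("exhaustion" or "drift"), or None to skip the write."""
    # Exhaustion: projected time-to-exhaustion is shorter than the horizon.
    if consumption_rate_milli_per_s > 0:
        time_to_exhaustion_s = tokens_milli / consumption_rate_milli_per_s
        if time_to_exhaustion_s < exhaustion_horizon * sync_window_s:
            return "exhaustion"

    # Drift: computed allocation differs from configured capacity by more than the threshold.
    if configured_capacity_milli == 0:
        # Assumption: any non-zero allocation counts as drift from a zero capacity.
        return "drift" if allocated_capacity_milli != 0 else None
    drift = abs(allocated_capacity_milli - configured_capacity_milli) / configured_capacity_milli
    if drift > drift_threshold:
        return "drift"

    return None


print(sync_write_trigger(10_000, 1_000.0, 100_000, 100_000, sync_window_s=10))
print(sync_write_trigger(100_000, 1_000.0, 120_000, 100_000, sync_window_s=10))
print(sync_write_trigger(100_000, 1_000.0, 105_000, 100_000, sync_window_s=10))
```

All inputs here are values the ADR expects to find in the S3 snapshot, so the check needs no extra DynamoDB reads.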
diff --git a/docs/adr/130-per-region-sync-lambda.md b/docs/adr/130-per-region-sync-lambda.md
@@ -0,0 +1,56 @@
+# ADR-130: Per-Region Sync Lambda
+
+**Status:** Proposed
+**Date:** 2026-02-14
+
+## Context
+
+Cross-region sync (ADR-126, ADR-127) requires a Lambda function that reads consumption
+snapshots, computes quota allocations, and writes config overrides. Two topologies were
+evaluated:
+
+- **Single coordinator:** One Lambda in a designated region reads all snapshots, computes
+  all quotas, and writes configs to all regions. Total cross-region calls: `2(N-1)` per
+  cycle (reads + writes). Single point of failure.
+- **Per-region:** Each region runs its own Lambda that reads remote snapshots, computes
+  its own quota locally, and writes only to its local table. Total cross-region calls
+  per Lambda: `N-1` reads, 0 writes.
+
+Both topologies produce the same quota allocation when given the same inputs. The
+per-region Lambda runs a deterministic function: given the same S3 snapshots, every
+region independently computes the same allocation. No distributed consensus is needed.
+
+## Decision
+
+Each region must run its own sync Lambda, triggered by EventBridge on a fixed schedule
+(configurable sync window). Each Lambda must read its local bucket states, write its
+snapshot to S3 (ADR-127), read all remote snapshots from S3, compute quotas using a
+deterministic allocation function, and write triggered config overrides (ADR-129) to
+its local DynamoDB table only. No Lambda may write to a remote region's DynamoDB table.
+
+## Consequences
+
+**Positive:**
+- Symmetric architecture: every region deploys the same CloudFormation stack
+- No single point of failure: one region's Lambda failure does not affect other regions
+- Zero cross-region DynamoDB writes (only cross-region S3 reads, ~100ms latency)
+- Scales naturally: adding a region means deploying the same stack, no coordinator changes
+- Each region can independently tune its sync window
+
+**Negative:**
+- N Lambdas compute the same allocation independently (redundant CPU, negligible cost)
+- Slight snapshot staleness between Lambdas reading at different moments within a cycle
+  (sub-second divergence, converges on next cycle)
+- More infrastructure per region (EventBridge rule + Lambda + IAM), though identical
+  across regions and part of the standard stack deployment
+
+## Alternatives Considered
+
+### Single coordinator Lambda in a designated region
+Rejected because: introduces an asymmetric "special" region, creates a single point of
+failure for all global quota allocation, and requires cross-region DynamoDB writes for
+config overrides in remote regions.
+
+### Peer-to-peer gossip between regional Lambdas
+Rejected because: adds network coordination complexity (discovery, message ordering)
+without improving on the deterministic-computation-from-shared-S3 approach.
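Note on ADR-130: the ADR relies on a deterministic allocation function but does not define one. The sketch below uses a proportional-to-consumption split purely as an assumed example; the point it illustrates is determinism (sorted region order, integer arithmetic, remainder distributed in a fixed order), so that every region derives the same result from the same snapshots. `allocate` is a hypothetical name.

```python
def allocate(global_capacity_milli: int, consumed_by_region: dict[str, int]) -> dict[str, int]:
    """Split a global capacity across regions in proportion to recent consumption.

    Deterministic by construction: regions are processed in sorted order and only
    integer arithmetic is used, so input dict ordering cannot change the output.
    """
    regions = sorted(consumed_by_region)
    if not regions:
        return {}

    total = sum(consumed_by_region.values())
    if total == 0:
        # No recent consumption anywhere: fall back to an equal split.
        shares = {r: global_capacity_milli // len(regions) for r in regions}
    else:
        shares = {r: global_capacity_milli * consumed_by_region[r] // total for r in regions}

    # Floor division can leave a remainder smaller than the region count;
    # hand it out one milli-token at a time in sorted order.
    remainder = global_capacity_milli - sum(shares.values())
    for r in regions[:remainder]:
        shares[r] += 1
    return shares


a = allocate(100_000, {"us-east-1": 300, "eu-west-1": 100})
b = allocate(100_000, {"eu-west-1": 100, "us-east-1": 300})
print(a == b, a)
```

Each regional Lambda would keep only its own entry from the result and write it to its local table, consistent with the rule that no Lambda writes to a remote region.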
diff --git a/docs/adr/131-sync-config-ttl.md b/docs/adr/131-sync-config-ttl.md
@@ -0,0 +1,66 @@
+# ADR-131: TTL on Sync-Written Entity Config Records
+
+**Status:** Proposed
+**Date:** 2026-02-14
+
+## Context
+
+The sync Lambda enforces regional quotas by writing entity-level config overrides via
+`set_limits()` (ADR-128). Per ADR-119, buckets using entity custom limits persist
+indefinitely (no TTL), while buckets using resource/system defaults have TTL and
+auto-expire.
+
+When the sync Lambda writes an entity config, the bucket transitions from
+"default-limit" (has TTL) to "custom-limit" (no TTL). If the entity later goes idle
+and the sync Lambda stops writing configs (no trigger fires per ADR-129), both the
+config record and its bucket persist indefinitely. For high-churn entity populations
+(anonymous users, ephemeral API keys), this causes unbounded storage growth.
+
+The fix must not affect operator-written entity configs, which intentionally persist
+indefinitely per ADR-119.
+
+## Decision
+
+Sync-written entity config records must include a DynamoDB TTL attribute set to
+`now + 3 × sync_window`. The sync Lambda must refresh the TTL on each config write.
+
+This extends ADR-119's bucket TTL rule. The updated bucket TTL logic is:
+
+- Bucket has **no TTL** if: entity config exists **without** a `ttl` attribute
+  (operator-written, persists indefinitely — unchanged from ADR-119)
+- Bucket **has TTL** if: entity config exists **with** a `ttl` attribute
+  (sync-written, treated as default-like for TTL calculation)
+- Bucket **has TTL** if: no entity config exists
+  (using resource/system defaults — unchanged from ADR-119)
+
+When the entity goes idle (no trigger fires for 3 sync windows), the config record
+auto-expires via DynamoDB TTL. The entity reverts to resource/system defaults, and
+bucket TTL behavior per ADR-119 resumes.
+
+## Consequences
+
+**Positive:**
+- Idle entities auto-cleanup: sync config expires, bucket regains TTL, storage bounded
+- No new attributes needed: the DynamoDB `ttl` attribute already exists in the table schema
+- Self-healing: if a sync Lambda fails permanently, all its configs expire within 3 windows
+
+**Negative:**
+- Bucket TTL logic (ADR-119) must check whether the entity config has a `ttl` attribute
+  to distinguish sync-written from operator-written configs
+- Config records gain a new write pattern: conditional refresh of TTL alongside limits
+- DynamoDB TTL deletion is asynchronous and can lag expiry by days, so expired configs
+  remain readable until deleted; config resolution must ignore records with a past `ttl`
+
+## Alternatives Considered
+
+### Explicit cleanup pass in the sync Lambda (delete stale configs)
+Rejected because: requires the sync Lambda to maintain a "previously synced" entity set
+across invocations, adding state management complexity to a stateless Lambda function.
+
+### Separate DynamoDB sort key for sync configs (#SYNC_CONFIG#{resource})
+Rejected because: adds a new config level to the resolution hierarchy (ADR-118), breaking
+the existing four-level precedence model and requiring changes to `resolve_limits()`.
+
+### No TTL on sync configs (rely on operator cleanup)
+Rejected because: operators should not need to manually clean up configs created by an
+automated sync process, especially for ephemeral entities at scale.
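Note on ADR-131: the three-case bucket TTL rule and the `now + 3 × sync_window` expiry can be written as two small functions. This is a sketch only: the config record is modelled as a plain dict with an optional `ttl` key, and `bucket_has_ttl` and `sync_config_ttl` are hypothetical names rather than existing zae-limiter functions.

```python
from typing import Optional


def sync_config_ttl(now_epoch_s: int, sync_window_s: int) -> int:
    """TTL for a sync-written config record: now + 3 x sync window (epoch seconds)."""
    return now_epoch_s + 3 * sync_window_s


def bucket_has_ttl(entity_config: Optional[dict]) -> bool:
    """Apply the updated ADR-119 rule to decide whether a bucket carries a TTL.

    - no entity config              -> bucket has TTL (resource/system defaults)
    - entity config with `ttl`      -> bucket has TTL (sync-written)
    - entity config without `ttl`   -> no TTL (operator-written, persists)
    """
    if entity_config is None:
        return True
    return "ttl" in entity_config


operator_config = {"limits": {"rpm": 100}}
sync_config = {"limits": {"rpm": 60}, "ttl": sync_config_ttl(1_700_000_000, 10)}

print(bucket_has_ttl(None), bucket_has_ttl(sync_config), bucket_has_ttl(operator_config))
```

Keying the distinction on the presence of `ttl` keeps operator-written configs untouched, since they never carry that attribute.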