From e4b7a8b4f0f01254475073fcabdc3b8fec68941a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Patrick=20Sodr=C3=A9?= Date: Sun, 15 Feb 2026 00:03:06 -0500 Subject: [PATCH 1/3] =?UTF-8?q?=F0=9F=93=9D=20docs(infra):=20add=20multi-r?= =?UTF-8?q?egion=20architecture=20ADRs=20(123-129)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Seven proposed ADRs documenting the multi-region rate limiting architecture: - ADR-123: Independent regional tables (reject Global Tables) - ADR-124: S3-based cross-region sync exchange - ADR-125: Quota enforcement via entity config overrides - ADR-126: Trigger-based sync writes (exhaustion + drift) - ADR-127: Per-region sync Lambda (symmetric, no coordinator) - ADR-128: TTL on sync-written config records (extends ADR-119) - ADR-129: Sync config ownership via TTL presence Co-Authored-By: Claude Opus 4.6 --- .../123-multi-region-independent-tables.md | 58 ++++++++++++++++ docs/adr/124-s3-sync-exchange.md | 59 +++++++++++++++++ docs/adr/125-quota-enforcement-via-config.md | 63 ++++++++++++++++++ docs/adr/126-trigger-based-sync-writes.md | 61 +++++++++++++++++ docs/adr/127-per-region-sync-lambda.md | 56 ++++++++++++++++ docs/adr/128-sync-config-ttl.md | 66 +++++++++++++++++++ docs/adr/129-sync-config-ownership.md | 64 ++++++++++++++++++ 7 files changed, 427 insertions(+) create mode 100644 docs/adr/123-multi-region-independent-tables.md create mode 100644 docs/adr/124-s3-sync-exchange.md create mode 100644 docs/adr/125-quota-enforcement-via-config.md create mode 100644 docs/adr/126-trigger-based-sync-writes.md create mode 100644 docs/adr/127-per-region-sync-lambda.md create mode 100644 docs/adr/128-sync-config-ttl.md create mode 100644 docs/adr/129-sync-config-ownership.md diff --git a/docs/adr/123-multi-region-independent-tables.md b/docs/adr/123-multi-region-independent-tables.md new file mode 100644 index 00000000..d1497ee8 --- /dev/null +++ b/docs/adr/123-multi-region-independent-tables.md @@ -0,0 +1,58 
@@ +# ADR-123: Multi-Region via Independent Regional Tables + +**Status:** Proposed +**Date:** 2026-02-14 + +## Context + +Users need to enforce rate limits across multiple AWS regions (e.g., an organization's +global RPM limit must be shared by clients in us-east-1 and eu-west-1). DynamoDB Global +Tables is the obvious candidate, but it has fundamental conflicts with zae-limiter's +write patterns: + +- **ADD counter loss:** Global Tables uses last-writer-wins at the item level. Concurrent + `ADD tk -consumed` from two regions results in one write overwriting the other, silently + losing consumption data and causing over-admission. +- **Transaction non-atomicity:** `TransactWriteItems` is ACID only in the originating + region. Cascade child+parent writes appear as partial updates in other regions. +- **Double refill:** Each region's aggregator processes its own stream. Replicated writes + appear in all streams, requiring filtering to avoid double-counting and double-refilling. + +The namespace feature (issue #376) already provides write isolation: each namespace has +its own partition key prefix, so records in different namespaces never collide. + +## Decision + +Multi-region must use **independent DynamoDB tables per region**, one per deployed stack. +Each region must use a dedicated namespace for its rate-limiting data. Cross-region +coordination must be handled by a periodic sync mechanism (see ADR-124, ADR-127), not by +DynamoDB replication. Global Tables must not be used for the rate-limiting table. 
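The ADD counter loss described in the Context can be illustrated with a toy model. This is a minimal sketch, not DynamoDB code: it assumes whole-item last-writer-wins resolution keyed by a write timestamp, and the helper names (`resolve_last_writer_wins`, `lost_consumption`) are hypothetical.

```python
def resolve_last_writer_wins(versions):
    """Pick the surviving item among replicated versions.

    `versions` is a list of (timestamp, item) tuples; the item with the
    highest timestamp wins and the others are discarded wholesale.
    """
    return max(versions, key=lambda v: v[0])[1]


def lost_consumption(start_tokens, east_consumed, west_consumed):
    """Return (converged token count, tokens of consumption silently lost)."""
    # Each region applies its ADD against its own local copy of the item.
    east = (1, {"tk": start_tokens - east_consumed})
    west = (2, {"tk": start_tokens - west_consumed})
    converged = resolve_last_writer_wins([east, west])
    # What a true cross-region counter would have recorded.
    expected = start_tokens - east_consumed - west_consumed
    return converged["tk"], converged["tk"] - expected


tokens, lost = lost_consumption(start_tokens=100, east_consumed=10, west_consumed=20)
```

In this model the later write survives and the earlier region's consumption disappears, which is the over-admission the ADR is concerned about.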
+ +## Consequences + +**Positive:** +- Write cost stays at 1x (no replicated WCU tax) +- All existing write patterns (speculative, optimistic lock, transactions) work unchanged +- Aggregator Lambda processes only local events, no stream filtering needed +- Each region is fully independent; one region's failure does not affect others + +**Negative:** +- No automatic data replication; regional data is lost if a region fails permanently +- Cross-region coordination requires a new sync component (ADR-124, ADR-127) +- Rate limiter state is ephemeral; region loss causes temporary over-admission until + sync catches up, bounded by one sync window + +## Alternatives Considered + +### DynamoDB Global Tables with namespace-per-region isolation +Rejected because: replicated WCUs double write cost, ADD operations lose data under +concurrent cross-region writes, and transactions are not atomic across regions. + +### DynamoDB Global Tables with counter sharding (per-region SET attributes) +Rejected because: requires reworking the composite bucket schema (ADR-114), breaks +speculative writes, and the 2x write cost is not justified when a sync mechanism is +needed regardless. + +### Centralized single-region table with cross-region API calls +Rejected because: adds 50-150ms latency to every acquire() call for remote-region +clients, creating a single point of failure with no local fallback. diff --git a/docs/adr/124-s3-sync-exchange.md b/docs/adr/124-s3-sync-exchange.md new file mode 100644 index 00000000..8f7a9e8e --- /dev/null +++ b/docs/adr/124-s3-sync-exchange.md @@ -0,0 +1,59 @@ +# ADR-124: S3-Based Cross-Region Sync Exchange + +**Status:** Proposed +**Date:** 2026-02-14 + +## Context + +With independent DynamoDB tables per region (ADR-123), a sync mechanism must exchange +consumption data between regions. The exchange payload is a snapshot of all active +entities' bucket states: `total_consumed_milli`, `tokens_milli`, and `capacity_milli` +per entity, resource, and limit. 
+ +Using DynamoDB as the exchange medium means writing one item per active (entity, resource) +pair per sync cycle. At 2,000 active entities with 10 resources and a 10-second sync +window, this costs ~$3,900/month in WCU alone, roughly 3x the entire acquire budget. + +The sync payload is a batch snapshot: all active entities' state at a point in time. +This is a bulk data transfer problem, not an item-level access problem. + +## Decision + +Each region's sync Lambda must write its consumption snapshot as a single S3 object +(JSON) to a shared sync bucket, keyed by `{region}/snapshot.json`. Remote regions must +read these objects via cross-region S3 GET. DynamoDB must not be used for publishing +sync reports. + +The snapshot must include, per active (entity, resource) pair: the per-limit +`total_consumed_milli` counter, the current `tokens_milli`, and the configured +`capacity_milli`. Snapshot objects must have a TTL (S3 lifecycle) of 5 minutes. + +## Consequences + +**Positive:** +- Publishing cost drops to ~$1.30/month regardless of entity count (1 S3 PUT per cycle) +- Reading cost is ~$0.10/month per remote region (1 S3 GET per cycle) +- Snapshot size is bounded: 2,000 entities x 10 resources x 60 bytes = ~1.2 MB per PUT +- S3 is highly available and durable; no capacity planning needed + +**Negative:** +- Introduces S3 dependency for cross-region coordination (new failure mode) +- S3 GETs are strongly consistent, but a GET racing an in-flight PUT still returns the previous snapshot +- Requires a shared S3 bucket accessible from all regions (cross-region GET latency + ~100ms, acceptable for background sync) +- Snapshot format must be versioned to handle schema evolution + +## Alternatives Considered + +### DynamoDB items for sync reports (one per entity per resource) +Rejected because: WCU cost scales linearly with entity count, reaching $3,900/month +at 2,000 active entities with 10-second sync, 3x the acquire budget.
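The S3 figures quoted in this ADR's Consequences (about $1.30/month for publishing, about $0.10/month per remote reader, about 1.2 MB per snapshot) can be reproduced with a short calculation. The sketch below rests on assumptions the ADR does not state: a 30-day month and request prices of $0.005 per 1,000 PUTs and $0.0004 per 1,000 GETs. Data transfer and storage charges are ignored.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month
PUT_PRICE_PER_1000 = 0.005          # assumption: USD per 1,000 PUT requests
GET_PRICE_PER_1000 = 0.0004         # assumption: USD per 1,000 GET requests


def cycles_per_month(sync_window_s=10):
    """Number of sync cycles in a month for a given sync window."""
    return SECONDS_PER_MONTH // sync_window_s


def monthly_put_cost(sync_window_s=10):
    """One snapshot PUT per cycle, per publishing region."""
    return cycles_per_month(sync_window_s) / 1000 * PUT_PRICE_PER_1000


def monthly_get_cost_per_remote_region(sync_window_s=10):
    """One snapshot GET per cycle for each remote region that is read."""
    return cycles_per_month(sync_window_s) / 1000 * GET_PRICE_PER_1000


def snapshot_bytes(entities, resources, bytes_per_pair=60):
    """Approximate payload size: one fixed-size record per (entity, resource)."""
    return entities * resources * bytes_per_pair
```

Note that the request cost depends only on the sync window, while the payload size grows with the number of active (entity, resource) pairs.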
+ +### DynamoDB items for sync reports (one batch item per resource) +Rejected because: 400KB item size limit caps at ~4,000 entities per item, and large +item writes consume proportionally more WCUs, offering no cost advantage over +individual items. + +### SQS/SNS for event-driven sync +Rejected because: requires per-event cross-region message delivery, adding complexity +and cost proportional to acquire volume rather than sync frequency. diff --git a/docs/adr/125-quota-enforcement-via-config.md b/docs/adr/125-quota-enforcement-via-config.md new file mode 100644 index 00000000..03899f4b --- /dev/null +++ b/docs/adr/125-quota-enforcement-via-config.md @@ -0,0 +1,63 @@ +# ADR-125: Quota Enforcement via Entity Config Overrides + +**Status:** Proposed +**Date:** 2026-02-14 + +## Context + +With independent tables per region (ADR-123) and S3-based sync (ADR-124), each region's +sync Lambda computes a regional quota for each entity. This quota must be enforced by +the rate limiter's hot path without modifying the acquire flow. + +Three enforcement mechanisms were evaluated: + +1. **Entity config overrides:** Write adjusted limits via `set_limits()`, picked up by + the existing config cache on the next resolve. +2. **Shadow counter on bucket item:** Write a `remote_tc` attribute on the bucket and + add a condition to speculative writes. +3. **Direct token deduction:** `ADD b_rpm_tk -remote_delta` on the bucket item. + +The shadow counter approach has a semantic mismatch: `total_consumed_milli` is a lifetime +monotonic counter incompatible with the per-window token bucket model. Direct token +deduction does not adjust the refill ceiling — each region's bucket refills at the full +global rate, causing N regions to provide Nx the intended refill. + +## Decision + +Regional quotas must be enforced by writing entity-level config overrides via the +existing `set_limits(entity_id, resource, limits)` API. 
The sync Lambda must compute +the allocated capacity per entity and write it as an entity config record. The rate +limiter's existing config resolution hierarchy (Entity > Resource > System) must be +the sole mechanism for quota enforcement. + +## Consequences + +**Positive:** +- Zero changes to the acquire hot path (speculative writes, optimistic lock, bucket math) +- Uses the existing config hierarchy; no new DynamoDB schema or access patterns +- Capacity adjustment naturally controls refill ceiling via token bucket math +- Config cache TTL provides built-in staleness tolerance (already accepted in ADR-105) + +**Negative:** +- Token drain lag: if current tokens exceed the new reduced capacity, the entity can + consume excess tokens until they drain naturally (bounded by consumption rate) +- Config writes are the dominant sync cost (~$40/month at 2,000 active entities with + trigger-based filtering per ADR-126) +- Config cache TTL (default 60s) delays quota enforcement after a config write; the + sync Lambda and application use separate Repository instances + +## Alternatives Considered + +### Shadow counter attribute on bucket item (remote_tc) +Rejected because: `total_consumed_milli` is a monotonic lifetime counter that cannot +be compared against a per-window capacity limit, and modifying the speculative write +condition changes the hot path for all users. + +### Direct token deduction (ADD tk -remote_delta) +Rejected because: each region's bucket still refills at the full global rate, so N +regions produce Nx total refill — the deduction fights the bucket math without +correcting the underlying refill ceiling. + +### In-memory client-side consumption map (no DynamoDB writes) +Rejected because: requires background polling threads and in-memory state, which works +for long-running services but not for Lambda-based rate limiting. 
diff --git a/docs/adr/126-trigger-based-sync-writes.md b/docs/adr/126-trigger-based-sync-writes.md new file mode 100644 index 00000000..6b591154 --- /dev/null +++ b/docs/adr/126-trigger-based-sync-writes.md @@ -0,0 +1,61 @@ +# ADR-126: Trigger-Based Sync Config Writes + +**Status:** Proposed +**Date:** 2026-02-14 + +## Context + +With quota enforcement via entity config overrides (ADR-125), the sync Lambda writes +a `set_limits()` call for every active (entity, resource) pair each cycle. At 2,000 +active entities with 10 resources and a 10-second sync window, this produces 20,000 +WCU per cycle — ~$486/month. Most of these writes are wasted: 80% of entities are well +under their limits with stable allocations. + +The sync window already defines the worst-case over-admission bound (entities can +double-consume for one sync window regardless of write frequency). Config writes do +not improve the worst case; they only tighten steady-state accuracy. + +## Decision + +The sync Lambda must only write entity config overrides when one of two triggers fires: + +1. **Exhaustion trigger:** The entity's projected time-to-exhaustion (remaining tokens + divided by recent consumption rate) is less than twice the sync window. This prevents + the entity from running out of tokens before the next sync cycle can react. + +2. **Drift trigger:** The computed allocation differs from the currently configured + capacity by more than 15%. This corrects stale quotas for entities whose traffic + pattern has shifted significantly. + +All other entities must be skipped (no config write). Trigger evaluation must be +computed from S3 snapshot data (ADR-124) without additional DynamoDB reads. 
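The two triggers in the Decision above can be sketched as a pair of predicates. This is an illustration under stated assumptions rather than the project's implementation: `BucketSnapshot`, `consumption_rate_milli_per_s`, and `should_write_config` are hypothetical names, the rate is assumed to be derived elsewhere from successive `total_consumed_milli` readings, and drift is measured relative to the currently configured capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketSnapshot:
    tokens_milli: int                    # remaining tokens, from the snapshot
    capacity_milli: int                  # currently configured capacity
    consumption_rate_milli_per_s: float  # hypothetical: recent consumption rate


def exhaustion_trigger(snap, sync_window_s, horizon_factor=2.0):
    """Fire when projected time-to-exhaustion is under 2x the sync window."""
    if snap.consumption_rate_milli_per_s <= 0:
        return False  # an idle entity cannot run out of tokens
    time_to_exhaustion_s = snap.tokens_milli / snap.consumption_rate_milli_per_s
    return time_to_exhaustion_s < horizon_factor * sync_window_s


def drift_trigger(allocated_milli, configured_milli, threshold=0.15):
    """Fire when the new allocation differs from the configured capacity by >15%."""
    if configured_milli <= 0:
        return allocated_milli > 0
    return abs(allocated_milli - configured_milli) / configured_milli > threshold


def should_write_config(snap, allocated_milli, sync_window_s):
    """A config write happens only if at least one trigger fires."""
    return exhaustion_trigger(snap, sync_window_s) or drift_trigger(
        allocated_milli, snap.capacity_milli
    )
```

Both predicates take only values that the S3 snapshot already carries, which matches the requirement that trigger evaluation needs no extra DynamoDB reads.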
+ +## Consequences + +**Positive:** +- Config writes drop from ~20,000 to ~600 per cycle at steady state (~97% reduction) +- Monthly sync cost drops from ~$486 to ~$40 at 2,000 active entities +- DynamoDB write throughput spikes are smoothed (fewer concurrent writes) +- Worst-case over-admission is unchanged (bounded by sync window, not write frequency) + +**Negative:** +- Entities with slowly drifting traffic (<15% per cycle) may have stale quotas for + multiple sync cycles before the drift threshold triggers +- Exhaustion prediction depends on consumption rate estimation, which may be noisy for + bursty workloads +- Two tunable parameters (exhaustion horizon = 2x sync window, drift threshold = 15%) + require validation under production traffic patterns + +## Alternatives Considered + +### Write every entity every cycle (no filtering) +Rejected because: 97% of writes are redundant, costing ~$450/month in unnecessary WCU +without improving the over-admission bound set by the sync window. + +### Write only on exhaustion (drop drift trigger) +Rejected because: entities with shifting traffic patterns would keep stale allocations +indefinitely, wasting regional quota until they approach exhaustion. + +### Event-driven writes via DynamoDB Streams (write on every bucket change) +Rejected because: couples sync frequency to acquire volume rather than a fixed window, +producing more writes than periodic polling for high-throughput entities. diff --git a/docs/adr/127-per-region-sync-lambda.md b/docs/adr/127-per-region-sync-lambda.md new file mode 100644 index 00000000..a6ce496f --- /dev/null +++ b/docs/adr/127-per-region-sync-lambda.md @@ -0,0 +1,56 @@ +# ADR-127: Per-Region Sync Lambda + +**Status:** Proposed +**Date:** 2026-02-14 + +## Context + +Cross-region sync (ADR-123, ADR-124) requires a Lambda function that reads consumption +snapshots, computes quota allocations, and writes config overrides. 
Two topologies were +evaluated: + +- **Single coordinator:** One Lambda in a designated region reads all snapshots, computes + all quotas, and writes configs to all regions. Total cross-region calls: `2(N-1)` per + cycle (reads + writes). Single point of failure. +- **Per-region:** Each region runs its own Lambda that reads remote snapshots, computes + its own quota locally, and writes only to its local table. Total cross-region calls + per Lambda: `N-1` reads, 0 writes. + +Both topologies produce the same quota allocation when given the same inputs. The +per-region Lambda runs a deterministic function: given the same S3 snapshots, every +region independently computes the same allocation. No distributed consensus is needed. + +## Decision + +Each region must run its own sync Lambda, triggered by EventBridge on a fixed schedule +(configurable sync window). Each Lambda must read its local bucket states, write its +snapshot to S3 (ADR-124), read all remote snapshots from S3, compute quotas using a +deterministic allocation function, and write triggered config overrides (ADR-126) to +its local DynamoDB table only. No Lambda may write to a remote region's DynamoDB table. 
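The ADR requires a deterministic allocation function but does not define one. As a purely hypothetical example of what "deterministic" means here, the sketch below splits a global capacity across regions in proportion to each region's reported consumption, using integer arithmetic and a sorted region order so that every Lambda reaches the same answer from the same snapshots. The proportional policy and the name `allocate` are assumptions.

```python
from __future__ import annotations


def allocate(global_capacity_milli: int, consumed_by_region: dict[str, int]) -> dict[str, int]:
    """Split capacity proportionally to consumption, identically in every region."""
    regions = sorted(consumed_by_region)  # fixed order, independent of dict order
    total = sum(consumed_by_region.values())
    if total == 0:
        # No consumption anywhere: fall back to an even split.
        weights = {region: 1 for region in regions}
        total = len(regions)
    else:
        weights = consumed_by_region
    # Integer floor shares avoid floating-point differences between runtimes.
    shares = {r: global_capacity_milli * weights[r] // total for r in regions}
    # Hand out the rounding remainder one unit at a time, in sorted order.
    remainder = global_capacity_milli - sum(shares.values())
    for region in regions[:remainder]:
        shares[region] += 1
    return shares
```

Each region would then keep only its own entry from the result and write it to its local table, so no cross-region write is needed.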
+ +## Consequences + +**Positive:** +- Symmetric architecture: every region deploys the same CloudFormation stack +- No single point of failure: one region's Lambda failure does not affect other regions +- Zero cross-region DynamoDB writes (only cross-region S3 reads, ~100ms latency) +- Scales naturally: adding a region means deploying the same stack, no coordinator changes +- Each region can independently tune its sync window + +**Negative:** +- N Lambdas compute the same allocation independently (redundant CPU, negligible cost) +- Slight snapshot staleness between Lambdas reading at different moments within a cycle + (sub-second divergence, converges on next cycle) +- More infrastructure per region (EventBridge rule + Lambda + IAM), though identical + across regions and part of the standard stack deployment + +## Alternatives Considered + +### Single coordinator Lambda in a designated region +Rejected because: introduces an asymmetric "special" region, creates a single point of +failure for all global quota allocation, and requires cross-region DynamoDB writes for +config overrides in remote regions. + +### Peer-to-peer gossip between regional Lambdas +Rejected because: adds network coordination complexity (discovery, message ordering) +without improving on the deterministic-computation-from-shared-S3 approach. diff --git a/docs/adr/128-sync-config-ttl.md b/docs/adr/128-sync-config-ttl.md new file mode 100644 index 00000000..31cbeff1 --- /dev/null +++ b/docs/adr/128-sync-config-ttl.md @@ -0,0 +1,66 @@ +# ADR-128: TTL on Sync-Written Entity Config Records + +**Status:** Proposed +**Date:** 2026-02-14 + +## Context + +The sync Lambda enforces regional quotas by writing entity-level config overrides via +`set_limits()` (ADR-125). Per ADR-119, buckets using entity custom limits persist +indefinitely (no TTL), while buckets using resource/system defaults have TTL and +auto-expire. 
+ +When the sync Lambda writes an entity config, the bucket transitions from +"default-limit" (has TTL) to "custom-limit" (no TTL). If the entity later goes idle +and the sync Lambda stops writing configs (no trigger fires per ADR-126), both the +config record and its bucket persist indefinitely. For high-churn entity populations +(anonymous users, ephemeral API keys), this causes unbounded storage growth. + +The fix must not affect operator-written entity configs, which intentionally persist +indefinitely per ADR-119. + +## Decision + +Sync-written entity config records must include a DynamoDB TTL attribute set to +`now + 3 × sync_window`. The sync Lambda must refresh the TTL on each config write. + +This extends ADR-119's bucket TTL rule. The updated bucket TTL logic is: + +- Bucket has **no TTL** if: entity config exists **without** a `ttl` attribute + (operator-written, persists indefinitely — unchanged from ADR-119) +- Bucket **has TTL** if: entity config exists **with** a `ttl` attribute + (sync-written, treated as default-like for TTL calculation) +- Bucket **has TTL** if: no entity config exists + (using resource/system defaults — unchanged from ADR-119) + +When the entity goes idle (no trigger fires for 3 sync windows), the config record +auto-expires via DynamoDB TTL. The entity reverts to resource/system defaults, and +bucket TTL behavior per ADR-119 resumes. 
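The TTL rule in the Decision above reduces to two small functions. This is a sketch under assumptions: the config record is modelled as a plain mapping, the function names are hypothetical, and only the `ttl` attribute name and the `now + 3 × sync_window` formula come from the ADR.

```python
import time
from typing import Mapping, Optional


def sync_config_ttl(sync_window_s: int, now: Optional[float] = None) -> int:
    """Epoch-seconds TTL for a sync-written config: now + 3 x sync_window."""
    current = time.time() if now is None else now
    return int(current) + 3 * sync_window_s


def bucket_has_ttl(entity_config: Optional[Mapping[str, object]]) -> bool:
    """Apply the three-case bucket TTL rule.

    - no entity config         -> bucket has TTL (defaults in use)
    - config with `ttl`        -> bucket has TTL (sync-written)
    - config without `ttl`     -> bucket has no TTL (operator-written)
    """
    if entity_config is None:
        return True
    return "ttl" in entity_config
```

Because the sync Lambda refreshes the TTL on every triggered write, an entity only falls back to defaults after it has gone three sync windows without a trigger firing.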
+ +## Consequences + +**Positive:** +- Idle entities are cleaned up automatically: sync config expires, bucket regains TTL, storage bounded +- No new attributes needed: the DynamoDB `ttl` attribute already exists in the table schema +- Self-healing: if a sync Lambda fails permanently, all its configs expire within 3 windows + +**Negative:** +- Bucket TTL logic (ADR-119) must check whether the entity config has a `ttl` attribute + to distinguish sync-written from operator-written configs +- Config records gain a new write pattern: conditional refresh of TTL alongside limits +- DynamoDB TTL deletion is asynchronous (typically within a few days), so expired configs can + still be returned by reads, queries, and scans; readers that filter on `ttl` are unaffected + +## Alternatives Considered + +### Explicit cleanup pass in the sync Lambda (delete stale configs) +Rejected because: requires the sync Lambda to maintain a "previously synced" entity set +across invocations, adding state management complexity to a stateless Lambda function. + +### Separate DynamoDB sort key for sync configs (#SYNC_CONFIG#{resource}) +Rejected because: adds a new config level to the resolution hierarchy (ADR-118), breaking +the existing four-level precedence model and requiring changes to `resolve_limits()`. + +### No TTL on sync configs (rely on operator cleanup) +Rejected because: operators should not need to manually clean up configs created by an +automated sync process, especially for ephemeral entities at scale. diff --git a/docs/adr/129-sync-config-ownership.md b/docs/adr/129-sync-config-ownership.md new file mode 100644 index 00000000..d426e9d8 --- /dev/null +++ b/docs/adr/129-sync-config-ownership.md @@ -0,0 +1,64 @@ +# ADR-129: Sync Config Ownership via TTL Presence + +**Status:** Proposed +**Date:** 2026-02-14 + +## Context + +Both operators and the sync Lambda write entity-level config records to the same +DynamoDB item (`PK=ENTITY#{id}, SK=#CONFIG#{resource}`) via `set_limits()`.
Without +an ownership mechanism, the sync Lambda overwrites operator-set limits on its next +cycle, and operators overwrite sync-computed quotas on manual updates. + +The sync Lambda writes configs with a TTL attribute (ADR-128). Operator-written configs +have no TTL (they persist indefinitely per ADR-119). This difference in TTL presence +is a natural discriminator for config ownership. + +## Decision + +The sync Lambda must only write to an entity config record when the record is absent or +the existing record has a `ttl` attribute (indicating it was previously sync-written). +The sync Lambda's write must use the condition +`attribute_not_exists(PK) OR attribute_exists(ttl)`. This ownership check must be +implemented in the Repository layer (not in RateLimiter or the sync Lambda itself), +consistent with ADR-122's requirement that data access logic lives in the repository. + +Operator-written configs (no `ttl` attribute) must never be overwritten by the sync +Lambda. When an operator writes entity config via `set_limits()`, the record must not +include a `ttl` attribute, signaling operator ownership. The sync Lambda must skip +that entity for all subsequent cycles. + +To return an entity to sync-managed quotas, the operator must delete the entity config +via `delete_limits()`. The sync Lambda will then recreate it with TTL on the next +triggered cycle (ADR-126). 
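The ownership check in the Decision above can be sketched as a condition fragment plus a local mirror of its logic. This is a sketch under assumptions, not the Repository's code: the helper names are hypothetical, and the `#ttl` placeholder is used on the assumption that `TTL` collides with a DynamoDB reserved word (aliasing an attribute name is harmless either way). `ConditionExpression` and `ExpressionAttributeNames` are the standard DynamoDB write parameters.

```python
from typing import Mapping, Optional


def sync_write_condition() -> dict:
    """Condition kwargs for a sync-owned write: record absent, or already sync-written."""
    return {
        "ConditionExpression": "attribute_not_exists(PK) OR attribute_exists(#ttl)",
        # Alias the attribute name so it cannot clash with a reserved word.
        "ExpressionAttributeNames": {"#ttl": "ttl"},
    }


def sync_may_write(existing_item: Optional[Mapping[str, object]]) -> bool:
    """Local mirror of the condition, useful for unit tests and dry runs."""
    if existing_item is None:
        return True              # no record yet: sync may create it
    return "ttl" in existing_item  # only sync-written records carry `ttl`
```

In a real write the returned kwargs would be merged into the update call, and a failed condition (reported by DynamoDB as `ConditionalCheckFailedException`) would be treated as "operator-owned, skip" rather than as an error.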
+ +## Consequences + +**Positive:** +- Operator configs always win: manual overrides are never clobbered by automated sync +- No new attributes: TTL presence is a sufficient ownership discriminator +- Reversible: `delete_limits()` returns the entity to sync management +- Condition check costs 0 extra RCU (evaluated server-side in the UpdateItem condition) + +**Negative:** +- Operators must delete (not overwrite) entity configs to return to sync management; + overwriting with `set_limits()` produces a record without TTL, taking operator ownership +- The sync Lambda's `ConditionalCheckFailedException` for operator-owned entities is + silent (expected), but increases CloudWatch error metrics unless filtered +- If an operator accidentally creates entity config, the entity silently leaves sync + management with no warning; observability tooling must surface this + +## Alternatives Considered + +### Explicit `origin` attribute ("sync" vs "operator") on config records +Rejected because: adds a new attribute that must be threaded through all config read/write +paths, requires migration for existing records, and provides no benefit over the TTL +presence check that ADR-128 already establishes. + +### Sync Lambda always wins (overwrite operator configs) +Rejected because: operators set entity limits for business reasons (premium tiers, custom +SLAs); automated sync should not override intentional business decisions. + +### Separate config level for sync (Entity > Sync > Resource) +Rejected because: breaks the four-level config hierarchy (ADR-118) and requires changes +to `resolve_limits()` in every backend implementation. 
From db3c9caec004f978e2db3f8abc7350af66ebee4a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Patrick=20Sodr=C3=A9?= Date: Thu, 11 Jun 2026 21:21:42 -0400 Subject: [PATCH 2/3] =?UTF-8?q?=F0=9F=93=9D=20docs(adr):=20renumber=20mult?= =?UTF-8?q?i-region=20ADRs=20123-129=20=E2=86=92=20126-132?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ADR-123 was assigned to local-secondary-indexes on main after this branch was created, so the multi-region series collided starting at 123. Shift the whole series up by 3 (123→126 … 129→132) to the next free block, updating titles and all internal cross-references. References to existing ADRs (105, 114, 117, 118, 119, 122) are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) --- ...t-tables.md => 126-multi-region-independent-tables.md} | 6 +++--- .../{124-s3-sync-exchange.md => 127-s3-sync-exchange.md} | 4 ++-- ...-via-config.md => 128-quota-enforcement-via-config.md} | 6 +++--- ...ed-sync-writes.md => 129-trigger-based-sync-writes.md} | 6 +++--- ...egion-sync-lambda.md => 130-per-region-sync-lambda.md} | 8 ++++---- .../{128-sync-config-ttl.md => 131-sync-config-ttl.md} | 6 +++--- ...c-config-ownership.md => 132-sync-config-ownership.md} | 8 ++++---- 7 files changed, 22 insertions(+), 22 deletions(-) rename docs/adr/{123-multi-region-independent-tables.md => 126-multi-region-independent-tables.md} (96%) rename docs/adr/{124-s3-sync-exchange.md => 127-s3-sync-exchange.md} (95%) rename docs/adr/{125-quota-enforcement-via-config.md => 128-quota-enforcement-via-config.md} (93%) rename docs/adr/{126-trigger-based-sync-writes.md => 129-trigger-based-sync-writes.md} (94%) rename docs/adr/{127-per-region-sync-lambda.md => 130-per-region-sync-lambda.md} (93%) rename docs/adr/{128-sync-config-ttl.md => 131-sync-config-ttl.md} (95%) rename docs/adr/{129-sync-config-ownership.md => 132-sync-config-ownership.md} (94%) diff --git a/docs/adr/123-multi-region-independent-tables.md 
b/docs/adr/126-multi-region-independent-tables.md similarity index 96% rename from docs/adr/123-multi-region-independent-tables.md rename to docs/adr/126-multi-region-independent-tables.md index d1497ee8..d1135d09 100644 --- a/docs/adr/123-multi-region-independent-tables.md +++ b/docs/adr/126-multi-region-independent-tables.md @@ -1,4 +1,4 @@ -# ADR-123: Multi-Region via Independent Regional Tables +# ADR-126: Multi-Region via Independent Regional Tables **Status:** Proposed **Date:** 2026-02-14 @@ -25,7 +25,7 @@ its own partition key prefix, so records in different namespaces never collide. Multi-region must use **independent DynamoDB tables per region**, one per deployed stack. Each region must use a dedicated namespace for its rate-limiting data. Cross-region -coordination must be handled by a periodic sync mechanism (see ADR-124, ADR-127), not by +coordination must be handled by a periodic sync mechanism (see ADR-127, ADR-130), not by DynamoDB replication. Global Tables must not be used for the rate-limiting table. ## Consequences @@ -38,7 +38,7 @@ DynamoDB replication. 
Global Tables must not be used for the rate-limiting table **Negative:** - No automatic data replication; regional data is lost if a region fails permanently -- Cross-region coordination requires a new sync component (ADR-124, ADR-127) +- Cross-region coordination requires a new sync component (ADR-127, ADR-130) - Rate limiter state is ephemeral; region loss causes temporary over-admission until sync catches up, bounded by one sync window diff --git a/docs/adr/124-s3-sync-exchange.md b/docs/adr/127-s3-sync-exchange.md similarity index 95% rename from docs/adr/124-s3-sync-exchange.md rename to docs/adr/127-s3-sync-exchange.md index 8f7a9e8e..6d190974 100644 --- a/docs/adr/124-s3-sync-exchange.md +++ b/docs/adr/127-s3-sync-exchange.md @@ -1,11 +1,11 @@ -# ADR-124: S3-Based Cross-Region Sync Exchange +# ADR-127: S3-Based Cross-Region Sync Exchange **Status:** Proposed **Date:** 2026-02-14 ## Context -With independent DynamoDB tables per region (ADR-123), a sync mechanism must exchange +With independent DynamoDB tables per region (ADR-126), a sync mechanism must exchange consumption data between regions. The exchange payload is a snapshot of all active entities' bucket states: `total_consumed_milli`, `tokens_milli`, and `capacity_milli` per entity, resource, and limit. 
diff --git a/docs/adr/125-quota-enforcement-via-config.md b/docs/adr/128-quota-enforcement-via-config.md similarity index 93% rename from docs/adr/125-quota-enforcement-via-config.md rename to docs/adr/128-quota-enforcement-via-config.md index 03899f4b..0254597a 100644 --- a/docs/adr/125-quota-enforcement-via-config.md +++ b/docs/adr/128-quota-enforcement-via-config.md @@ -1,11 +1,11 @@ -# ADR-125: Quota Enforcement via Entity Config Overrides +# ADR-128: Quota Enforcement via Entity Config Overrides **Status:** Proposed **Date:** 2026-02-14 ## Context -With independent tables per region (ADR-123) and S3-based sync (ADR-124), each region's +With independent tables per region (ADR-126) and S3-based sync (ADR-127), each region's sync Lambda computes a regional quota for each entity. This quota must be enforced by the rate limiter's hot path without modifying the acquire flow. @@ -42,7 +42,7 @@ the sole mechanism for quota enforcement. - Token drain lag: if current tokens exceed the new reduced capacity, the entity can consume excess tokens until they drain naturally (bounded by consumption rate) - Config writes are the dominant sync cost (~$40/month at 2,000 active entities with - trigger-based filtering per ADR-126) + trigger-based filtering per ADR-129) - Config cache TTL (default 60s) delays quota enforcement after a config write; the sync Lambda and application use separate Repository instances diff --git a/docs/adr/126-trigger-based-sync-writes.md b/docs/adr/129-trigger-based-sync-writes.md similarity index 94% rename from docs/adr/126-trigger-based-sync-writes.md rename to docs/adr/129-trigger-based-sync-writes.md index 6b591154..89f1afb3 100644 --- a/docs/adr/126-trigger-based-sync-writes.md +++ b/docs/adr/129-trigger-based-sync-writes.md @@ -1,11 +1,11 @@ -# ADR-126: Trigger-Based Sync Config Writes +# ADR-129: Trigger-Based Sync Config Writes **Status:** Proposed **Date:** 2026-02-14 ## Context -With quota enforcement via entity config overrides (ADR-125), 
the sync Lambda writes +With quota enforcement via entity config overrides (ADR-128), the sync Lambda writes a `set_limits()` call for every active (entity, resource) pair each cycle. At 2,000 active entities with 10 resources and a 10-second sync window, this produces 20,000 WCU per cycle — ~$486/month. Most of these writes are wasted: 80% of entities are well @@ -28,7 +28,7 @@ The sync Lambda must only write entity config overrides when one of two triggers pattern has shifted significantly. All other entities must be skipped (no config write). Trigger evaluation must be -computed from S3 snapshot data (ADR-124) without additional DynamoDB reads. +computed from S3 snapshot data (ADR-127) without additional DynamoDB reads. ## Consequences diff --git a/docs/adr/127-per-region-sync-lambda.md b/docs/adr/130-per-region-sync-lambda.md similarity index 93% rename from docs/adr/127-per-region-sync-lambda.md rename to docs/adr/130-per-region-sync-lambda.md index a6ce496f..fe6ba184 100644 --- a/docs/adr/127-per-region-sync-lambda.md +++ b/docs/adr/130-per-region-sync-lambda.md @@ -1,11 +1,11 @@ -# ADR-127: Per-Region Sync Lambda +# ADR-130: Per-Region Sync Lambda **Status:** Proposed **Date:** 2026-02-14 ## Context -Cross-region sync (ADR-123, ADR-124) requires a Lambda function that reads consumption +Cross-region sync (ADR-126, ADR-127) requires a Lambda function that reads consumption snapshots, computes quota allocations, and writes config overrides. Two topologies were evaluated: @@ -24,8 +24,8 @@ region independently computes the same allocation. No distributed consensus is n Each region must run its own sync Lambda, triggered by EventBridge on a fixed schedule (configurable sync window). 
Each Lambda must read its local bucket states, write its -snapshot to S3 (ADR-124), read all remote snapshots from S3, compute quotas using a -deterministic allocation function, and write triggered config overrides (ADR-126) to +snapshot to S3 (ADR-127), read all remote snapshots from S3, compute quotas using a +deterministic allocation function, and write triggered config overrides (ADR-129) to its local DynamoDB table only. No Lambda may write to a remote region's DynamoDB table. ## Consequences diff --git a/docs/adr/128-sync-config-ttl.md b/docs/adr/131-sync-config-ttl.md similarity index 95% rename from docs/adr/128-sync-config-ttl.md rename to docs/adr/131-sync-config-ttl.md index 31cbeff1..cc813896 100644 --- a/docs/adr/128-sync-config-ttl.md +++ b/docs/adr/131-sync-config-ttl.md @@ -1,4 +1,4 @@ -# ADR-128: TTL on Sync-Written Entity Config Records +# ADR-131: TTL on Sync-Written Entity Config Records **Status:** Proposed **Date:** 2026-02-14 @@ -6,13 +6,13 @@ ## Context The sync Lambda enforces regional quotas by writing entity-level config overrides via -`set_limits()` (ADR-125). Per ADR-119, buckets using entity custom limits persist +`set_limits()` (ADR-128). Per ADR-119, buckets using entity custom limits persist indefinitely (no TTL), while buckets using resource/system defaults have TTL and auto-expire. When the sync Lambda writes an entity config, the bucket transitions from "default-limit" (has TTL) to "custom-limit" (no TTL). If the entity later goes idle -and the sync Lambda stops writing configs (no trigger fires per ADR-126), both the +and the sync Lambda stops writing configs (no trigger fires per ADR-129), both the config record and its bucket persist indefinitely. For high-churn entity populations (anonymous users, ephemeral API keys), this causes unbounded storage growth. 
diff --git a/docs/adr/129-sync-config-ownership.md b/docs/adr/132-sync-config-ownership.md similarity index 94% rename from docs/adr/129-sync-config-ownership.md rename to docs/adr/132-sync-config-ownership.md index d426e9d8..452b7a17 100644 --- a/docs/adr/129-sync-config-ownership.md +++ b/docs/adr/132-sync-config-ownership.md @@ -1,4 +1,4 @@ -# ADR-129: Sync Config Ownership via TTL Presence +# ADR-132: Sync Config Ownership via TTL Presence **Status:** Proposed **Date:** 2026-02-14 @@ -10,7 +10,7 @@ DynamoDB item (`PK=ENTITY#{id}, SK=#CONFIG#{resource}`) via `set_limits()`. Without an ownership mechanism, the sync Lambda overwrites operator-set limits on its next cycle, and operators overwrite sync-computed quotas on manual updates. -The sync Lambda writes configs with a TTL attribute (ADR-128). Operator-written configs +The sync Lambda writes configs with a TTL attribute (ADR-131). Operator-written configs have no TTL (they persist indefinitely per ADR-119). This difference in TTL presence is a natural discriminator for config ownership. @@ -30,7 +30,7 @@ that entity for all subsequent cycles. To return an entity to sync-managed quotas, the operator must delete the entity config via `delete_limits()`. The sync Lambda will then recreate it with TTL on the next -triggered cycle (ADR-126). +triggered cycle (ADR-129). ## Consequences @@ -53,7 +53,7 @@ triggered cycle (ADR-126). ### Explicit `origin` attribute ("sync" vs "operator") on config records Rejected because: adds a new attribute that must be threaded through all config read/write paths, requires migration for existing records, and provides no benefit over the TTL -presence check that ADR-128 already establishes. +presence check that ADR-131 already establishes. 
### Sync Lambda always wins (overwrite operator configs) Rejected because: operators set entity limits for business reasons (premium tiers, custom From 06653b256ad5ebbba883712346936f925d6939ad Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Patrick=20Sodr=C3=A9?= Date: Thu, 11 Jun 2026 21:25:41 -0400 Subject: [PATCH 3/3] =?UTF-8?q?=E2=99=BB=EF=B8=8F=20refactor(adr):=20resol?= =?UTF-8?q?ve=20inherited=20duplicate=20ADR-121=20(policy-rename=20?= =?UTF-8?q?=E2=86=92=20124)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This branch forked from main after both ADR-121 files existed, so it carried the same duplicate-121 collision. Apply the identical fix used on main (#422): rename policy-rename-clarity 121 → 124 so the branch is self-consistent and merges cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) --- ...21-policy-rename-clarity.md => 124-policy-rename-clarity.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename docs/adr/{121-policy-rename-clarity.md => 124-policy-rename-clarity.md} (98%) diff --git a/docs/adr/121-policy-rename-clarity.md b/docs/adr/124-policy-rename-clarity.md similarity index 98% rename from docs/adr/121-policy-rename-clarity.md rename to docs/adr/124-policy-rename-clarity.md index 1f4f6482..1c8d077a 100644 --- a/docs/adr/121-policy-rename-clarity.md +++ b/docs/adr/124-policy-rename-clarity.md @@ -1,4 +1,4 @@ -# ADR-121: Rename IAM Policies for Clarity +# ADR-124: Rename IAM Policies for Clarity **Status:** Accepted **Date:** 2026-02-02