1 change: 1 addition & 0 deletions .kiro/specs/distributed-rate-limiting/.config.kiro
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"specId": "6c84e8d5-a68c-431b-90ee-558702173e54", "workflowType": "requirements-first", "specType": "feature"}
367 changes: 367 additions & 0 deletions .kiro/specs/distributed-rate-limiting/design.md

Large diffs are not rendered by default.

111 changes: 111 additions & 0 deletions .kiro/specs/distributed-rate-limiting/requirements.md
@@ -0,0 +1,111 @@
# Requirements Document

## Introduction

The current rate limiter in `src/lib/ratelimit.ts` uses a module-level in-memory `Map` as its counter store. In a multi-instance deployment (e.g., multiple Next.js server pods or Vercel edge nodes), each instance maintains independent counters, allowing a client to bypass the AUTH rate limit (5 req/min) by spreading requests across instances.

This feature replaces the single-instance in-memory store with a shared [Upstash Redis](https://upstash.com/) backend in production, while preserving the existing in-memory behavior for development and test environments. All 20+ `withRateLimit()` call sites must continue to work without modification — the change is a pure internal implementation swap.

## Glossary

- **RateLimiter**: The abstraction that performs counter reads and writes. Either `InMemoryRateLimiter` (dev/test) or `UpstashRateLimiter` (production).
- **InMemoryRateLimiter**: The existing `Map`-backed sliding window implementation used in development and test.
- **UpstashRateLimiter**: A Redis-backed sliding window implementation that uses Upstash REST API calls to maintain shared counters.
- **RateLimit_Store**: The backing storage for rate-limit counters (either the in-memory `Map` or the Upstash Redis instance).
- **Sliding_Window**: The rate-limiting algorithm that counts requests within a rolling time window and resets the counter after `windowMs` milliseconds.
- **Tier**: A named rate-limit configuration (`AUTH`, `WRITE`, `READ`) with a `limit` and `windowMs`.
- **Identifier**: A string key combining client IP and tier (e.g., `"1.2.3.4:AUTH"`) used to namespace counters in the RateLimit_Store.
- **UPSTASH_REDIS_REST_URL**: Environment variable holding the Upstash Redis REST endpoint URL.
- **UPSTASH_REDIS_REST_TOKEN**: Environment variable holding the Upstash Redis REST API token.
- **NODE_ENV**: Node.js environment variable (`"production"`, `"development"`, or `"test"`).

---

## Requirements

### Requirement 1: Shared Counter Backend in Production

**User Story:** As a platform operator, I want rate-limit counters to be stored in a shared Redis backend in production, so that rate limits are enforced consistently across all server instances and edge nodes.

#### Acceptance Criteria

1. WHEN `NODE_ENV` is `"production"` AND `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` are set, THE `RateLimiter` SHALL use the `UpstashRateLimiter` as its `RateLimit_Store`.
2. WHEN the `UpstashRateLimiter` is active, THE `RateLimiter` SHALL increment and read counters via the Upstash Redis REST API for every call to `slidingWindowRateLimit`.
3. WHEN two separate server instances call `slidingWindowRateLimit` with the same `Identifier`, THE `UpstashRateLimiter` SHALL reflect a combined request count that is the sum of both instances' requests within the same `Sliding_Window`.
4. WHEN a client's combined request count across all instances reaches the configured `limit` for a `Tier`, THE `RateLimiter` SHALL return `success: false` for subsequent requests within the same `Sliding_Window`.
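
The difference between per-instance and shared counters can be illustrated with a small simulation. This is a minimal sketch, not the project's code: `makeInstance` and `countAllowed` are hypothetical helpers, a shared `Map` stands in for the Redis backend, and window expiry is left out to keep the focus on counter sharing.

```typescript
// Hypothetical stand-in for a counter store; in production this would be Redis.
type Counters = Map<string, number>;

// Builds one "server instance" that counts requests in the Map it is given.
function makeInstance(counters: Counters, limit: number) {
  // Returns true when the request is allowed.
  return (identifier: string): boolean => {
    const count = counters.get(identifier) ?? 0;
    if (count >= limit) return false;
    counters.set(identifier, count + 1);
    return true;
  };
}

const AUTH_LIMIT = 5; // AUTH tier limit from this spec

// Current behaviour: every instance owns an independent Map.
const isolatedA = makeInstance(new Map(), AUTH_LIMIT);
const isolatedB = makeInstance(new Map(), AUTH_LIMIT);

// Target behaviour: both instances read and write the same counters.
const shared: Counters = new Map();
const sharedA = makeInstance(shared, AUTH_LIMIT);
const sharedB = makeInstance(shared, AUTH_LIMIT);

// Sends 10 requests for one identifier, alternating between two instances.
function countAllowed(
  a: (id: string) => boolean,
  b: (id: string) => boolean,
): number {
  let allowed = 0;
  for (let i = 0; i < 10; i++) {
    const instance = i % 2 === 0 ? a : b;
    if (instance("1.2.3.4:AUTH")) allowed++;
  }
  return allowed;
}

const isolatedAllowed = countAllowed(isolatedA, isolatedB); // 10: limit bypassed
const sharedAllowed = countAllowed(sharedA, sharedB); // 5: limit enforced
```

With independent counters each instance admits its own 5 requests, so the client gets 10 through; with a shared counter the combined total stops at 5.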

---

### Requirement 2: In-Memory Fallback for Development and Test

**User Story:** As a developer, I want the rate limiter to work without Redis configured in development and test, so that local development and CI test runs are not blocked by external service dependencies.

#### Acceptance Criteria

1. WHEN `NODE_ENV` is `"development"` OR `NODE_ENV` is `"test"`, THE `RateLimiter` SHALL use the `InMemoryRateLimiter` as its `RateLimit_Store`, regardless of whether Upstash environment variables are set.
2. WHEN `UPSTASH_REDIS_REST_URL` or `UPSTASH_REDIS_REST_TOKEN` is absent AND `NODE_ENV` is `"production"`, THE `RateLimiter` SHALL fall back to the `InMemoryRateLimiter` and log a warning at startup indicating that distributed rate limiting is disabled.
3. WHILE the `InMemoryRateLimiter` is active, THE `RateLimiter` SHALL preserve all existing `Sliding_Window` semantics: allowing requests up to `limit`, blocking further requests until `windowMs` elapses, and returning correct `remaining` and `reset` values. THE `remaining` value SHALL reflect `limit - count` and MAY exceed the configured `limit` in edge cases where the `limit` is dynamically reduced after counters are initialized.
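
The backend selection described by Requirement 1 (criterion 1) and Requirement 2 (criteria 1 and 2) reduces to a small decision function. The sketch below is one possible shape, under assumptions: `selectStore` and `StoreKind` are hypothetical names, the environment is passed in as a plain object rather than read from `process.env`, and the function returns a label instead of constructing a store.

```typescript
type Env = Record<string, string | undefined>;
type StoreKind = "upstash" | "memory"; // hypothetical labels for the two backends

function selectStore(
  env: Env,
  warn: (message: string) => void = (message) => console.warn(message),
): StoreKind {
  const hasCredentials =
    Boolean(env.UPSTASH_REDIS_REST_URL) && Boolean(env.UPSTASH_REDIS_REST_TOKEN);

  if (env.NODE_ENV === "production") {
    if (hasCredentials) return "upstash";
    // Production without credentials: fall back, but say so loudly.
    warn(
      "Distributed rate limiting is disabled: UPSTASH_REDIS_REST_URL or " +
        "UPSTASH_REDIS_REST_TOKEN is not set. Falling back to in-memory store.",
    );
  }
  // Development and test always use the in-memory store,
  // even when Upstash variables happen to be set.
  return "memory";
}

const warnings: string[] = [];
const creds = {
  UPSTASH_REDIS_REST_URL: "https://example.upstash.io",
  UPSTASH_REDIS_REST_TOKEN: "placeholder-token",
};

const prodWithCreds = selectStore({ NODE_ENV: "production", ...creds });
const testWithCreds = selectStore({ NODE_ENV: "test", ...creds });
const prodWithoutCreds = selectStore({ NODE_ENV: "production" }, (m) => warnings.push(m));
```

Passing the environment as an argument keeps the decision testable without mutating `process.env`; a real factory would call this once at module initialisation.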

---

### Requirement 3: Backward-Compatible Public API

**User Story:** As a developer maintaining route handlers, I want the `withRateLimit()` function signature and return shape to remain unchanged, so that no call sites require modification after the refactor.

#### Acceptance Criteria

1. THE `RateLimiter` SHALL export `withRateLimit(request, tier)` with the same signature: accepting a `Request` and a `RateLimitTier`, returning `{ addHeaders, rateLimitResponse }`.
2. THE `RateLimiter` SHALL export `slidingWindowRateLimit(identifier, config)` with the same signature: accepting a `string` and `RateLimitConfig`, returning `RateLimitResult`.
3. THE `RateLimiter` SHALL export `RATE_LIMIT_TIERS`, `RateLimitTier`, `RateLimitConfig`, `RateLimitResult`, `RateLimitInfo`, and `createRateLimitResponse` with unchanged types and values.
4. WHEN `withRateLimit` is called from any existing route handler, THE `RateLimiter` SHALL return a `rateLimitResponse` of `null` for allowed requests and a `NextResponse` with status `429` for blocked requests, identical to the current behavior. THE `RateLimiter` SHALL never return a non-null `rateLimitResponse` for a request that was allowed by the `RateLimit_Store`.
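
The return shape in criteria 1 and 4 can be sketched as a pure mapping from a limiter result to `{ addHeaders, rateLimitResponse }`. This is an illustration under assumptions: `SimpleResponse` is a hypothetical stand-in for `NextResponse`, `toHandlerResult` is a made-up name, and the concrete header names are assumed (the spec only says `X-RateLimit-*`).

```typescript
interface RateLimitResult {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number;
  retryAfter?: number;
}

// Hypothetical stand-in for NextResponse, to keep the sketch dependency-free.
type SimpleResponse = { status: number; headers: Record<string, string> };

function toHandlerResult(result: RateLimitResult) {
  // Assumed header names; only the X-RateLimit-* prefix comes from the spec.
  const rateHeaders: Record<string, string> = {
    "X-RateLimit-Limit": String(result.limit),
    "X-RateLimit-Remaining": String(result.remaining),
    "X-RateLimit-Reset": String(result.reset),
  };

  // Copies the rate-limit headers onto a response the handler already built.
  const addHeaders = (response: SimpleResponse): SimpleResponse => ({
    ...response,
    headers: { ...response.headers, ...rateHeaders },
  });

  // null for allowed requests, a 429 response for blocked ones.
  const rateLimitResponse: SimpleResponse | null = result.success
    ? null
    : {
        status: 429,
        headers: { ...rateHeaders, "Retry-After": String(result.retryAfter ?? 0) },
      };

  return { addHeaders, rateLimitResponse };
}

const allowed = toHandlerResult({ success: true, limit: 5, remaining: 4, reset: 60000 });
const blocked = toHandlerResult({
  success: false, limit: 5, remaining: 0, reset: 60000, retryAfter: 42,
});
const decorated = allowed.addHeaders({ status: 200, headers: {} });
```

Because `rateLimitResponse` is derived only from `result.success`, an allowed request can never produce a non-null response, which is the invariant criterion 4 asks for.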

---

### Requirement 4: Sliding Window Algorithm Correctness

**User Story:** As a platform operator, I want the sliding window algorithm to behave identically whether backed by in-memory or Upstash storage, so that rate-limit decisions are consistent and predictable.

#### Acceptance Criteria

1. WHEN a new `Identifier` is first seen within a `Sliding_Window`, THE `RateLimiter` SHALL initialize a counter at `1` and set `resetAt` to `now + windowMs`.
2. WHEN an existing `Identifier`'s counter is below `limit`, THE `RateLimiter` SHALL increment the counter by `1` and return `success: true` with `remaining` equal to `limit - count`.
3. WHEN an existing `Identifier`'s counter equals or exceeds `limit`, THE `RateLimiter` SHALL return `success: false` with `remaining` equal to `0` and `retryAfter` equal to the ceiling of `(resetAt - now) / 1000` seconds.
4. WHEN the current time exceeds `resetAt` for an `Identifier`, THE `RateLimiter` SHALL reset the counter to `1` and set a new `resetAt` to `now + windowMs`, treating the request as the first in a new window.
5. FOR ALL valid `(identifier, config)` pairs, the sequence `[allow × limit, block × 1, advance time by windowMs, allow × 1]` SHALL hold for both `InMemoryRateLimiter` and `UpstashRateLimiter` implementations.
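
Criteria 1 through 4 can be written as a single function over a `Map`, which makes the sequence in criterion 5 easy to check. The following is a minimal sketch rather than the existing implementation: `checkWindow` and `Entry` are hypothetical names and the clock is passed in explicitly. One assumption is worth flagging: the window is treated as expired when `now >= resetAt` (not strictly greater), so that advancing time by exactly `windowMs` starts a new window as criterion 5 expects.

```typescript
interface RateLimitConfig { limit: number; windowMs: number }
interface RateLimitResult {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number;
  retryAfter?: number;
}
interface Entry { count: number; resetAt: number }

function checkWindow(
  store: Map<string, Entry>,
  identifier: string,
  config: RateLimitConfig,
  now: number,
): RateLimitResult {
  const entry = store.get(identifier);

  // Criteria 1 and 4: unseen identifier or expired window starts at count 1.
  if (!entry || now >= entry.resetAt) {
    const resetAt = now + config.windowMs;
    store.set(identifier, { count: 1, resetAt });
    return { success: true, limit: config.limit, remaining: config.limit - 1, reset: resetAt };
  }

  // Criterion 3: at or over the limit, block and report when to retry.
  if (entry.count >= config.limit) {
    return {
      success: false,
      limit: config.limit,
      remaining: 0,
      reset: entry.resetAt,
      retryAfter: Math.ceil((entry.resetAt - now) / 1000),
    };
  }

  // Criterion 2: below the limit, count the request and allow it.
  entry.count += 1;
  return {
    success: true,
    limit: config.limit,
    remaining: config.limit - entry.count,
    reset: entry.resetAt,
  };
}

// Criterion 5 with limit = 3: allow x 3, block x 1, advance by windowMs, allow x 1.
const demoStore = new Map<string, Entry>();
const demoConfig: RateLimitConfig = { limit: 3, windowMs: 1000 };
const t0 = 1000;
const firstThree = [0, 1, 2].map(() => checkWindow(demoStore, "1.2.3.4:AUTH", demoConfig, t0));
const fourth = checkWindow(demoStore, "1.2.3.4:AUTH", demoConfig, t0);
const afterWindow = checkWindow(demoStore, "1.2.3.4:AUTH", demoConfig, t0 + demoConfig.windowMs);
```

Taking `now` as a parameter is a design choice for the sketch: it makes the time-advance step deterministic without fake timers.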

---

### Requirement 5: AUTH Tier Bypass Prevention

**User Story:** As a security engineer, I want the AUTH rate limit (5 req/min) to be enforced globally, so that an attacker cannot bypass it by distributing requests across multiple server instances.

#### Acceptance Criteria

1. WHEN `NODE_ENV` is `"production"` AND Upstash credentials are configured, THE `UpstashRateLimiter` SHALL enforce the `AUTH` tier limit of `5` requests per `60,000 ms` window as a single shared counter across all instances.
2. WHEN the shared request count across all instances has reached the `AUTH` tier `limit` of `5` within the same `Sliding_Window`, THE `UpstashRateLimiter` SHALL block the next request and all subsequent requests in that window with `success: false`, so that no more than `5` requests are permitted regardless of how they are distributed across instances.
3. IF the Upstash REST API call fails during an AUTH tier check, THEN THE `RateLimiter` SHALL fail open (allow the request) and log the error, to prevent service disruption due to Redis unavailability.
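
The fail-open behaviour in criterion 3 amounts to wrapping the remote check in a `try/catch` and returning an "allowed" result when it throws. Below is a minimal sketch under assumptions: `checkFailOpen` is a hypothetical helper, the remote call is injected as a function so that no Redis client is needed, and the fallback result uses the shape given for `UpstashStore.check()` in the implementation plan.

```typescript
interface RateLimitConfig { limit: number; windowMs: number }
interface RateLimitResult { success: boolean; limit: number; remaining: number; reset: number }

async function checkFailOpen(
  check: () => Promise<RateLimitResult>, // stands in for the Upstash REST call
  config: RateLimitConfig,
  logError: (err: unknown) => void = (err) => console.error(err),
): Promise<RateLimitResult> {
  try {
    return await check();
  } catch (err) {
    // Fail open: availability is preferred over strict enforcement.
    logError(err);
    return {
      success: true,
      limit: config.limit,
      remaining: config.limit,
      reset: Date.now() + config.windowMs,
    };
  }
}

const AUTH: RateLimitConfig = { limit: 5, windowMs: 60000 };
const loggedErrors: unknown[] = [];
const failingCheck = (): Promise<RateLimitResult> =>
  Promise.reject(new Error("simulated Redis outage"));

// Resolves to an allowed result even though the underlying check rejected.
const outcome = checkFailOpen(failingCheck, AUTH, (err) => loggedErrors.push(err));
```

The trade-off is that during a Redis outage the AUTH limit is effectively unenforced, which is why the error is logged rather than swallowed silently.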

---

### Requirement 6: Upstash Package and Environment Configuration

**User Story:** As a developer onboarding to the project, I want all required packages installed and environment variables documented, so that I can configure distributed rate limiting without guessing at dependencies or config keys.

#### Acceptance Criteria

1. THE project SHALL declare `@upstash/ratelimit` and `@upstash/redis` as production dependencies in `package.json`.
2. THE `.env.example` file SHALL document `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` with placeholder values and a comment explaining their purpose.
3. WHEN `UPSTASH_REDIS_REST_URL` or `UPSTASH_REDIS_REST_TOKEN` contains an invalid value at startup in production, THE `RateLimiter` SHALL log a descriptive error, fall back to `InMemoryRateLimiter` immediately, and prevent any attempt to initialize the `UpstashRateLimiter` for the remainder of the process lifetime.
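
Criterion 3 implies a one-time decision that is latched for the life of the process. The sketch below shows one way that could look, assuming production mode. Everything here is illustrative: `resolveBackend` is a hypothetical name, and the validity rule (an `https://` URL and a non-empty token) is an assumption, since the spec does not define what counts as an invalid value.

```typescript
type Env = Record<string, string | undefined>;
type Backend = "upstash" | "memory";

// Latched decision: once set, it is never re-evaluated in this process.
let decided: Backend | null = null;

function resolveBackend(
  env: Env,
  logError: (message: string) => void = (message) => console.error(message),
): Backend {
  if (decided !== null) return decided;

  const url = env.UPSTASH_REDIS_REST_URL ?? "";
  const token = env.UPSTASH_REDIS_REST_TOKEN ?? "";
  // Assumed validity rule; the spec leaves "invalid" undefined.
  const valid = url.startsWith("https://") && token.trim().length > 0;

  if (!valid) {
    logError(
      "Invalid UPSTASH_REDIS_REST_URL or UPSTASH_REDIS_REST_TOKEN; " +
        "using the in-memory rate limiter for the rest of this process.",
    );
  }
  decided = valid ? "upstash" : "memory";
  return decided;
}

const messages: string[] = [];
// First call sees a malformed URL and latches the fallback.
const firstDecision = resolveBackend(
  { UPSTASH_REDIS_REST_URL: "not-a-url", UPSTASH_REDIS_REST_TOKEN: "t" },
  (m) => messages.push(m),
);
// A later call with valid-looking values does not re-initialise Upstash.
const secondDecision = resolveBackend(
  { UPSTASH_REDIS_REST_URL: "https://example.upstash.io", UPSTASH_REDIS_REST_TOKEN: "t" },
  (m) => messages.push(m),
);
```

The second call returns the latched value without logging again, which matches the "remainder of the process lifetime" wording.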

---

### Requirement 7: Existing Tests Pass with In-Memory Backend

**User Story:** As a developer, I want the existing rate-limit test suite in `src/app/api/tutorials/__tests__/ratelimit.test.ts` to continue passing after the refactor, so that the behavioral contract of `slidingWindowRateLimit` and `withRateLimit` is preserved.

#### Acceptance Criteria

1. WHEN the test suite runs with `NODE_ENV` set to `"test"`, THE `RateLimiter` SHALL use the `InMemoryRateLimiter` so that no Upstash credentials or network calls are required.
2. THE `InMemoryRateLimiter` SHALL pass all existing test cases for `slidingWindowRateLimit`, including: allowing requests within the limit, blocking requests that exceed the limit, resetting the window after `windowMs` elapses, and returning correct `remaining` counts.
3. THE `InMemoryRateLimiter` SHALL pass all existing test cases for `withRateLimit`, including: `READ` and `WRITE` tier enforcement, independent tracking per IP and tier, and `addHeaders` injecting correct `X-RateLimit-*` headers.
4. WHEN `vi.useFakeTimers()` is active in a test, THE `InMemoryRateLimiter` SHOULD read the current time from `Date.now()` so that the mocked clock is respected and time-advance tests (`vi.advanceTimersByTime`) behave as expected. IF the limiter reads time from a source that fake timers do not mock, time-advance tests SHALL still pass, either by waiting for sufficient real time to elapse or by configuring window durations short enough to elapse in real time.
122 changes: 122 additions & 0 deletions .kiro/specs/distributed-rate-limiting/tasks.md
@@ -0,0 +1,122 @@
# Implementation Plan

## Overview

Refactor `src/lib/ratelimit.ts` to support a shared Upstash Redis backend in production while keeping the existing in-memory behaviour in dev/test. The work proceeds in eight ordered steps: install dependencies, document configuration, extract the in-memory store, add the Upstash store, wire up the factory and the new async function, migrate all 25 call sites, verify existing tests, and add new distributed tests.

## Tasks

- [ ] 1. Install Upstash packages
Add `@upstash/ratelimit@^1.2.0` and `@upstash/redis@^1.34.0` to the `dependencies` section of `package.json`, then run `pnpm install` to lock the versions. These packages provide the SlidingWindow algorithm and the HTTP-based Redis client needed by `UpstashStore`.
**Acceptance**: `pnpm install` completes without errors; both packages appear in `package.json` dependencies with the caret ranges above, and `pnpm-lock.yaml` records resolved versions that satisfy those ranges.
**Files**: `package.json`

- [ ] 2. Document env vars in .env.example
Add a `# Rate Limiting (Distributed)` section near the bottom of `.env.example` with two entries: `UPSTASH_REDIS_REST_URL=https://<your-upstash-endpoint>.upstash.io` and `UPSTASH_REDIS_REST_TOKEN=<your-upstash-token>`, each preceded by a comment explaining its purpose. Do not add real credentials.
**Acceptance**: `.env.example` contains the new section with both placeholder vars and explanatory comments; existing entries are unchanged.
**Files**: `.env.example`

- [ ] 3. Define IRateLimiterStore interface and InMemoryStore
In `src/lib/ratelimit.ts`, introduce the `IRateLimiterStore` interface with a single method `check(identifier: string, config: RateLimitConfig): Promise<RateLimitResult>`. Extract the existing per-IP `Map` tracking logic into a new `InMemoryStore` class that implements this interface (the store remains synchronous internally but wraps results in `Promise.resolve`). Keep all existing exports — `withRateLimit`, `getClientIP`, `createRateLimitResponse`, `RATE_LIMIT_TIERS` — fully intact and behaviourally identical. Export `resetStoreForTesting()` which clears the in-memory map; this replaces any ad-hoc reset pattern in tests.
**Acceptance**: `pnpm test -- src/app/api/tutorials/__tests__/ratelimit.test.ts` passes without modification; TypeScript compiles with no new errors; `InMemoryStore` and `IRateLimiterStore` are exported from the module.
**Files**: `src/lib/ratelimit.ts`

- [ ] 4. Implement UpstashStore
Add an `UpstashStore` class to `src/lib/ratelimit.ts` that implements `IRateLimiterStore`. Internally it creates a `@upstash/redis` `Redis` client from `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN`. It caches one `@upstash/ratelimit` `Ratelimit` instance per `(limit, windowMs)` pair using a `Map<string, Ratelimit>` keyed on `"${limit}:${windowMs}"`, so instances are reused across requests. The `check()` method calls `ratelimit.limit(identifier)`, maps the result to `RateLimitResult`, and wraps the entire call in a `try/catch`: on any error it calls `console.error` with the error and returns `{ success: true, limit: config.limit, remaining: config.limit, reset: Date.now() + config.windowMs }` (fail-open).
**Acceptance**: Unit tests (added in task 8) confirm that a simulated Redis error causes `check()` to return `success: true`; TypeScript compiles cleanly.
**Files**: `src/lib/ratelimit.ts`

- [ ] 5. Implement createStore() factory and withRateLimitAsync()
Add a `createStore()` function that runs at module initialisation (not per-request). It reads `process.env.NODE_ENV`, `process.env.UPSTASH_REDIS_REST_URL`, and `process.env.UPSTASH_REDIS_REST_TOKEN`. If `NODE_ENV === 'production'` and both env vars are present, it returns a new `UpstashStore`; otherwise it returns a new `InMemoryStore`. If `NODE_ENV === 'production'` but either env var is missing, it also emits `console.warn('Distributed rate limiting is disabled: UPSTASH_REDIS_REST_URL or UPSTASH_REDIS_REST_TOKEN is not set. Falling back to in-memory store.')` before returning `InMemoryStore`. Assign the result to a module-level `const activeStore`. Then add `export async function withRateLimitAsync(request: Request, tier: RateLimitTier)` which resolves the client IP, builds the identifier from the IP and tier (e.g., `"1.2.3.4:AUTH"`, as defined in the requirements glossary) so that tiers do not share a counter, calls `activeStore.check(identifier, RATE_LIMIT_TIERS[tier])`, and returns the same `{ addHeaders, rateLimitResponse }` shape as `withRateLimit`.
**Acceptance**: Calling `withRateLimitAsync` in a test with `NODE_ENV=test` returns an `InMemoryStore`-backed result; TypeScript compiles cleanly; the `console.warn` fires when env vars are absent in production mode.
**Files**: `src/lib/ratelimit.ts`

- [ ] 6. Migrate route call sites to withRateLimitAsync()
Update every route file listed below to: (1) add `withRateLimitAsync` to the existing `@/lib/ratelimit` import, (2) replace each `withRateLimit(request, tier)` call with `await withRateLimitAsync(request, tier)`, and (3) ensure the handler function is `async` (most already are). The synchronous `withRateLimit` import may be removed from each file once all call sites in that file are migrated.

Files to migrate:

- `src/app/api/user/progress/route.ts`
- `src/app/api/user/settings/route.ts`
- `src/app/api/video-analytics/route.ts`
- `src/app/api/admin/feature-flags/[id]/route.ts`
- `src/app/api/admin/feature-flags/route.ts`
- `src/app/api/admin/feature-flags/evaluate/route.ts`
- `src/app/api/admin/feature-flags/audit/route.ts`
- `src/app/api/admin/audit/route.ts`
- `src/app/api/tutorials/route.ts`
- `src/app/api/tutorials/[id]/route.ts`
- `src/app/api/tutorials/[id]/progress/route.ts`
- `src/app/api/referral/validate/route.ts`
- `src/app/api/v1/tickets/route.ts`
- `src/app/api/v1/tickets/[id]/route.ts`
- `src/app/api/v1/consent/route.ts`
- `src/app/api/notes/route.ts`
- `src/app/api/bookmarks/route.ts`
- `src/app/api/courses/route.ts`
- `src/app/api/courses/[id]/route.ts`
- `src/app/api/courses/[id]/lessons/route.ts`
- `src/app/api/lessons/[id]/progress/route.ts`
- `src/app/api/auth/discord/route.ts`
- `src/app/api/auth/discord/callback/route.ts`
- `src/app/api/auth/login/route.ts`
- `src/app/api/auth/signup/route.ts`

**Acceptance**: TypeScript compiles with no errors across all migrated files; no remaining `withRateLimit(` call (without the `Async` suffix) exists in any of the above files; `pnpm build` succeeds.
**Files**: All 25 route files listed above.

- [ ] 7. Verify existing tests pass
Run the existing rate limit test suite without modifying any test file. This confirms that the `withRateLimit` synchronous path and `InMemoryStore` extraction did not regress any existing behaviour.
**Acceptance**: `pnpm test -- src/app/api/tutorials/__tests__/ratelimit.test.ts` exits with code 0 and all test cases pass.
**Files**: _(no changes — read-only verification step)_

- [ ] 8. Add tests for withRateLimitAsync and UpstashStore
Create `src/lib/__tests__/ratelimit-distributed.test.ts` with the following test cases:
1. **InMemoryStore fallback in dev/test** — with `NODE_ENV=test`, `withRateLimitAsync` resolves without error and returns `rateLimitResponse: null` under the limit.
1. **UpstashStore used in production** — mock `process.env.NODE_ENV='production'` and both Upstash env vars; spy on `UpstashStore.prototype.check` and verify it is called by `withRateLimitAsync`.
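1. **Fail-open on Redis error** — mock the underlying `@upstash/ratelimit` `limit()` call so that it rejects; verify that `UpstashStore.check()` catches the error and `withRateLimitAsync` still returns `rateLimitResponse: null` (request is allowed through). Mocking `UpstashStore.prototype.check` itself to throw would bypass the `try/catch` that implements the fail-open behaviour.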
1. **console.warn on missing prod env vars** — set `NODE_ENV=production` and omit the Upstash vars; spy on `console.warn` and verify the warning message is emitted at module init.
1. **resetStoreForTesting() clears state** — exhaust the in-memory limit for an IP, call `resetStoreForTesting()`, then confirm the next request is allowed.
**Acceptance**: `pnpm test -- src/lib/__tests__/ratelimit-distributed.test.ts` exits with code 0 and all five test cases pass.
**Files**: `src/lib/__tests__/ratelimit-distributed.test.ts`
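
The per-configuration cache described in task 4 can be sketched independently of Redis. This is an illustration under stated assumptions: `LimiterLike` is a hypothetical stand-in for the part of an `@upstash/ratelimit` instance the store uses, `CachedLimiterStore` is a made-up class name, the limiter factory is injected so the example runs without network access, and the second tier's numbers are invented for the demo. Error handling (the fail-open `try/catch`) is left out here to keep the focus on caching.

```typescript
interface RateLimitConfig { limit: number; windowMs: number }
interface RateLimitResult { success: boolean; limit: number; remaining: number; reset: number }

// Hypothetical stand-in for the subset of a Ratelimit instance used below.
interface LimiterLike {
  limit(identifier: string): Promise<RateLimitResult>;
}

class CachedLimiterStore {
  private readonly limiters = new Map<string, LimiterLike>();
  private readonly createLimiter: (config: RateLimitConfig) => LimiterLike;

  constructor(createLimiter: (config: RateLimitConfig) => LimiterLike) {
    this.createLimiter = createLimiter;
  }

  // One limiter per (limit, windowMs) pair, created lazily and then reused.
  private limiterFor(config: RateLimitConfig): LimiterLike {
    const key = `${config.limit}:${config.windowMs}`;
    let limiter = this.limiters.get(key);
    if (!limiter) {
      limiter = this.createLimiter(config);
      this.limiters.set(key, limiter);
    }
    return limiter;
  }

  async check(identifier: string, config: RateLimitConfig): Promise<RateLimitResult> {
    const r = await this.limiterFor(config).limit(identifier);
    // Map the library result onto the module's own RateLimitResult shape.
    return { success: r.success, limit: r.limit, remaining: r.remaining, reset: r.reset };
  }
}

// Demo with a fake factory that counts how many limiters get created.
let created = 0;
const cachedStore = new CachedLimiterStore((config) => {
  created += 1;
  return {
    limit: async () => ({
      success: true, limit: config.limit, remaining: config.limit - 1, reset: 0,
    }),
  };
});

const authTier: RateLimitConfig = { limit: 5, windowMs: 60000 };
const otherTier: RateLimitConfig = { limit: 100, windowMs: 60000 }; // hypothetical values

const createdCount = (async () => {
  await cachedStore.check("1.2.3.4:AUTH", authTier);
  await cachedStore.check("5.6.7.8:AUTH", authTier); // reuses the AUTH limiter
  await cachedStore.check("1.2.3.4:OTHER", otherTier); // creates a second limiter
  return created;
})();
```

In the real store the injected factory would presumably build a `Ratelimit` from the shared `Redis` client; injecting it also gives tests a seam for simulating Redis errors.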

## Task Dependency Graph

```json
{
  "waves": [
    {
      "wave": 1,
      "tasks": ["1. Install Upstash packages", "2. Document env vars in .env.example"]
    },
    {
      "wave": 2,
      "tasks": ["3. Define IRateLimiterStore interface and InMemoryStore"]
    },
    {
      "wave": 3,
      "tasks": ["4. Implement UpstashStore"]
    },
    {
      "wave": 4,
      "tasks": ["5. Implement createStore() factory and withRateLimitAsync()"]
    },
    {
      "wave": 5,
      "tasks": [
        "6. Migrate route call sites to withRateLimitAsync()",
        "7. Verify existing tests pass",
        "8. Add tests for withRateLimitAsync and UpstashStore"
      ]
    }
  ]
}
```

## Notes

- `withRateLimit()` must remain synchronous and unchanged throughout — it is the safety net for any call site not yet migrated and is relied upon by existing tests.
- The `UpstashStore` is never instantiated in test or dev environments; the `createStore()` factory guarantees this. Tests that need to exercise `UpstashStore` directly must mock it.
- Task 6 lists 25 files discovered via grep at spec-authoring time. Run `grep -r 'withRateLimit(' src/app/api --include='*.ts' -l` before starting that task to catch any new files added since.
- The fail-open contract (`success: true` on any Redis error) is a deliberate trade-off: availability over strict rate enforcement. Document this in a code comment above `UpstashStore.check()`.
- `resetStoreForTesting()` should only be called from test code. Consider adding a runtime guard (`if (process.env.NODE_ENV === 'production') throw new Error(...)`) to prevent accidental misuse.
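
Two of the notes above, keeping a synchronous path and guarding `resetStoreForTesting()`, can be combined in one sketch of the in-memory store from task 3. It is a possible shape, not the existing code: `checkSync` and `clear` are hypothetical method names, the guard takes `nodeEnv` as an argument instead of reading `process.env`, and the window is treated as expired when `now >= resetAt`.

```typescript
interface RateLimitConfig { limit: number; windowMs: number }
interface RateLimitResult {
  success: boolean; limit: number; remaining: number; reset: number; retryAfter?: number;
}
interface IRateLimiterStore {
  check(identifier: string, config: RateLimitConfig): Promise<RateLimitResult>;
}

class InMemoryStore implements IRateLimiterStore {
  private readonly entries = new Map<string, { count: number; resetAt: number }>();

  // Synchronous core, so a synchronous withRateLimit() can keep using it directly.
  checkSync(identifier: string, config: RateLimitConfig, now: number = Date.now()): RateLimitResult {
    const entry = this.entries.get(identifier);
    if (!entry || now >= entry.resetAt) {
      const resetAt = now + config.windowMs;
      this.entries.set(identifier, { count: 1, resetAt });
      return { success: true, limit: config.limit, remaining: config.limit - 1, reset: resetAt };
    }
    if (entry.count >= config.limit) {
      const retryAfter = Math.ceil((entry.resetAt - now) / 1000);
      return { success: false, limit: config.limit, remaining: 0, reset: entry.resetAt, retryAfter };
    }
    entry.count += 1;
    return {
      success: true, limit: config.limit,
      remaining: config.limit - entry.count, reset: entry.resetAt,
    };
  }

  // Async wrapper that satisfies IRateLimiterStore.
  check(identifier: string, config: RateLimitConfig): Promise<RateLimitResult> {
    return Promise.resolve(this.checkSync(identifier, config));
  }

  clear(): void {
    this.entries.clear();
  }
}

const memoryStore = new InMemoryStore();

// Guarded reset: refuses to run when told the environment is production.
function resetStoreForTesting(nodeEnv: string | undefined): void {
  if (nodeEnv === "production") {
    throw new Error("resetStoreForTesting() must not be called in production");
  }
  memoryStore.clear();
}
```

A test would typically call `resetStoreForTesting("test")` in a `beforeEach` hook so that counters from one case do not leak into the next.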