feat(api): Add global IP rate limiting framework (ADR-0022 phase 1) by ymh1874 · Pull Request #846 · openstack-experimental/keystone

ymh1874 · 2026-06-25T12:54:44Z

Summary

Implements ADR-0022 phase 1: a handler-level, governor-backed rate-limiting framework wired on POST /v3/auth/tokens.
Adds a reusable RateLimitSection config struct and RateLimitState service field so future buckets (per-user, per-domain, per-IdP) are a one-field addition to the framework.
Fires the global per-IP check before password hashing (ADR-0022 Invariant 4); returns 429 Too Many Requests with Retry-After.
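
For readers unfamiliar with keyed limiters, the burst/replenish behavior described above can be sketched with a plain token bucket. This is an illustration only: the PR uses the `governor` crate rather than this code, and `KeyedBucket`, `check`, `retain_recent` (as written here) and `tracked_keys` are hypothetical names with assumed signatures. Time is passed in explicitly so the behavior is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical keyed token bucket: one (tokens, last_refill) pair per key.
/// A stand-in for illustration; not the limiter used in the PR.
pub struct KeyedBucket {
    burst: f64,
    rate_per_sec: f64,
    state: HashMap<String, (f64, Instant)>,
}

impl KeyedBucket {
    /// Both arguments are assumed to be pre-validated as non-zero.
    pub fn new(burst_size: u32, replenish_rate_per_second: u32) -> Self {
        Self {
            burst: f64::from(burst_size),
            rate_per_sec: f64::from(replenish_rate_per_second),
            state: HashMap::new(),
        }
    }

    /// Admit one request for `key` at time `now`, or report how long to wait.
    pub fn check(&mut self, key: &str, now: Instant) -> Result<(), Duration> {
        let (burst, rate) = (self.burst, self.rate_per_sec);
        let (tokens, last) = self.state.entry(key.to_owned()).or_insert((burst, now));
        // Refill for the time elapsed since the last check, capped at the burst size.
        let elapsed = now.saturating_duration_since(*last).as_secs_f64();
        *tokens = (*tokens + elapsed * rate).min(burst);
        *last = now;
        if *tokens >= 1.0 {
            *tokens -= 1.0;
            Ok(())
        } else {
            // Time until one whole token is available again.
            Err(Duration::from_secs_f64((1.0 - *tokens) / rate))
        }
    }

    /// Evict keys whose bucket would be full again at `now`; dropping them
    /// loses no information and bounds memory under unique-key flooding.
    pub fn retain_recent(&mut self, now: Instant) {
        let (burst, rate) = (self.burst, self.rate_per_sec);
        self.state.retain(|_, (tokens, last)| {
            let elapsed = now.saturating_duration_since(*last).as_secs_f64();
            *tokens + elapsed * rate < burst
        });
    }

    /// Number of keys currently tracked.
    pub fn tracked_keys(&self) -> usize {
        self.state.len()
    }
}
```

With `burst_size = 1`, the first request for a key is admitted and an immediate second one is rejected with a wait hint, which is the value a handler would surface as `Retry-After`.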

#842 (ConnectInfo capture) is now merged into main, so this PR targets main directly and is no longer stacked. Rebased onto latest main; conflicts with the concurrently-merged API Key limiter (ADR-0021) and audit refactor are resolved (see "Integration notes" below).

ADR-0022 invariants addressed

| # | Invariant | How |
|---|-----------|-----|
| 1 | No hardcoded limits | All limits from `[rate_limit_global_ip]` in `keystone.conf` |
| 2 | Fail-hard init | `RateLimitState::from_config` returns `KeystoneError::RateLimitConfig` and aborts startup when `enabled=true` with zero burst or replenish rate |
| 3 | Uniform response | Single `Retry-After` header; no key-identifying information exposed |
| 4 | Check before hash | Rate limit fires before `authenticate_request` (password hash) |
| 5 | Distinct buckets | One `Arc<DefaultKeyedRateLimiter<String>>` per bucket; deferred buckets are `None` fields |
| 6 | Monotonic clock | `governor::DefaultClock` = `QuantaClock` on std targets (TSC-backed, always monotonic) |

Invariants 7 (username normalization) and 8 (post-lookup per-user throttle) are deferred — they require keying on a confirmed user ID after DB lookup but before hashing, tracked as a follow-up driver refactor.

SPIFFE bypass

Internal (mTLS/TCP) and admin (mTLS/UDS) interfaces do not populate ConnectInfo<SocketAddr>; the handler receives None and skips IP limiting. Only the public TCP listener is subject to this check.

Note: Option<Extension<ConnectInfo<SocketAddr>>> is used (not Option<ConnectInfo<SocketAddr>>) because in axum 0.8, Option<T>: FromRequestParts<S> requires T: OptionalFromRequestParts<S>, which ConnectInfo<T> does not implement but Extension<T> does.
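
The bypass logic amounts to "no peer address, no IP check". A minimal framework-free sketch of that decision, assuming the extracted peer address has already been reduced to an `Option<SocketAddr>`; `IpGate` and `gate_by_peer` are hypothetical names, and the closure stands in for whatever the real per-IP check is:

```rust
use std::net::{IpAddr, SocketAddr};

/// Outcome of the per-IP gate (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum IpGate {
    /// No peer address was captured (internal/admin interface): limiter skipped.
    Bypassed,
    /// Peer address present and the limiter admitted the request.
    Allowed,
    /// Peer address present and the limiter rejected the request.
    Limited,
}

/// Consult `check_ip` only when a peer address is available.
/// `check_ip` returns `true` when the request is admitted.
pub fn gate_by_peer<F>(peer: Option<SocketAddr>, mut check_ip: F) -> IpGate
where
    F: FnMut(IpAddr) -> bool,
{
    match peer {
        None => IpGate::Bypassed,
        Some(addr) if check_ip(addr.ip()) => IpGate::Allowed,
        Some(_) => IpGate::Limited,
    }
}
```

The point of the sketch is that the limiter is never consulted on the `None` path, so interfaces that do not populate the peer address cannot consume or exhaust a bucket.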

Integration notes (post-rebase)

main gained an API Key ingress limiter (api_key_rate_limiter, ADR-0021) and an audit refactor that splits create into an outer audit wrapper + inner create_inner. This PR integrates with both:

Both limiters coexist as independent fields on Service (api_key_rate_limiter + rate_limiters).
The global-IP check lives in create_inner, so a rate-limited request still flows through the outer handler's perimeter-authenticate audit emission (recorded as a TooManyRequests failure).
KeystoneApiError::TooManyRequests consolidated: main's API Key limiter returned a bare unit TooManyRequests (429, no Retry-After). I merged it into the ADR-0022 struct variant TooManyRequests { retry_after }, and updated the API Key path to compute a real retry_after from its limiter. Both paths now emit a uniform 429 body + Retry-After header (ADR-0022 Invariant 3).
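
To make the unified response shape concrete, here is a small sketch of mapping a struct-style `TooManyRequests { retry_after }` variant to a status code plus a `Retry-After` header. It is an assumption-laden illustration, not the code in `error_conv.rs`: `ApiError`, `response_parts` and `retry_after_secs` are hypothetical, and the real field type of `retry_after` is not shown in this PR description (a `Duration` is assumed here).

```rust
use std::time::Duration;

/// Hypothetical, trimmed-down error type for illustration.
#[derive(Debug)]
pub enum ApiError {
    Unauthorized,
    TooManyRequests { retry_after: Duration },
}

/// `Retry-After` carries whole seconds: round up so clients never retry
/// early, and never advertise 0 for a request that was just rejected.
fn retry_after_secs(wait: Duration) -> u64 {
    let secs = wait.as_secs() + u64::from(wait.subsec_nanos() > 0);
    secs.max(1)
}

/// Map an error to (status code, extra headers). Every 429 gets the same
/// header set, regardless of which limiter produced it.
pub fn response_parts(err: &ApiError) -> (u16, Vec<(&'static str, String)>) {
    match err {
        ApiError::Unauthorized => (401, Vec::new()),
        ApiError::TooManyRequests { retry_after } => (
            429,
            vec![("Retry-After", retry_after_secs(*retry_after).to_string())],
        ),
    }
}
```

Because the header is derived only from the wait time, the response carries no information about which key or bucket was hit.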

❓ Question for maintainer

I unified the two TooManyRequests variants (ADR-0021 API Key limiter + ADR-0022 IP limiter) into one { retry_after } variant so every 429 carries a Retry-After. This slightly changes the API Key limiter's response (it now includes Retry-After, which it didn't before). Is that the desired behavior, or would you prefer the two limiters keep separate error variants / response shapes? Easy to split back out if you want them decoupled.

Files changed

| File | Change |
|------|--------|
| `crates/config/src/rate_limit.rs` (new) | `RateLimitSection` — reusable config struct for any bucket |
| `crates/config/src/lib.rs` | Add `rate_limit_global_ip: RateLimitSection` to `Config` |
| `crates/core-types/src/error.rs` | Add `KeystoneError::RateLimitConfig` |
| `crates/core/src/rate_limit.rs` (new) | `RateLimitState`, `check_ip`, `retain_recent`, IPv6 /64 key derivation, unit tests |
| `crates/core/src/keystone.rs` | Add `rate_limiters: RateLimitState` to `Service` |
| `crates/core/src/api/api_key_auth.rs` | API Key limiter uses the unified `TooManyRequests { retry_after }` |
| `crates/api-types/src/error.rs` | Consolidate `TooManyRequests { retry_after }` variant |
| `crates/api-types/src/error_conv.rs` | Map `TooManyRequests` → 429 + `Retry-After` header |
| `crates/keystone/src/api/v3/auth/token/create.rs` | Add IP rate-limit check to `create_inner`; handler + e2e tests |
| `crates/keystone/src/audit.rs` | Match updated `TooManyRequests { .. }` variant |
| `crates/keystone/src/bin/keystone.rs` | Add 60 s background eviction task |
| `tools/keystone.conf` | Document `[rate_limit_global_ip]` with default values |
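
The "IPv6 /64 key derivation" entry above refers to bucketing all addresses of one IPv6 /64 under a single key, since a single client commonly controls an entire /64. A minimal sketch of that idea follows; `bucket_key` is a hypothetical name and the exact key format used in `crates/core/src/rate_limit.rs` is not shown in this PR description, so the string layout here is an assumption.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Derive the limiter key for a client address (illustrative sketch).
/// IPv4 addresses are keyed individually; IPv6 addresses are keyed by their
/// /64 prefix. IPv4-mapped IPv6 addresses are not special-cased here.
pub fn bucket_key(ip: IpAddr) -> String {
    match ip {
        IpAddr::V4(v4) => v4.to_string(),
        IpAddr::V6(v6) => {
            // Keep the upper four 16-bit segments (64 bits), zero the rest.
            let s = v6.segments();
            let prefix = Ipv6Addr::new(s[0], s[1], s[2], s[3], 0, 0, 0, 0);
            format!("{}/64", prefix)
        }
    }
}
```

Two addresses that differ only in their lower 64 bits map to the same key, so rotating through interface identifiers inside one /64 does not yield fresh buckets.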

Test plan

cargo fmt --check clean
cargo clippy --lib --tests --workspace clean
Unit tests pass across core / config / api-types / keystone (759 total, incl. new rate-limit + API Key limiter tests)
cargo build --locked succeeds — no Cargo.lock changes needed (governor was already a dependency via ADR-0021)
End-to-end 401 → 429 flip verified by an automated test (test_rate_limit_429_over_connect_info_make_service): drives the real create route through the exact into_make_service_with_connect_info::<SocketAddr> path the public listener uses, with a fixed peer address so ConnectInfo is populated by axum itself (not injected). First request from an IP is non-429; the second (burst spent) returns 429 with a Retry-After header — the same behavior the manual curl loop would show, without needing a live DB/OPA.

Partially implements #843.

🤖 Generated with Claude Code

gtema

Looks good, a few nits inline.
One more thing: this PR implements IP-address-based throttling but does not address Invariant 9, which says that behind a reverse proxy the limiter must use the original client address, not the proxy address. On one hand this ties in with your other PR; on the other hand it requires a [rate_limit_trusted_proxies] section. It can be implemented later, but it may be good to cover it now, together with the introduction of the IP-based limiter (I don't know how phase 1 was defined).

gtema · 2026-07-03T16:26:22Z

+
+    /// Maximum number of cells that can be consumed in a burst before
+    /// replenishment kicks in. Must be ≥ 1 when `enabled = true`.
+    #[serde(default = "default_burst_size")]


we should add validation for both values to be [1, 100000] - this is defined in the ADR

ymh1874 · Jul 3, 2026

Done: added the [1, 100000] bound for both burst_size and replenish_rate_per_second (ADR-0022 config-bounds table). It is enforced in build_limiter/from_config as a fail-hard startup error alongside the existing zero check, via a small validated_scalar helper. I kept it there rather than using a field-level validator range(min=1, max=100000) because a disabled section must be allowed to carry out-of-range or zero values without aborting startup (existing invariant + tests), and a field-level range would fire unconditionally. Added tests for the upper bound and the 100000 boundary; field docs updated.
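
The trade-off described above (bounds checked only for enabled sections) can be sketched as follows. This is a simplified illustration under assumptions: the struct fields mirror the names used in this thread, but `RateLimitConfigError`, `validate_section`, and the signature given to `validated_scalar` are hypothetical and may differ from the actual helper in the PR.

```rust
pub const MIN_VALUE: u32 = 1;
pub const MAX_VALUE: u32 = 100_000;

/// Simplified stand-in for the config section discussed in this PR.
#[derive(Debug, Clone, Copy)]
pub struct RateLimitSection {
    pub enabled: bool,
    pub burst_size: u32,
    pub replenish_rate_per_second: u32,
}

/// Hypothetical error type carrying a human-readable reason.
#[derive(Debug, PartialEq, Eq)]
pub struct RateLimitConfigError(pub String);

/// Accept `value` only if it lies within [MIN_VALUE, MAX_VALUE].
fn validated_scalar(field: &str, value: u32) -> Result<u32, RateLimitConfigError> {
    if (MIN_VALUE..=MAX_VALUE).contains(&value) {
        Ok(value)
    } else {
        Err(RateLimitConfigError(format!(
            "{} must be in [{}, {}], got {}",
            field, MIN_VALUE, MAX_VALUE, value
        )))
    }
}

/// Returns `Ok(None)` for a disabled section without inspecting its values,
/// `Ok(Some((burst, rate)))` for a valid enabled section, and an error
/// (intended to abort startup) for an enabled section that is out of range.
pub fn validate_section(
    s: &RateLimitSection,
) -> Result<Option<(u32, u32)>, RateLimitConfigError> {
    if !s.enabled {
        return Ok(None);
    }
    let burst = validated_scalar("burst_size", s.burst_size)?;
    let rate = validated_scalar("replenish_rate_per_second", s.replenish_rate_per_second)?;
    Ok(Some((burst, rate)))
}
```

Placing the range check behind the `enabled` test is what lets a disabled section carry zero or oversized values harmlessly, which a field-level range validator would reject unconditionally.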

ymh1874 · 2026-07-03T22:20:16Z

On the two points:

Config bounds (inline): done — burst_size and replenish_rate_per_second are now enforced to [1, 100000] as a fail-hard startup error.

Invariant 9 (trusted-proxy source IP): you're right that with the direct-peer address the limiter buckets by the reverse proxy when one is present. I'd like to land it as a focused follow-up rather than in this phase-1 PR, for a concrete reason: ADR-0022 Invariant 7 requires the client-IP resolution to be "a single shared utility used by both the rate limiter and the authentication pipeline". That shared utility is exactly what I just introduced in the sibling PR #908 — core::api::forwarded::resolve_client_ip (trusted-proxy allowlist + rightmost-non-trusted-hop walk + hop cap, already used by the API-key ingress).

Implementing Invariant 9 here now would mean duplicating that resolver on this branch (the two PRs are on independent branches), which is the opposite of the "single shared utility" requirement. Once #908 lands, the follow-up is small: add a trusted_proxies source (either a [rate_limit_trusted_proxies] section as the ADR sketches, or reuse [oslo_middleware] trusted_proxies), resolve the client IP in the token-create extractor before check_ip, and warn when the global-IP limiter is enabled with an empty allowlist (ADR-0022 §Consequences). Happy to reorder if you'd prefer it in this PR instead — just flagging the duplication trade-off.
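
For context, the "rightmost-non-trusted-hop walk + hop cap" mentioned above can be sketched as below. This is not the `resolve_client_ip` from #908: `rightmost_untrusted_hop` is a hypothetical function, the allowlist is assumed to be a list of exact addresses (a real allowlist would more likely hold CIDR ranges), the forwarded chain is assumed to be already parsed, and falling back to the direct peer when no untrusted hop is found within the cap is an assumption of this sketch.

```rust
use std::net::IpAddr;

/// Pick the client IP to rate-limit on (illustrative sketch).
///
/// `forwarded_for` is the parsed forwarded chain, leftmost entry first
/// (client-supplied, least trustworthy), rightmost entry appended by the
/// proxy closest to this server.
pub fn rightmost_untrusted_hop(
    peer: IpAddr,
    forwarded_for: &[IpAddr],
    trusted: &[IpAddr],
    max_hops: usize,
) -> IpAddr {
    // A peer that is not a trusted proxy cannot vouch for any header:
    // ignore the chain and use the socket address.
    if !trusted.contains(&peer) {
        return peer;
    }
    // Walk right to left, skipping trusted proxies, and stop at the first
    // address that is not on the allowlist. Entries further left could have
    // been forged by the client and are never consulted.
    forwarded_for
        .iter()
        .rev()
        .take(max_hops)
        .find(|hop| !trusted.contains(*hop))
        .copied()
        .unwrap_or(peer)
}
```

Walking from the right is what makes a spoofed leftmost entry harmless: only addresses appended by allowlisted proxies are believed.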

Implements ADR-0022 phase 1: a handler-level rate-limiting framework backed by the `governor` crate. Wires a global per-IP bucket on the `POST /v3/auth/tokens` handler, checking the limit before the CPU-intensive password-hash path (Invariant 4).

Framework design:
- `crates/config/src/rate_limit.rs`: reusable `RateLimitSection` struct (enabled/burst_size/replenish_rate_per_second) shared by all future buckets (per-user, per-domain, per-IdP).
- `crates/core/src/rate_limit.rs`: `RateLimitState` with one `Option<Arc<DefaultKeyedRateLimiter<String>>>` per bucket. Disabled buckets cost only an `Option` discriminant. Includes `check_ip`, `retain_recent`, and IPv6 /64 prefix aggregation.
- Fail-hard init (Invariant 2): `RateLimitState::from_config` returns `KeystoneError::RateLimitConfig` when `enabled=true` with zero burst or replenish rate, aborting startup rather than silently mis-configuring.
- `KeystoneApiError::TooManyRequests { retry_after }` -> HTTP 429 with a `Retry-After` header. This unifies with the API Key limiter (ADR-0021), which previously returned a bare 429 with no Retry-After; both paths now emit a uniform 429 body + Retry-After (ADR-0022 Invariant 3).
- SPIFFE bypass: `Option<Extension<ConnectInfo<SocketAddr>>>` as a handler argument gives `None` on internal/admin mTLS interfaces (which don't populate `ConnectInfo`), so rate limiting applies only to the public TCP listener.
- Background eviction task (60 s interval) calls `retain_recent()` on all keyed stores, preventing unbounded memory growth under adversarial unique-key flooding.

Coexists with the API Key ingress limiter (`api_key_rate_limiter`, ADR-0021) on `Service`; the two are independent buckets.

Tests:
- Handler: burst_size=1 with `ConnectInfo` injected -> first request passes the limit (auth error, non-429), second is 429 with Retry-After; and a request with no `ConnectInfo` (SPIFFE bypass) is never limited.
- End-to-end: drives the real `create` route through `into_make_service_with_connect_info::<SocketAddr>` with a fixed peer, so `ConnectInfo` is populated by axum itself (not injected), proving the full TCP-peer -> extractor -> check_ip -> 429 chain.

Deferred: per-user, per-domain, and per-IdP buckets require keying on a confirmed user/domain ID after DB lookup but before password hashing -- an invasive driver refactor tracked as a follow-up to this framework PR.

Partially implements openstack-experimental#843.

Note: This commit was done with the help of AI.

Signed-off-by: Yousef Hussein <ymh1874@gmail.com>

Per the ADR-0022 config-bounds table, `burst_size` and `replenish_rate_per_second` must each fall within [1, 100000] when a bucket is enabled; values outside that range must fail startup.

Extend the enabled-bucket validation in `build_limiter` (via a small `validated_scalar` helper) to reject values above 100000, alongside the existing zero check. The bound is enforced there rather than as a field-level `validator` range so that a disabled section with out-of-range values stays harmless and does not abort startup.

Config field docs and tests updated to cover the upper bound and the boundary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Yousef Hussein <ymh1874@gmail.com>

ymh1874 force-pushed the feature/843-rate-limiting branch 2 times, most recently from 346b29c to c98c7cb Compare June 26, 2026 18:32

ymh1874 force-pushed the feature/843-rate-limiting branch from c98c7cb to 1291ded Compare July 3, 2026 10:57

ymh1874 marked this pull request as ready for review July 3, 2026 11:01

ymh1874 force-pushed the feature/843-rate-limiting branch from 1291ded to cca3cd4 Compare July 3, 2026 11:55

ymh1874 requested a review from gtema July 3, 2026 11:55

gtema requested changes Jul 3, 2026


ymh1874 force-pushed the feature/843-rate-limiting branch 2 times, most recently from e3be8a9 to 0e9c8f6 Compare July 3, 2026 22:19

ymh1874 requested a review from gtema July 3, 2026 22:20

ymh1874 and others added 2 commits July 4, 2026 19:49

ymh1874 force-pushed the feature/843-rate-limiting branch from 0e9c8f6 to a88f46c Compare July 4, 2026 16:49

ymh1874 wants to merge 2 commits into openstack-experimental:main from ymh1874:feature/843-rate-limiting
