feat(api): Add global IP rate limiting framework (ADR-0022 phase 1)#846
feat(api): Add global IP rate limiting framework (ADR-0022 phase 1)#846ymh1874 wants to merge 2 commits into
Conversation
346b29c to
c98c7cb
Compare
c98c7cb to
1291ded
Compare
1291ded to
cca3cd4
Compare
gtema
left a comment
There was a problem hiding this comment.
looks good, few nits inline
One more thing is that you implement here the IP address based throttling, but the invariant 9 is not addressed, which claims that in case of reverse proxies it should be the original address and not the proxy address. On one side this correlates with your other PR, on the other side this requires [rate_limit_trusted_proxies] section. It can be implemented after, but maybe good to cover now with introduction of the IP based limiter (don't know how phase1 was defined)
|
|
||
| /// Maximum number of cells that can be consumed in a burst before | ||
| /// replenishment kicks in. Must be ≥ 1 when `enabled = true`. | ||
| #[serde(default = "default_burst_size")] |
There was a problem hiding this comment.
we should add validation for both values to be [1, 100000] - this is defined in the ADR
There was a problem hiding this comment.
Done — added the [1, 100000] bound for both burst_size and replenish_rate_per_second (ADR-0022 config-bounds table). Enforced in build_limiter/from_config as a fail-hard startup error alongside the existing zero check, via a small validated_scalar helper. I kept it there rather than a field-level validator range(min=1, max=100000) because a disabled section must be allowed to carry out-of-range/zero values without aborting startup (existing invariant + tests) — a field-level range would fire unconditionally. Added tests for the upper bound and the 100000 boundary; field docs updated.
e3be8a9 to
0e9c8f6
Compare
|
On the two points: Config bounds (inline): done — Invariant 9 (trusted-proxy source IP): you're right that with the direct-peer address the limiter buckets by the reverse proxy when one is present. I'd like to land it as a focused follow-up rather than in this phase-1 PR, for a concrete reason: ADR-0022 Invariant 7 requires the client-IP resolution to be "a single shared utility used by both the rate limiter and the authentication pipeline". That shared utility is exactly what I just introduced in the sibling PR #908 — Implementing Invariant 9 here now would mean duplicating that resolver on this branch (the two PRs are on independent branches), which is the opposite of the "single shared utility" requirement. Once #908 lands, the follow-up is small: add a |
Implements ADR-0022 phase 1: a handler-level rate-limiting framework
backed by the `governor` crate. Wires a global per-IP bucket on the
`POST /v3/auth/tokens` handler, checking the limit before the
CPU-intensive password-hash path (Invariant 4).
Framework design:
- `crates/config/src/rate_limit.rs`: reusable `RateLimitSection` struct
(enabled/burst_size/replenish_rate_per_second) shared by all future
buckets (per-user, per-domain, per-IdP).
- `crates/core/src/rate_limit.rs`: `RateLimitState` with one
`Option<Arc<DefaultKeyedRateLimiter<String>>>` per bucket. Disabled
buckets cost only an `Option` discriminant. Includes `check_ip`,
`retain_recent`, and IPv6 /64 prefix aggregation.
- Fail-hard init (Invariant 2): `RateLimitState::from_config` returns
`KeystoneError::RateLimitConfig` when `enabled=true` with zero burst
or replenish rate, aborting startup rather than silently mis-configuring.
- `KeystoneApiError::TooManyRequests { retry_after }` -> HTTP 429 with a
`Retry-After` header. This unifies with the API Key limiter (ADR-0021),
which previously returned a bare 429 with no Retry-After; both paths now
emit a uniform 429 body + Retry-After (ADR-0022 Invariant 3).
- SPIFFE bypass: `Option<Extension<ConnectInfo<SocketAddr>>>` as a handler
argument gives `None` on internal/admin mTLS interfaces (which don't
populate `ConnectInfo`), so rate limiting applies only to the public
TCP listener.
- Background eviction task (60 s interval) calls `retain_recent()` on
all keyed stores, preventing unbounded memory growth under adversarial
unique-key flooding.
Coexists with the API Key ingress limiter (`api_key_rate_limiter`,
ADR-0021) on `Service`; the two are independent buckets.
Tests:
- Handler: burst_size=1 with `ConnectInfo` injected -> first request passes
the limit (auth error, non-429), second is 429 with Retry-After; and a
request with no `ConnectInfo` (SPIFFE bypass) is never limited.
- End-to-end: drives the real `create` route through
`into_make_service_with_connect_info::<SocketAddr>` with a fixed peer,
so `ConnectInfo` is populated by axum itself (not injected), proving the
full TCP-peer -> extractor -> check_ip -> 429 chain.
Deferred: per-user, per-domain, and per-IdP buckets require keying on a
confirmed user/domain ID after DB lookup but before password hashing -- an
invasive driver refactor tracked as a follow-up to this framework PR.
Partially implements openstack-experimental#843.
Note: This commit was done with the help of AI.
Signed-off-by: Yousef Hussein <ymh1874@gmail.com>
Per the ADR-0022 config-bounds table, `burst_size` and `replenish_rate_per_second` must each fall within [1, 100000] when a bucket is enabled; values outside that range must fail startup. Extend the enabled-bucket validation in `build_limiter` (via a small `validated_scalar` helper) to reject values above 100000, alongside the existing zero check. The bound is enforced there rather than as a field-level `validator` range so that a disabled section with out-of-range values stays harmless and does not abort startup. Config field docs and tests updated to cover the upper bound and the boundary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Yousef Hussein <ymh1874@gmail.com>
0e9c8f6 to
a88f46c
Compare
Summary
governor-backed rate-limiting framework wired onPOST /v3/auth/tokens.RateLimitSectionconfig struct andRateLimitStateservice field so future buckets (per-user, per-domain, per-IdP) are a one-field addition to the framework.429 Too Many RequestswithRetry-After.ADR-0022 invariants addressed
[rate_limit_global_ip]inkeystone.confRateLimitState::from_configreturnsKeystoneError::RateLimitConfigand aborts startup whenenabled=truewith zero burst or replenish rateRetry-Afterheader; no key-identifying information exposedauthenticate_request(password hash)Arc<DefaultKeyedRateLimiter<String>>per bucket; deferred buckets areNonefieldsgovernor::DefaultClock=QuantaClockon std targets (TSC-backed, always monotonic)Invariants 7 (username normalization) and 8 (post-lookup per-user throttle) are deferred — they require keying on a confirmed user ID after DB lookup but before hashing, tracked as a follow-up driver refactor.
SPIFFE bypass
Internal (mTLS/TCP) and admin (mTLS/UDS) interfaces do not populate
ConnectInfo<SocketAddr>; the handler receivesNoneand skips IP limiting. Only the public TCP listener is subject to this check.Note:
Option<Extension<ConnectInfo<SocketAddr>>>is used (notOption<ConnectInfo<SocketAddr>>) because in axum 0.8,Option<T>: FromRequestParts<S>requiresT: OptionalFromRequestParts<S>, whichConnectInfo<T>does not implement butExtension<T>does.Integration notes (post-rebase)
maingained an API Key ingress limiter (api_key_rate_limiter, ADR-0021) and an audit refactor that splitscreateinto an outer audit wrapper + innercreate_inner. This PR integrates with both:Service(api_key_rate_limiter+rate_limiters).create_inner, so a rate-limited request still flows through the outer handler's perimeter-authenticate audit emission (recorded as aTooManyRequestsfailure).KeystoneApiError::TooManyRequestsconsolidated:main's API Key limiter returned a bare unitTooManyRequests(429, noRetry-After). I merged it into the ADR-0022 struct variantTooManyRequests { retry_after }, and updated the API Key path to compute a realretry_afterfrom its limiter. Both paths now emit a uniform 429 body +Retry-Afterheader (ADR-0022 Invariant 3).❓ Question for maintainer
I unified the two
TooManyRequestsvariants (ADR-0021 API Key limiter + ADR-0022 IP limiter) into one{ retry_after }variant so every 429 carries aRetry-After. This slightly changes the API Key limiter's response (it now includesRetry-After, which it didn't before). Is that the desired behavior, or would you prefer the two limiters keep separate error variants / response shapes? Easy to split back out if you want them decoupled.Files changed
crates/config/src/rate_limit.rs(new)RateLimitSection— reusable config struct for any bucketcrates/config/src/lib.rsrate_limit_global_ip: RateLimitSectiontoConfigcrates/core-types/src/error.rsKeystoneError::RateLimitConfigcrates/core/src/rate_limit.rs(new)RateLimitState,check_ip,retain_recent, IPv6 /64 key derivation, unit testscrates/core/src/keystone.rsrate_limiters: RateLimitStatetoServicecrates/core/src/api/api_key_auth.rsTooManyRequests { retry_after }crates/api-types/src/error.rsTooManyRequests { retry_after }variantcrates/api-types/src/error_conv.rsTooManyRequests→ 429 +Retry-Afterheadercrates/keystone/src/api/v3/auth/token/create.rscreate_inner; handler + e2e testscrates/keystone/src/audit.rsTooManyRequests { .. }variantcrates/keystone/src/bin/keystone.rstools/keystone.conf[rate_limit_global_ip]with default valuesTest plan
cargo fmt --checkcleancargo clippy --lib --tests --workspacecleancore/config/api-types/keystone(759 total, incl. new rate-limit + API Key limiter tests)cargo build --lockedsucceeds — noCargo.lockchanges needed (governorwas already a dependency via ADR-0021)401 → 429flip verified by an automated test (test_rate_limit_429_over_connect_info_make_service): drives the realcreateroute through the exactinto_make_service_with_connect_info::<SocketAddr>path the public listener uses, with a fixed peer address soConnectInfois populated by axum itself (not injected). First request from an IP is non-429; the second (burst spent) returns429with aRetry-Afterheader — the same behavior the manualcurlloop would show, without needing a live DB/OPA.Partially implements #843.
🤖 Generated with Claude Code