fix(sentinel): close audit drop and login attempt gaps #195
Agent-Hellboy wants to merge 1 commit into
Code Review
This pull request introduces memory management for login attempt trackers in the API and UI services by implementing idle entry pruning and a maximum entry cap. It also updates the MCP gateway's analytics emission to track dropped events and refine mutex usage. Feedback focuses on the performance impact of the new maintenance routines; specifically, the O(N) complexity of pruning and eviction logic on every request could lead to CPU exhaustion or DoS vulnerabilities. It is recommended to perform these operations periodically or use more efficient data structures like an LRU cache to ensure scalability.
    func (t *apiLoginAttemptTracker) pruneLocked(now time.Time) {
        if apiLoginAttemptIdleTTL <= 0 {
            return
        }
        for key, state := range t.entries {
            if state.lastSeen.IsZero() || (now.Sub(state.lastSeen) > apiLoginAttemptIdleTTL && !state.lockedUntil.After(now)) {
                delete(t.entries, key)
            }
        }
    }
The pruneLocked method iterates over the entire entries map on every call to allow, recordFailure, and recordSuccess. With a default limit of 4096 entries, this O(N) operation per request introduces significant CPU overhead, especially under high load or during a brute-force attack. This could be exploited as a CPU exhaustion vector.
Consider performing pruning periodically (e.g., using a background goroutine) or only when the map size exceeds a certain threshold to decouple this maintenance from the request path.
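One way to read the "prune less often" suggestion is a time gate in front of the existing scan. The sketch below is a minimal illustration, not the PR's code: `gatedTracker`, `maybePruneLocked`, `pruneInterval`, and `lastPrune` are hypothetical names, and the caller is assumed to already hold the tracker's mutex, as the `Locked` suffix in the PR implies. The prune condition itself is copied from the excerpt above.

```go
package main

import (
	"fmt"
	"time"
)

// attemptState mirrors the two fields the reviewed pruneLocked reads.
type attemptState struct {
	lastSeen    time.Time
	lockedUntil time.Time
}

// gatedTracker is a hypothetical stand-in for apiLoginAttemptTracker.
// As in the PR, callers are assumed to hold the tracker's mutex.
type gatedTracker struct {
	entries       map[string]*attemptState
	idleTTL       time.Duration
	pruneInterval time.Duration // assumed knob, not part of the PR
	lastPrune     time.Time
}

// maybePruneLocked scans the map at most once per pruneInterval and
// reports whether a scan actually ran.
func (t *gatedTracker) maybePruneLocked(now time.Time) bool {
	if t.idleTTL <= 0 || now.Sub(t.lastPrune) < t.pruneInterval {
		return false
	}
	t.lastPrune = now
	for key, st := range t.entries {
		idle := now.Sub(st.lastSeen) > t.idleTTL
		if st.lastSeen.IsZero() || (idle && !st.lockedUntil.After(now)) {
			delete(t.entries, key)
		}
	}
	return true
}

func main() {
	now := time.Now()
	t := &gatedTracker{
		entries: map[string]*attemptState{
			"stale": {lastSeen: now.Add(-time.Hour)},
			"fresh": {lastSeen: now},
		},
		idleTTL:       30 * time.Minute,
		pruneInterval: time.Minute,
	}
	// The first call scans; the second arrives inside the interval and is skipped.
	fmt.Println(t.maybePruneLocked(now), len(t.entries))
	fmt.Println(t.maybePruneLocked(now.Add(time.Second)), len(t.entries))
}
```

With this gate the request path pays for at most one full scan per interval, at the cost of idle entries lingering up to one interval longer than the TTL.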
    func (t *loginAttemptTracker) pruneLocked(now time.Time) {
        if loginAttemptIdleTTL <= 0 {
            return
        }
        for clientID, state := range t.clients {
            if state.lastSeen.IsZero() || (now.Sub(state.lastSeen) > loginAttemptIdleTTL && !now.Before(state.lockedUntil)) {
                delete(t.clients, clientID)
            }
        }
    }
Similar to the API service, pruneLocked performs an O(N) iteration over the clients map on every request to the login flow. This is inefficient and presents a performance bottleneck. Pruning should ideally be decoupled from the request path or performed less frequently (e.g., probabilistically or via a background task).
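For the background-task option, a rough sketch could look like the following. It assumes the tracker owns a `sync.Mutex` and that the pruning method takes the lock itself; `bgTracker`, `touch`, `pruneIdle`, and `startPruner` are illustrative names that do not come from the PR.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type clientState struct {
	lastSeen    time.Time
	lockedUntil time.Time
}

// bgTracker is a hypothetical stand-in for the UI loginAttemptTracker.
type bgTracker struct {
	mu      sync.Mutex
	clients map[string]*clientState
	idleTTL time.Duration
}

// touch records activity for clientID at the current time.
func (t *bgTracker) touch(clientID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	st, ok := t.clients[clientID]
	if !ok {
		st = &clientState{}
		t.clients[clientID] = st
	}
	st.lastSeen = time.Now()
}

// pruneIdle takes the lock itself, so request handlers never run the scan.
func (t *bgTracker) pruneIdle() {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := time.Now()
	for id, st := range t.clients {
		idle := now.Sub(st.lastSeen) > t.idleTTL
		if st.lastSeen.IsZero() || (idle && !now.Before(st.lockedUntil)) {
			delete(t.clients, id)
		}
	}
}

// startPruner calls prune every interval until the returned stop
// function is called; stop blocks until the goroutine has exited.
func startPruner(interval time.Duration, prune func()) (stop func()) {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				prune()
			}
		}
	}()
	return func() {
		cancel()
		<-done
	}
}

func main() {
	t := &bgTracker{clients: map[string]*clientState{}, idleTTL: 30 * time.Minute}
	stop := startPruner(10*time.Millisecond, t.pruneIdle)
	defer stop()
	t.touch("client-a")
	time.Sleep(25 * time.Millisecond)
	t.mu.Lock()
	fmt.Println(len(t.clients))
	t.mu.Unlock()
}
```

The scan still holds the mutex while it runs, so requests can briefly wait behind it, but the cost is paid once per tick rather than once per login attempt.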
    func (t *apiLoginAttemptTracker) enforceMaxLocked() {
        for len(t.entries) > apiLoginAttemptMaxEntries {
            var oldestKey string
            var oldestSeen time.Time
            for key, state := range t.entries {
                if oldestKey == "" || state.lastSeen.Before(oldestSeen) {
                    oldestKey = key
                    oldestSeen = state.lastSeen
                }
            }
            if oldestKey == "" {
                return
            }
            delete(t.entries, oldestKey)
        }
    }
The enforceMaxLocked method performs a full scan of the map to find the oldest entry, resulting in O(N) complexity. While it is only executed when the map exceeds its limit, an attacker could intentionally trigger this scan repeatedly by rotating keys. For better scalability and resistance to DoS, consider using a data structure that supports O(1) eviction, such as a doubly linked list combined with the map (LRU cache pattern).
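The map-plus-linked-list pattern mentioned here can be sketched with the standard `container/list` package. This is an illustration under assumptions, not a drop-in replacement: `lruTracker`, `lruEntry`, and `recordFailure` are hypothetical names, locking is omitted, and the sketch evicts purely by recency, so it does not by itself address the locked-entry concern raised in the Codex comments on this PR.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruEntry is the value stored in each list node.
type lruEntry struct {
	key      string
	failures int
}

// lruTracker keeps entries ordered by recency so the least recently
// seen key is always at the back of the list.
type lruTracker struct {
	max   int
	order *list.List               // front = most recently seen
	items map[string]*list.Element // key -> node in order
}

func newLRUTracker(max int) *lruTracker {
	return &lruTracker{
		max:   max,
		order: list.New(),
		items: make(map[string]*list.Element),
	}
}

// recordFailure bumps key to the front and, if the cap is exceeded,
// evicts from the back. Every step is O(1); no map scan is needed.
func (t *lruTracker) recordFailure(key string) int {
	if el, ok := t.items[key]; ok {
		t.order.MoveToFront(el)
		e := el.Value.(*lruEntry)
		e.failures++
		return e.failures
	}
	t.items[key] = t.order.PushFront(&lruEntry{key: key, failures: 1})
	for t.order.Len() > t.max {
		oldest := t.order.Back()
		t.order.Remove(oldest)
		delete(t.items, oldest.Value.(*lruEntry).key)
	}
	return 1
}

func main() {
	t := newLRUTracker(2)
	t.recordFailure("1.2.3.4|a@example.com")
	t.recordFailure("1.2.3.4|b@example.com")
	t.recordFailure("1.2.3.4|a@example.com") // refreshes the first key
	t.recordFailure("1.2.3.4|c@example.com") // pushes out the least recent key
	_, stillTracked := t.items["1.2.3.4|b@example.com"]
	fmt.Println(len(t.items), stillTracked)
}
```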
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 566032d34d
    for clientID, state := range t.clients {
        if oldestClientID == "" || state.lastSeen.Before(oldestSeen) {
            oldestClientID = clientID
            oldestSeen = state.lastSeen
        }
Preserve locked clients when enforcing UI tracker cap
The eviction policy removes whichever client has the oldest lastSeen without checking whether that client is currently locked out. Because enforceMaxLocked() is called from allow() and recordFailure(), once more than 4096 distinct client IDs are seen in a 30-minute window, active lockouts can be evicted and the same attacker can continue login attempts immediately. This weakens the lockout control under high-cardinality or intentionally churned client IDs.
    for key, state := range t.entries {
        if oldestKey == "" || state.lastSeen.Before(oldestSeen) {
            oldestKey = key
            oldestSeen = state.lastSeen
        }
Avoid evicting active API login lockouts under map pressure
The API attempt cap also evicts solely by oldest lastSeen, regardless of whether an entry is still locked. After the map reaches 4096 keys, additional failed logins on other keys can evict a currently locked ip|email entry, allowing immediate retries for that principal and reducing the effectiveness of password-bruteforce throttling.
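A possible shape for lockout-aware eviction, covering both trackers, is to skip any entry whose `lockedUntil` is still in the future when choosing a victim. The sketch below is hypothetical: `lockState` and `evictOldestUnlocked` are illustrative names, and what the caller should do when every entry is locked (for example, refuse to track the new key) is left open because the PR does not specify it.

```go
package main

import (
	"fmt"
	"time"
)

// lockState mirrors the fields the reviewed eviction loop reads.
type lockState struct {
	lastSeen    time.Time
	lockedUntil time.Time
}

// evictOldestUnlocked deletes the entry with the oldest lastSeen among
// those not currently locked out. It returns the evicted key, or false
// if every entry is still locked and nothing was removed.
func evictOldestUnlocked(entries map[string]*lockState, now time.Time) (string, bool) {
	var oldestKey string
	var oldestSeen time.Time
	found := false
	for key, st := range entries {
		if st.lockedUntil.After(now) {
			continue // active lockout: never a candidate
		}
		if !found || st.lastSeen.Before(oldestSeen) {
			oldestKey, oldestSeen, found = key, st.lastSeen, true
		}
	}
	if found {
		delete(entries, oldestKey)
	}
	return oldestKey, found
}

func main() {
	now := time.Now()
	entries := map[string]*lockState{
		"locked": {lastSeen: now.Add(-time.Hour), lockedUntil: now.Add(time.Minute)},
		"idle":   {lastSeen: now.Add(-10 * time.Minute)},
	}
	key, ok := evictOldestUnlocked(entries, now)
	fmt.Println(key, ok, len(entries))
}
```

This is still an O(N) scan, so it addresses the lockout-bypass concern only; combining it with a recency-ordered structure would be needed to also answer the complexity feedback.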
Summary
Fixes #146.
Validation
- go test ./... -count=1 in services/mcp-gateway, services/api, services/ui, and services/processor
- go test -race ./... -count=1 in services/mcp-gateway, services/api, services/ui, and services/processor
Security notes