Skip to content

fix(sentinel): close audit drop and login attempt gaps#195

Open
Agent-Hellboy wants to merge 1 commit into
mainfrom
sentinel/issue_146_operational_fixes
Open

fix(sentinel): close audit drop and login attempt gaps#195
Agent-Hellboy wants to merge 1 commit into
mainfrom
sentinel/issue_146_operational_fixes

Conversation

@Agent-Hellboy
Copy link
Copy Markdown
Owner

Summary

Fixes #146.

  • Count and log mcp-gateway analytics events dropped because the bounded queue is full.
  • Bound API login-attempt state with idle eviction and a maximum retained entry count.
  • Bound UI login-attempt state with idle eviction and a maximum retained client count, including the pre-decode allow path.

Validation

  • go test ./... -count=1 in services/mcp-gateway, services/api, services/ui, and services/processor
  • go test -race ./... -count=1 in services/mcp-gateway, services/api, services/ui, and services/processor
  • pre-commit hooks during commit: gitleaks, go fmt, staticcheck, go vet, go unit tests, generated file drift

Security notes

  • No secrets or credentials are logged; the analytics drop log only includes drop count, source, and event type.
  • Login attempt trackers remain per-process throttles but no longer retain attacker-controlled keys indefinitely.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces memory management for login attempt trackers in the API and UI services by implementing idle entry pruning and a maximum entry cap. It also updates the MCP gateway's analytics emission to track dropped events and refine mutex usage. Feedback focuses on the performance impact of the new maintenance routines; specifically, the O(N) complexity of pruning and eviction logic on every request could lead to CPU exhaustion or DoS vulnerabilities. It is recommended to perform these operations periodically or use more efficient data structures like an LRU cache to ensure scalability.

Comment on lines +108 to +117
func (t *apiLoginAttemptTracker) pruneLocked(now time.Time) {
if apiLoginAttemptIdleTTL <= 0 {
return
}
for key, state := range t.entries {
if state.lastSeen.IsZero() || (now.Sub(state.lastSeen) > apiLoginAttemptIdleTTL && !state.lockedUntil.After(now)) {
delete(t.entries, key)
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The pruneLocked method iterates over the entire entries map on every call to allow, recordFailure, and recordSuccess. With a default limit of 4096 entries, this O(N) operation per request introduces significant CPU overhead, especially under high load or during a brute-force attack. This could be exploited as a CPU exhaustion vector.

Consider performing pruning periodically (e.g., using a background goroutine) or only when the map size exceeds a certain threshold to decouple this maintenance from the request path.

Comment thread services/ui/main.go
Comment on lines +790 to +799
func (t *loginAttemptTracker) pruneLocked(now time.Time) {
if loginAttemptIdleTTL <= 0 {
return
}
for clientID, state := range t.clients {
if state.lastSeen.IsZero() || (now.Sub(state.lastSeen) > loginAttemptIdleTTL && !now.Before(state.lockedUntil)) {
delete(t.clients, clientID)
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the API service, pruneLocked performs an O(N) iteration over the clients map on every request to the login flow. This is inefficient and presents a performance bottleneck. Pruning should ideally be decoupled from the request path or performed less frequently (e.g., probabilistically or via a background task).

Comment on lines +119 to +134
func (t *apiLoginAttemptTracker) enforceMaxLocked() {
for len(t.entries) > apiLoginAttemptMaxEntries {
var oldestKey string
var oldestSeen time.Time
for key, state := range t.entries {
if oldestKey == "" || state.lastSeen.Before(oldestSeen) {
oldestKey = key
oldestSeen = state.lastSeen
}
}
if oldestKey == "" {
return
}
delete(t.entries, oldestKey)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The enforceMaxLocked method performs a full scan of the map to find the oldest entry, resulting in O(N) complexity. While it is only executed when the map exceeds its limit, an attacker could intentionally trigger this scan repeatedly by rotating keys. For better scalability and resistance to DoS, consider using a data structure that supports O(1) eviction, such as a doubly linked list combined with the map (LRU cache pattern).

@Agent-Hellboy Agent-Hellboy marked this pull request as ready for review May 18, 2026 07:22
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 566032d34d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread services/ui/main.go
Comment on lines +805 to +809
for clientID, state := range t.clients {
if oldestClientID == "" || state.lastSeen.Before(oldestSeen) {
oldestClientID = clientID
oldestSeen = state.lastSeen
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve locked clients when enforcing UI tracker cap

The eviction policy removes whichever client has the oldest lastSeen without checking whether that client is currently locked out. Because enforceMaxLocked() is called from allow() and recordFailure(), once more than 4096 distinct client IDs are seen in a 30-minute window, active lockouts can be evicted and the same attacker can continue login attempts immediately. This weakens the lockout control under high-cardinality or intentionally churned client IDs.

Useful? React with 👍 / 👎.

Comment on lines +123 to +127
for key, state := range t.entries {
if oldestKey == "" || state.lastSeen.Before(oldestSeen) {
oldestKey = key
oldestSeen = state.lastSeen
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid evicting active API login lockouts under map pressure

The API attempt cap also evicts solely by oldest lastSeen, regardless of whether an entry is still locked. After the map reaches 4096 keys, additional failed logins on other keys can evict a currently locked ip|email entry, allowing immediate retries for that principal and reducing the effectiveness of password-bruteforce throttling.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Review follow-up: shutdown data loss, silent audit drops, and unbounded login-attempt state

1 participant