feat(R4): make LLM rate limit configurable via LLM_MAX_CALLS_PER_MIN#88
feat(R4): make LLM rate limit configurable via LLM_MAX_CALLS_PER_MIN#88siva23367 wants to merge 1 commit into
Conversation
- Add LLM_MAX_CALLS_PER_MIN field to Settings (default 60, ge=1) - Replace hardcoded >= 3 check in _wait_for_rate_limit with configurable value - Read setting from settings singleton in ReactInvestigationLoop.__init__ - Document recommended values per provider in README - Improve PUT /config error handling (400 vs 500 separation) Closes VarunGitGood#13
|
@siva23367 is attempting to deploy a commit to the varun7singh's projects Team on Vercel. A member of the Team first needs to authorize it. |
VarunGitGood
left a comment
There was a problem hiding this comment.
Review
The goal is sound but there is one fatal wiring bug that makes the feature non-functional, plus a few unrelated changes that should not be in this PR.
[BLOCKING] container.py is never updated — the setting is never actually read
container.py:227 is the only place ReactInvestigationLoop is instantiated, and this PR does not touch it:
return ReactInvestigationLoop(
llm=llm,
tools=tools,
...
enable_reflection=settings.ENABLE_REFLECTION,
reflection_interval=settings.REFLECTION_INTERVAL,
# llm_max_calls_per_min missing — always None → hardcoded fallback of 60
)Result: _wait_for_rate_limit always uses 60 regardless of what is in config.json. Fix: add llm_max_calls_per_min=settings.LLM_MAX_CALLS_PER_MIN, to that constructor call.
Once that wire-up exists, the < 1 guard inside __init__ is also dead code — pydantic already enforces ge=1 before the value can reach the constructor via settings.
On the positive side: the 400/500 split in api/config.py is a genuine improvement and the field definition + README docs are well written.
| redis: | ||
| image: redis:7-alpine | ||
| container_name: repi-redis | ||
| ports: |
There was a problem hiding this comment.
Design drift — revert this. This directly contradicts the comment three lines below it (which this PR did not remove):
# No host port published — only the app container talks to redis via
# compose-internal DNS. Avoids clashing with a redis already running on
# the host (common on dev boxes).
Publishing 6379 to the host is exactly the problem that comment was guarding against. Unrelated to rate limiting — please drop this hunk.
| from __future__ import annotations | ||
| import os | ||
| import json | ||
| from typing import Any, List, Optional |
There was a problem hiding this comment.
Duplicate import — List and Optional are already imported on line 5 (the original from typing import List, Optional). Any is also unused anywhere in this file. Drop this line and keep the original.
|
|
||
| WATCHER_CONFIG_REFRESH_SECS: int = 30 | ||
|
|
||
| # R4: configurable LLM calls-per-minute cap for the ReAct investigation loop. |
There was a problem hiding this comment.
Four-line comment block reads like a changelog entry, not a code comment. Per project style: only comment when the WHY is non-obvious from the code itself. ge=1 already enforces the constraint; the provider recommendations belong in the README (where they already live). Remove this block — the field definition is self-explanatory.
| enable_reflection: bool = True, | ||
| reflection_interval: int = 3, | ||
| max_reflections: int = 2, | ||
| llm_max_calls_per_min: Optional[int] = None, # Added for dynamic rate limiting |
There was a problem hiding this comment.
Remove the trailing # Added for dynamic rate limiting — it describes what the parameter is (already obvious from its name) not why it exists. Project style: no "what" comments.
| self.reflection_interval = reflection_interval | ||
| self.max_reflections = max_reflections | ||
|
|
||
| # Setup dynamic rate limit with a reasonable fallback (e.g., 60 if not specified) |
There was a problem hiding this comment.
Remove this comment — it describes what the code below does, which the code already makes clear.
Also: this block defaults to 60 on None, but that is a hardcoded constant, not settings.LLM_MAX_CALLS_PER_MIN. Until container.py is updated to pass the setting at construction time, this fallback silently overrides the user's config with no warning.
| now = time.time() | ||
| self._llm_call_timestamps = [t for t in self._llm_call_timestamps if now - t < 60] | ||
| while len(self._llm_call_timestamps) >= 3: | ||
| # Dynamically checking against the configured limit instead of hardcoded 3 |
There was a problem hiding this comment.
Remove this comment — describes what the next line does. Code is self-explanatory.
| "click==8.1.7", | ||
| "typer>=0.12.0,<0.13", | ||
| "watchfiles>=0.21.0,<0.22", | ||
| "httpx>=0.28.1", |
There was a problem hiding this comment.
httpx is already a transitive dependency (used directly in repi/llm/adapters.py), so promoting it to an explicit dep is reasonable — but it has nothing to do with this PR. Please move to a separate commit.
| { | ||
| "name": "web", | ||
| "version": "0.1.0", | ||
| "version": "1.0.0", |
There was a problem hiding this comment.
Unrelated version bump (0.1.0 → 1.0.0). Nothing in this PR changes the web package. Please revert — stray lockfile version bumps make it hard to reconstruct the web package changelog later.
| | `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama endpoint | | ||
| | `LLM_MAX_CALLS_PER_MIN` | `60` | Max LLM calls per rolling 60-second window in the ReAct loop. Lower for free-tier providers; raise for paid/high-tier accounts. | | ||
|
|
||
|
|
There was a problem hiding this comment.
Don't require this patch, the above added variable is good enough
|
@siva23367 Hey thanks for the PR, can you please check and resolve the comments and feel free to ask for clarifications |
Summary
Replaces the hardcoded
>= 3LLM-calls-per-minute cap in_wait_for_rate_limitwith a configurableLLM_MAX_CALLS_PER_MINsetting.Changes
repi/core/config.py— AddedLLM_MAX_CALLS_PER_MINfield (default 60, ge=1)repi/investigation/react_loop.py—__init__reads from settings,_wait_for_rate_limituses configurable valuerepi/api/config.py— Improved error handling (400 vs 500 separation)README.md— Added rate limiting documentation with provider recommendationsTesting
All 254 tests pass.
Closes #13