Skip to content

feat(R4): make LLM rate limit configurable via LLM_MAX_CALLS_PER_MIN#88

Open
siva23367 wants to merge 1 commit into
VarunGitGood:mainfrom
siva23367:feature/configurable-rate-limit
Open

feat(R4): make LLM rate limit configurable via LLM_MAX_CALLS_PER_MIN#88
siva23367 wants to merge 1 commit into
VarunGitGood:mainfrom
siva23367:feature/configurable-rate-limit

Conversation

@siva23367

Copy link
Copy Markdown

Summary

Replaces the hardcoded >= 3 LLM-calls-per-minute cap in _wait_for_rate_limit with a configurable LLM_MAX_CALLS_PER_MIN setting.

Changes

  • repi/core/config.py — Added LLM_MAX_CALLS_PER_MIN field (default 60, ge=1)
  • repi/investigation/react_loop.py__init__ reads from settings, _wait_for_rate_limit uses configurable value
  • repi/api/config.py — Improved error handling (400 vs 500 separation)
  • README.md — Added rate limiting documentation with provider recommendations

Testing

All 254 tests pass.

Closes #13

- Add LLM_MAX_CALLS_PER_MIN field to Settings (default 60, ge=1)
- Replace hardcoded >= 3 check in _wait_for_rate_limit with configurable value
- Read setting from settings singleton in ReactInvestigationLoop.__init__
- Document recommended values per provider in README
- Improve PUT /config error handling (400 vs 500 separation)

Closes VarunGitGood#13
@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

@siva23367 is attempting to deploy a commit to the varun7singh's projects Team on Vercel.

A member of the Team first needs to authorize it.

@VarunGitGood VarunGitGood left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The goal is sound but there is one fatal wiring bug that makes the feature non-functional, plus a few unrelated changes that should not be in this PR.

[BLOCKING] container.py is never updated — the setting is never actually read

container.py:227 is the only place ReactInvestigationLoop is instantiated, and this PR does not touch it:

return ReactInvestigationLoop(
    llm=llm,
    tools=tools,
    ...
    enable_reflection=settings.ENABLE_REFLECTION,
    reflection_interval=settings.REFLECTION_INTERVAL,
    # llm_max_calls_per_min missing — always None → hardcoded fallback of 60
)

Result: _wait_for_rate_limit always uses 60 regardless of what is in config.json. Fix: add llm_max_calls_per_min=settings.LLM_MAX_CALLS_PER_MIN, to that constructor call.

Once that wire-up exists, the < 1 guard inside __init__ is also dead code — pydantic already enforces ge=1 before the value can reach the constructor via settings.


On the positive side: the 400/500 split in api/config.py is a genuine improvement and the field definition + README docs are well written.

Comment thread docker-compose.yml
redis:
image: redis:7-alpine
container_name: repi-redis
ports:

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design drift — revert this. This directly contradicts the comment three lines below it (which this PR did not remove):

# No host port published — only the app container talks to redis via
# compose-internal DNS. Avoids clashing with a redis already running on
# the host (common on dev boxes).

Publishing 6379 to the host is exactly the problem that comment was guarding against. Unrelated to rate limiting — please drop this hunk.

Comment thread repi/core/config.py
from __future__ import annotations
import os
import json
from typing import Any, List, Optional

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate import — List and Optional are already imported on line 5 (the original from typing import List, Optional). Any is also unused anywhere in this file. Drop this line and keep the original.

Comment thread repi/core/config.py

WATCHER_CONFIG_REFRESH_SECS: int = 30

# R4: configurable LLM calls-per-minute cap for the ReAct investigation loop.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Four-line comment block reads like a changelog entry, not a code comment. Per project style: only comment when the WHY is non-obvious from the code itself. ge=1 already enforces the constraint; the provider recommendations belong in the README (where they already live). Remove this block — the field definition is self-explanatory.

enable_reflection: bool = True,
reflection_interval: int = 3,
max_reflections: int = 2,
llm_max_calls_per_min: Optional[int] = None, # Added for dynamic rate limiting

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the trailing # Added for dynamic rate limiting — it describes what the parameter is (already obvious from its name) not why it exists. Project style: no "what" comments.

self.reflection_interval = reflection_interval
self.max_reflections = max_reflections

# Setup dynamic rate limit with a reasonable fallback (e.g., 60 if not specified)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment — it describes what the code below does, which the code already makes clear.

Also: this block defaults to 60 on None, but that is a hardcoded constant, not settings.LLM_MAX_CALLS_PER_MIN. Until container.py is updated to pass the setting at construction time, this fallback silently overrides the user's config with no warning.

now = time.time()
self._llm_call_timestamps = [t for t in self._llm_call_timestamps if now - t < 60]
while len(self._llm_call_timestamps) >= 3:
# Dynamically checking against the configured limit instead of hardcoded 3

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment — describes what the next line does. Code is self-explanatory.

Comment thread pyproject.toml
"click==8.1.7",
"typer>=0.12.0,<0.13",
"watchfiles>=0.21.0,<0.22",
"httpx>=0.28.1",

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

httpx is already a transitive dependency (used directly in repi/llm/adapters.py), so promoting it to an explicit dep is reasonable — but it has nothing to do with this PR. Please move to a separate commit.

Comment thread web/package-lock.json
{
"name": "web",
"version": "0.1.0",
"version": "1.0.0",

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated version bump (0.1.0 → 1.0.0). Nothing in this PR changes the web package. Please revert — stray lockfile version bumps make it hard to reconstruct the web package changelog later.

Comment thread README.md
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama endpoint |
| `LLM_MAX_CALLS_PER_MIN` | `60` | Max LLM calls per rolling 60-second window in the ReAct loop. Lower for free-tier providers; raise for paid/high-tier accounts. |


Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't require this patch, the above added variable is good enough

@VarunGitGood

Copy link
Copy Markdown
Owner

@siva23367 Hey thanks for the PR, can you please check and resolve the comments and feel free to ask for clarifications

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

R4: Make LLM rate limit configurable

2 participants