feat: add configurable max tokens support #65
Running the extension against newer (thinking) models, like Gemma4, produces truncated responses with the stop reason "length". This is because thinking models generate "thinking" tokens, which count toward the same token budget as the actual response. To work around this, the max_tokens parameter can now be configured via the settings dialog.
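When the limit is hit, the backend signals it through the stop reason rather than an error. A minimal sketch of detecting that case, assuming a hypothetical `TruncationCheck.HitTokenLimit` helper that is not part of this PR: OpenAI-compatible servers report `finish_reason` `"length"`, while Anthropic's Messages API reports `stop_reason` `"max_tokens"`.

```csharp
// Hypothetical helper (not from this PR): decides whether a completion
// was cut off because it ran out of max_tokens.
public static class TruncationCheck
{
    // "length"     -> finish_reason used by OpenAI-compatible servers
    // "max_tokens" -> stop_reason used by Anthropic's Messages API
    public static bool HitTokenLimit(string? stopReason) =>
        stopReason is "length" or "max_tokens";
}
```

A caller could use this to suggest raising the max_tokens setting instead of silently showing a cut-off prompt.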
Walkthrough
This pull request adds a user-configurable maximum token limit setting throughout the application. Previously, token limits were hardcoded in the backend request builders (1000 for OpenAI, 1024 for Anthropic). The changes introduce a centralized …
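The shift from per-backend literals to one configurable value can be sketched as follows. This is an illustration under assumptions, not the PR's code: `ChatRequestSketch`, its `Build` method, and the default of 1024 are hypothetical. It relies only on the fact that OpenAI-compatible chat completion endpoints and Anthropic's Messages API both accept a top-level `max_tokens` field.

```csharp
using System.Text.Json;

// Hypothetical sketch: a single shared default replaces the hardcoded
// per-backend limits (1000 for OpenAI, 1024 for Anthropic).
public static class ChatRequestSketch
{
    // Assumed value; the real default lives wherever the PR centralizes it.
    public const int DefaultMaxTokens = 1024;

    // Builds a minimal chat request body, using the user's configured
    // limit when present and the shared default otherwise.
    public static string Build(string model, string prompt, int? configuredMaxTokens)
    {
        int maxTokens = configuredMaxTokens ?? DefaultMaxTokens;
        var body = new
        {
            model,
            max_tokens = maxTokens,
            messages = new[] { new { role = "user", content = prompt } }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

With a thinking model, the configured value has to cover both the thinking tokens and the visible answer, so it typically needs to be higher than the old hardcoded limits.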
Actionable comments posted: 1
Inline comments:
In `WebAPI/LLMAPICalls.cs`:
- Around lines 547-548: The direct call to `settings["max_tokens"]?.Value<int?>()` can throw on malformed or oversized persisted values. Replace it with guarded parsing: read the token value as a string or JSON token, convert it safely with `int.TryParse` (or `long.TryParse`, then clamp), validate the range (reject or limit values that are out of bounds or overflow), and fall back to `BackendSchema.DefaultMaxTokens` on failure. Then pass the safe `maxTokens` into `GetSchemaType` at the same call site (`messageContent, modelId, messageType, seed, maxTokens`).
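The guarded parsing the comment asks for could look like this sketch. It is a hypothetical helper, not code from the PR: `MaxTokensParser`, the default of 1024 (standing in for `BackendSchema.DefaultMaxTokens`), and the upper bound of 131072 are assumptions. It takes the raw value as a string so it does not depend on how the settings object is deserialized, and it uses `long.TryParse` so overflow is reported as a failed parse instead of an exception.

```csharp
using System;

// Hypothetical helper sketching the guarded parsing suggested in the review.
public static class MaxTokensParser
{
    // Assumed stand-in for BackendSchema.DefaultMaxTokens.
    public const int DefaultMaxTokens = 1024;

    // Assumed upper bound; choose one that matches the targeted backends.
    public const int UpperBound = 131072;

    // Returns a safe max_tokens value from a persisted setting:
    // missing, non-numeric, overflowing, or non-positive input falls back
    // to the default; values above UpperBound are clamped down to it.
    public static int Parse(string? raw)
    {
        if (!long.TryParse(raw, out long parsed) || parsed < 1)
            return DefaultMaxTokens;
        return (int)Math.Min(parsed, UpperBound);
    }
}
```

The result would then be passed as `maxTokens` to `GetSchemaType`. Clamping keeps an over-large saved value usable; rejecting it and falling back to the default would be the stricter alternative.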
Files selected for processing (5):
- Assets/settings.js
- BackendSchema.cs
- Tabs/Text2Image/MagicPrompt.html
- WebAPI/LLMAPICalls.cs
- WebAPI/SessionSettings.cs