Thinking-level support across all inference kinds (LLM tab) #21
Merged
Add a 'thinking level' (low/medium/high) selector to the LLM tab, shown only when the selected model supports reasoning. The value maps to output_config.effort + adaptive thinking for Anthropic, and reasoning_effort for OpenAI-compatible providers. Off by default; never sent for models that don't support thinking. Closes #9
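The provider mapping described in this commit can be sketched roughly as follows. This is an illustrative standalone function, not the project's code: the name `reasoning_kwargs` and the `"anthropic"` provider string are assumptions, while the keyword names and values are the ones quoted in this PR.

```python
from typing import Any


def reasoning_kwargs(provider: str, level: str) -> dict[str, Any]:
    """Return extra request kwargs for a thinking level (hypothetical helper)."""
    # Off by default: an empty level adds nothing to the request.
    if not level:
        return {}
    # Assumption: the provider is identified by the string "anthropic".
    if provider == "anthropic":
        return {
            "thinking": {"type": "adaptive"},
            "output_config": {"effort": level},
        }
    # Every other provider is treated as OpenAI-compatible in this sketch.
    return {"reasoning_effort": level}
```

Returning an empty dict when the level is unset means the kwargs can be splatted into the request unconditionally, which is one way to keep requests unchanged for configs that never set a level.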
Add independent thinking-level config for memory extraction/consolidation, goal decomposition, task reflection, and compaction. generate_text() now honors the client's level via a shared _reasoning_kwargs() helper, and _background_llm() clones the main client (sharing the SDK connection) to override only the level — so background tasks get their own effort setting.
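A minimal sketch of the clone-and-override idea, assuming the client is a plain object that holds a reference to the SDK connection. The `LLMClient` fields and the `background_llm` function below are hypothetical stand-ins for illustration, not the project's actual definitions.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class LLMClient:
    """Hypothetical stand-in for the real client class."""
    sdk: Any                  # underlying SDK client / connection
    model: str
    thinking_level: str = ""  # "" means off


def background_llm(main: LLMClient, level: str) -> LLMClient:
    """Clone the main client, overriding only the thinking level."""
    clone = copy.copy(main)   # shallow copy: the sdk object is shared, not duplicated
    clone.thinking_level = level
    return clone
```

Because the copy is shallow, the background client reuses the same SDK object, and changing its level leaves the main client's level untouched.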
Add POST /setup/thinking-levels (Anthropic capability lookup via Models API) and pass per-kind thinking levels to the LLM tab template.
Add levelOptions/fetchThinkingLevels (Anthropic autodiscovery), providerDocsUrl (info links for other providers), and sameAsMain (shared-config badge).
Reusable Jinja macro renders a thinking-level field next to every inference kind's model (main, extraction, consolidation, goal decomposition, task reflection, compaction). Each is a type-or-pick datalist; Anthropic gets a 'Fetch levels' button (capability autodiscovery), other providers get an 'effort docs' link to enter the value manually. Background kinds show a 'same config as Main inference' badge when they target the main provider+model. Each level is saved to its own config key, cleared when the model can't think.
The modelSupportsThinking() substring heuristic hid every control for real model ids (deepseek-v4-flash, claude-haiku-4-5, …). Drop the visibility gate so the field shows for all providers/models — matching the 'enter effort manually for other providers' requirement — and save the typed value as-is. The heuristic is demoted to a non-blocking amber hint shown only when a level is set on an unrecognized model.
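The demoted behaviour can be sketched as a pure predicate. The real check appears to live in the frontend, so this Python rendering is illustrative only, and the marker list is a made-up placeholder rather than the heuristic's actual substrings.

```python
# Hypothetical substrings; the real heuristic's list is not shown in this PR.
THINKING_MARKERS = ("thinking", "reasoner")


def model_supports_thinking(model_id: str) -> bool:
    """Substring heuristic: True if the id contains a known marker."""
    lowered = model_id.lower()
    return any(marker in lowered for marker in THINKING_MARKERS)


def show_amber_hint(model_id: str, level: str) -> bool:
    """Hint only when a level is set on a model the heuristic does not recognize."""
    return bool(level.strip()) and not model_supports_thinking(model_id)
```

The field itself is always rendered; the predicate only decides whether the non-blocking hint appears next to it, and the typed value is saved either way.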
Closes #9.
Adds a thinking / reasoning-effort control to the LLM tab — for the main inference model and every background inference kind.
Inference kinds covered
Main inference, memory extraction, memory consolidation, goal decomposition, task reflection, and history compaction — each with its own independent thinking-level setting.
UI
Anthropic: the 'Fetch levels' button calls POST /setup/thinking-levels, which reads the model's capabilities via the Anthropic Models API (capabilities.effort.*) and populates the real supported values (low/medium/high/xhigh/max).
An amber hint is shown when a level is set on an unrecognized model. (modelSupportsThinking() is only this hint now; it does not hide the control.)
Backend
AgentConfig.thinking_level + per-kind fields on MemoryConfig / GoalDecompositionConfig / TaskReflectionConfig / CompactionConfig.
LLMClient._reasoning_kwargs() maps the level per provider: Anthropic → thinking={"type":"adaptive"} + output_config={"effort": level}; OpenAI-compatible → reasoning_effort=level. Applied to both generate() (main loop) and generate_text() (background tasks). Nothing is sent when off (""), so untouched configs are byte-identical to before.
_background_llm() clones the main client (sharing the SDK connection) to override only the level, so each background task gets its own effort independent of main inference.
POST /setup/thinking-levels for Anthropic capability autodiscovery.
Tests
tests/test_llm.py covers per-provider kwargs and effort emission on both generate() and generate_text() (off by default).
Stubs in test_compaction.py / test_scheduler.py updated for the new signature.
Note on issue scope
#9 asked to show the control "only for compatible models". A substring heuristic for that hid every control for real non-Anthropic ids (e.g. deepseek-v4-flash) and conflicted with the follow-up requirement to enter effort manually for other providers, so the final design shows the field always and guides correctness via autodiscovery (Anthropic), docs links (others), and the amber hint, rather than hiding it.