Integration hot loading #185
Draft
brandomr wants to merge 6 commits into
Experiment Overview
This PR is EXPERIMENTAL. It attempts to address a challenge we have with adhoc-api: the agent that actually runs the code remains fairly "dumb" about the integration after it calls the draft/consult integration tool(s). In long ReAct loops, the Beaker agent can struggle with the integration yet never re-call those tools.
Instead, I propose a new experiment: give the agent tools to load and unload integration docs directly into its context window via auto_context. The upside is that the agent seems to perform better with the integration overall. The downside is that the ReAct loop consumes more tokens and therefore costs more. The agent also has to remember to unload the integration; otherwise the docs persist in the system prompt.
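To make the load/unload mechanism concrete, here is a minimal sketch of how two tools and an auto_context hook could share state. It is not the code in this PR: the IntegrationContext class, the load_integration_docs name, the string return values, and the assumption that auto_context's output is appended to the system prompt on every LLM call are all hypothetical. Only unload_integration_docs and auto_context are names taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class IntegrationContext:
    """Hypothetical holder for integration docs the agent has loaded."""

    # All integration docs the agent could load, keyed by integration name.
    available: dict[str, str]
    # Names of integrations whose docs are currently in the context window.
    loaded: list[str] = field(default_factory=list)

    def load_integration_docs(self, name: str) -> str:
        """Tool (hypothetical name): put an integration's docs into context."""
        if name not in self.available:
            return f"Unknown integration: {name}"
        if name not in self.loaded:
            self.loaded.append(name)
        return f"Loaded docs for {name}"

    def unload_integration_docs(self, name: str) -> str:
        """Tool: remove an integration's docs so they stop costing tokens."""
        if name in self.loaded:
            self.loaded.remove(name)
            return f"Unloaded docs for {name}"
        return f"{name} was not loaded"

    def auto_context(self) -> str:
        """Assumed to be re-rendered on every LLM call and added to the system prompt."""
        sections = [f"## Integration: {n}\n{self.available[n]}" for n in self.loaded]
        return "\n\n".join(sections)


ctx = IntegrationContext(available={"example_api": "Example API docs..."})
ctx.load_integration_docs("example_api")
print(ctx.auto_context())  # docs now ride along on every ReAct step
ctx.unload_integration_docs("example_api")
print(repr(ctx.auto_context()))  # '' once unloaded
```

Because the docs live in shared state rather than in a single tool response, the agent sees them on every step until they are unloaded. That is why the token cost grows and why forgetting to unload leaves them in the system prompt.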
Additionally, if this proves viable, we can remove adhoc-api as a dependency altogether.
Issues and Challenges
I tried to automatically unload the integration at the end of the ReAct loop, but in certain contexts, such as Biome, the agent always presents a plan before doing any work. With Claude Sonnet 4.6, it doesn't call final_answer or ask_user; it just ends the ReAct loop. This makes it very tricky to determine how and when to prune the context automatically. The workaround is the unload_integration_docs tool, but I have concerns about whether the agent will reliably call it when appropriate, although in my tests with Sonnet 4.6 it did so flawlessly. In the worst case, the agent doesn't unload the integration and we reach summarization sooner.
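To gauge how much sooner, the back-of-the-envelope sketch below counts ReAct turns before the context crosses a summarization threshold, with and without docs left loaded. It rests on simplifying assumptions: each turn adds a fixed number of history tokens, and leftover docs occupy the system prompt for every turn. The function name and all token counts are hypothetical illustrations, not measurements from these tests.

```python
def turns_until_summarization(
    base_prompt_tokens: int,
    tokens_per_turn: int,
    lingering_doc_tokens: int,
    threshold_tokens: int,
) -> int:
    """Count ReAct turns that fit before the context exceeds the threshold.

    Hypothetical model: every turn appends `tokens_per_turn` of history,
    and any integration docs left loaded stay in the system prompt.
    """
    context = base_prompt_tokens + lingering_doc_tokens
    turns = 0
    # Add turns while the next one still fits under the threshold.
    while context + tokens_per_turn <= threshold_tokens:
        context += tokens_per_turn
        turns += 1
    return turns


# Illustrative numbers only: 5k base prompt, 2k tokens/turn, 100k threshold.
print(turns_until_summarization(5_000, 2_000, 0, 100_000))       # 47
print(turns_until_summarization(5_000, 2_000, 20_000, 100_000))  # 37
```

Under these made-up numbers, a 20k-token doc that is never unloaded costs about ten turns of headroom and is also resent as input tokens on every call, which matches the cost concern above.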
Testing
I tested this with the integration-loader branch of Biome (https://github.com/jataware/biome/tree/integration-loader), where the draft/consult integration docs tools are disabled and the agent uses only the load/unload tooling.