Current state
The action-loop system prompt at `packages/core/src/prompts.ts:312-426` is a single template applied to every provider and model. The template is dense — roughly 3000-4000 tokens after Liquid render — and includes:
- The full `youArePrompt` persona block
- All tool examples (dynamic by capability)
- Core Rules (7 numbered rules + the "EXACTLY ONE tool" block)
- Best Practices (~15 bullets covering autocomplete handling, date pickers, modal dismissal, PDF handling, search vs. browser-search guidance, etc.)
- Conditional guardrails block
- Conditional interactive-mode block (a substantial section with mandatory rules)
- A `When using done()` block (5 bullets on formatting)
- The `toolCallInstruction` tail (repeats "exactly one tool")
This applies whether the active model is `claude-opus-4-7`, `gpt-4.1-mini`, `gemini-2.5-flash`, or `llama3.2`.
The gap
Fast/cheap models (Claude Haiku, GPT-4.1-mini, Gemini Flash, local models, etc.) behave differently from frontier models:
- They follow JSON schemas reliably but can drift with long lists of prescriptive "best practices."
- At small model sizes, the ~15-bullet Best Practices section is closer to harmful than helpful — the model spends its attention budget pattern-matching against the wrong best practice rather than actually planning.
- A 4000-token system prompt is a meaningful fraction of a small model's effective context.
- Many of the rules are about edge cases (date pickers, autocomplete dropdowns, PDF handling) that only apply on certain pages — surfacing them as conditional or page-class-specific would be more efficient.
Conversely, frontier models (Claude Opus, GPT o-series, Gemini Pro) benefit from being told what to think about — the structured reasoning hints (data grounding, pre-done verification — see the dedicated prompt-checklist issue) are exactly the scaffolding they're designed to use well.
Proposed scope
A. Add a "flash" system prompt variant
In `prompts.ts`, add `buildActionLoopSystemPromptFlash(...)` that:
- Keeps the `youArePrompt` persona (5 lines, anchor for the role)
- Keeps tool examples (dynamic by capability) — these are essential
- Keeps Core Rules (numbered, short)
- Drops the long Best Practices bullets, replacing them with a single line: "Adapt your approach based on what's actually available; if an action fails, try a different element or use search/find tools to inventory the page."
- Keeps guardrails block conditional (when present)
- Keeps interactive-mode block conditional (when present) — non-negotiable
- Compresses `When using done()` to one line: "Format the result as VALID Markdown matching what the user asked for."
- Keeps the `toolCallInstruction` tail
Expected output: ~1000-1500 tokens.
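A minimal sketch of the flash builder, assuming the existing template fragments are available as composable strings inside `prompts.ts`. Beyond `youArePrompt` and `toolCallInstruction`, which the issue already cites, every helper name here (`coreRules`, `buildToolExamples`, `guardrailsBlock`, `interactiveModeBlock`) is a hypothetical stand-in for the corresponding existing fragment:

```ts
// Sketch only. youArePrompt and toolCallInstruction exist in prompts.ts per the
// issue; coreRules, buildToolExamples, guardrailsBlock, and interactiveModeBlock
// are hypothetical names for the existing template pieces.
export function buildActionLoopSystemPromptFlash(
  hasGuardrails: boolean,
  hasWebSearch: boolean,
  hasTabstack: boolean,
  hasStartingUrl: boolean,
  hasInteractive: boolean,
): string {
  const parts: string[] = [
    youArePrompt, // 5-line persona anchor, kept verbatim
    buildToolExamples(hasWebSearch, hasTabstack, hasStartingUrl), // dynamic by capability
    coreRules, // numbered rules + the "EXACTLY ONE tool" block
    // One line replacing the ~15 Best Practices bullets:
    "Adapt your approach based on what's actually available; if an action fails, try a different element or use search/find tools to inventory the page.",
    // One line replacing the five done() formatting bullets:
    "Format the result as VALID Markdown matching what the user asked for.",
  ];
  if (hasGuardrails) parts.push(guardrailsBlock); // conditional, kept
  if (hasInteractive) parts.push(interactiveModeBlock); // conditional, kept: privacy/security
  parts.push(toolCallInstruction); // tail, repeats "exactly one tool"
  return parts.join("\n\n");
}
```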
B. Add a model-class detector
```ts
// In provider.ts:
export type ModelClass = "frontier" | "balanced" | "flash";

export function classifyModel(providerConfig: ProviderConfig): ModelClass {
  const modelId = (providerConfig.model?.modelId ?? "").toLowerCase();
  const flashPatterns = [
    /gpt-4\.1-mini/, /gpt-4o-mini/, /flash/, /haiku/, /local-model/,
    /llama3\.2(?!.*70b)/, /gemini-1\.5-flash/,
  ];
  const frontierPatterns = [
    /opus/, /gpt-4(\.\d)?$/, /o1\b/, /o3\b/, /gemini-2\.5-pro/, /sonnet/,
  ];
  if (flashPatterns.some((p) => p.test(modelId))) return "flash";
  if (frontierPatterns.some((p) => p.test(modelId))) return "frontier";
  return "balanced";
}
```
The classifier is heuristic — known to misidentify edge cases (custom OpenAI-compatible endpoints, novel model names). Provide a `WebAgentOptions.promptVariant?: "auto" | "frontier" | "balanced" | "flash"` override so callers can pin the variant when auto-detection is wrong.
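The option surface might look like the fragment below (illustrative; the real `WebAgentOptions` in `packages/core/src/types/` carries many more fields):

```ts
// In packages/core/src/types/ (fragment; existing fields elided):
export type PromptVariant = "auto" | "frontier" | "balanced" | "flash";

export interface WebAgentOptions {
  // ...existing options...
  /**
   * Pins the system-prompt variant. "auto" (and the unset case) defers to
   * classifyModel(); set an explicit value when auto-detection is wrong,
   * e.g. for local models behind an openai-compatible endpoint.
   */
  promptVariant?: PromptVariant;
}
```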
C. Wire the variant selection
In `initializeSystemPromptAndTask` (`webAgent.ts:1641-1672`):
```ts
const variant =
  this.options.promptVariant === "auto" || !this.options.promptVariant
    ? classifyModel(this.providerConfig)
    : this.options.promptVariant;

const systemPrompt =
  variant === "flash"
    ? buildActionLoopSystemPromptFlash(hasGuardrails, hasWebSearch, hasTabstack, hasStartingUrl, hasInteractive)
    : buildActionLoopSystemPrompt(hasGuardrails, hasWebSearch, hasTabstack, hasStartingUrl, hasInteractive);
```
(For now, only `flash` and the default differ. A future "frontier" variant could add Browser Use's `<reasoning_rules>`-style structured-thinking scaffolding.)
D. Surface the chosen variant in telemetry
Add `promptVariant` to the `TASK_SETUP` event payload so logs and the eval judge can correlate variant choice with outcomes.
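A sketch of the payload change, assuming `TASK_SETUP` carries a plain object payload. The emit helper name and the surrounding fields are illustrative, not the repo's actual API:

```ts
// Wherever TASK_SETUP is emitted during initializeSystemPromptAndTask
// (emit helper name is illustrative):
this.emitEvent("TASK_SETUP", {
  // ...existing TASK_SETUP fields...
  promptVariant: variant, // resolved value: "frontier" | "balanced" | "flash"
});
```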
Implementation notes
- The model-class detector is necessarily heuristic. Erring on the side of `balanced` (the default prompt) is safest — the flash variant should only fire when we're confident.
- Test extensively against custom OpenAI-compatible endpoints (`provider: openai-compatible`): users running local models through them may set arbitrary `modelId` values. The override config option handles this case.
- The flash variant should still include the interactive-mode block when applicable — that's a privacy/security concern, not a "best practice."
- Snapshot tests on the existing default prompt should not change: frontier and balanced classifications both render the existing default template, so output shifts only when a model classifies as flash (see the snapshot sketch after these notes).
- Verify on a small benchmark: does the flash variant on a flash model match or beat the default prompt's task-completion rate? If it underperforms, the variant needs tuning, not just stripping.
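For the snapshot point above, a minimal guard might look like this (a sketch: the argument order follows the wiring code in section C, and vitest is assumed as the runner):

```ts
import { expect, it } from "vitest"; // or the repo's existing runner
import { buildActionLoopSystemPrompt } from "../src/prompts";

it("default prompt rendering is unchanged by the variant work", () => {
  // args: hasGuardrails, hasWebSearch, hasTabstack, hasStartingUrl, hasInteractive
  const rendered = buildActionLoopSystemPrompt(false, true, true, true, false);
  expect(rendered).toMatchSnapshot(); // pre-existing snapshot must still pass
});
```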
Acceptance criteria
- `buildActionLoopSystemPromptFlash` exists in `prompts.ts` and produces a noticeably shorter rendered output.
- `classifyModel` correctly identifies common flash models (Haiku, gpt-4.1-mini, Gemini Flash) in tests; see the test sketch after this list.
- `WebAgentOptions.promptVariant` override works (`"flash"`, `"balanced"`, `"frontier"`, `"auto"`).
- `TASK_SETUP` event includes the chosen variant.
- Benchmark on at least one small eval set: flash-on-flash-model matches or beats default-on-flash-model.
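A sketch of the classifier tests (the `ProviderConfig` literal is simplified; real configs carry more fields):

```ts
import { describe, expect, it } from "vitest"; // or the repo's existing runner
import { classifyModel } from "../src/provider";

// Minimal config factory; the real ProviderConfig type has more fields.
const cfg = (modelId: string) => ({ model: { modelId } }) as any;

describe("classifyModel", () => {
  it("identifies common flash models", () => {
    expect(classifyModel(cfg("claude-3-5-haiku"))).toBe("flash");
    expect(classifyModel(cfg("gpt-4.1-mini"))).toBe("flash");
    expect(classifyModel(cfg("gemini-2.5-flash"))).toBe("flash");
  });

  it("identifies frontier models", () => {
    expect(classifyModel(cfg("claude-opus-4"))).toBe("frontier");
    expect(classifyModel(cfg("gemini-2.5-pro"))).toBe("frontier");
  });

  it("falls back to balanced for unknown model ids", () => {
    expect(classifyModel(cfg("my-custom-model"))).toBe("balanced");
  });
});
```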
Effort estimate
1-2 days. The prompt-stripping work is fast; benchmarking takes a separate eval run.
Related issues
Pairs naturally with the pre-done verification checklist issue — the checklist should live in the default prompt variant; the flash variant gets a condensed form. Pairs with the prompt-caching issue — flash variant has a shorter cacheable prefix, but caching still helps.
Files likely affected
- `packages/core/src/prompts.ts` (new `buildActionLoopSystemPromptFlash`)
- `packages/core/src/provider.ts` (new `classifyModel`, `ModelClass` type)
- `packages/core/src/webAgent.ts` (`initializeSystemPromptAndTask`, options surface)
- `packages/core/src/types/` (`WebAgentOptions` type)
- `packages/core/src/events.ts` (`TASK_SETUP` payload)
- `packages/core/test/webAgent.test.ts`