Add flash-mode system prompt variant for fast/cheap models #434

@lmorchard

Description

Current state

The action-loop system prompt at packages/core/src/prompts.ts:312-426 is a single template applied to every provider and model. The template is dense — roughly 3000-4000 tokens after Liquid render — and includes:

  • The full youArePrompt persona block
  • All tool examples (dynamic by capability)
  • Core Rules (7 numbered rules + the "EXACTLY ONE tool" block)
  • Best Practices (~15 bullets covering autocomplete handling, date pickers, modal dismissal, PDF handling, search vs. browser-search guidance, etc.)
  • Conditional guardrails block
  • Conditional interactive-mode block (a substantial section with mandatory rules)
  • When using done() block (5 bullets on formatting)
  • The toolCallInstruction tail (repeats "exactly one tool")

This applies whether the active model is claude-opus-4-7 or gpt-4.1-mini or gemini-2.5-flash or llama3.2.

The gap

Fast/cheap models (Claude Haiku, GPT-4.1-mini, Gemini Flash, local models, etc.) behave differently from frontier models:

  • They follow JSON schemas reliably but can drift with long lists of prescriptive "best practices."
  • The 15-bullet Best Practices section is closer to harmful than helpful at small model sizes — the model spends attention budget pattern-matching on the wrong best-practice rather than actually planning.
  • A 4000-token system prompt is a meaningful fraction of a small model's effective context.
  • Many of the rules are about edge cases (date pickers, autocomplete dropdowns, PDF handling) that only apply on certain pages — surfacing them as conditional or page-class-specific would be more efficient.
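The page-conditional idea above could be sketched roughly like this (nothing of the sort exists in prompts.ts today; the helper name, patterns, and hint text are all illustrative):

```typescript
// Hypothetical sketch: surface edge-case guidance only when the current page
// plausibly needs it, instead of listing every best practice up front.
type PageHint = { pattern: RegExp; hint: string };

const pageHints: PageHint[] = [
  {
    pattern: /<input[^>]*type="date"/i,
    hint: "Prefer typing an ISO date over clicking through the date-picker widget.",
  },
  {
    pattern: /role="listbox"|autocomplete/i,
    hint: "After typing into an autocomplete field, select an option from the dropdown before submitting.",
  },
  {
    pattern: /application\/pdf|\.pdf(\?|$)/i,
    hint: "For PDFs, extract text with a reader tool rather than scrolling screenshots.",
  },
];

// Returns only the hints whose trigger pattern appears in the page source.
export function hintsForPage(pageHtml: string): string[] {
  return pageHints.filter(h => h.pattern.test(pageHtml)).map(h => h.hint);
}
```

This keeps the flash prompt short on ordinary pages while still injecting the date-picker or PDF guidance when the page actually contains those elements.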

Conversely, frontier models (Claude Opus, GPT o-series, Gemini Pro) benefit from being told what to think about — the structured reasoning hints (data grounding, pre-done verification — see the dedicated prompt-checklist issue) are exactly the scaffolding they're designed to use well.

Proposed scope

A. Add a "flash" system prompt variant

In prompts.ts, add buildActionLoopSystemPromptFlash(...) that:

  • Keeps the youArePrompt persona (5 lines, anchor for the role)
  • Keeps tool examples (dynamic by capability) — these are essential
  • Keeps Core Rules (numbered, short)
  • Drops the long Best Practices bullets — replace with a single line: "Adapt your approach based on what's actually available; if an action fails, try a different element or use search/find tools to inventory the page."
  • Keeps guardrails block conditional (when present)
  • Keeps interactive-mode block conditional (when present) — non-negotiable
  • Compresses When using done() to one line: "Format the result as VALID Markdown matching what the user asked for."
  • Keeps the toolCallInstruction tail

Expected output: ~1000-1500 tokens.
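A minimal sketch of the assembly (the section constants are placeholders for the real template fragments in prompts.ts, and the signature is simplified relative to the capability flags the real builder takes):

```typescript
// Sketch only: placeholder strings stand in for the real Liquid fragments.
const youArePrompt = "You are a careful web-browsing agent...";
const coreRules = "Core Rules:\n1. Call EXACTLY ONE tool per turn.\n...";
const toolCallInstruction = "Respond with exactly one tool call.";

export function buildActionLoopSystemPromptFlash(
  hasGuardrails: boolean,
  hasInteractive: boolean,
  toolExamples: string,
  guardrailsBlock = "",
  interactiveBlock = "",
): string {
  const sections = [
    youArePrompt,
    toolExamples,
    coreRules,
    // Single line replacing the ~15 Best Practices bullets:
    "Adapt your approach based on what's actually available; if an action " +
      "fails, try a different element or use search/find tools to inventory the page.",
    hasGuardrails ? guardrailsBlock : "",
    // Interactive-mode rules are kept whenever applicable (privacy/security).
    hasInteractive ? interactiveBlock : "",
    // Compressed done() guidance:
    "When using done(): format the result as VALID Markdown matching what the user asked for.",
    toolCallInstruction,
  ];
  return sections.filter(Boolean).join("\n\n");
}
```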

B. Add a model-class detector

```typescript
// In provider.ts:
export type ModelClass = "frontier" | "balanced" | "flash";

export function classifyModel(providerConfig: ProviderConfig): ModelClass {
  // Lowercase so the pattern match is case-insensitive across providers.
  const modelId = (providerConfig.model?.modelId ?? "").toLowerCase();
  // Known small/fast models. /flash/ already covers gemini-*-flash ids;
  // gemini-1.5-flash is listed explicitly for clarity.
  const flashPatterns = [
    /gpt-4\.1-mini/, /gpt-4o-mini/, /flash/, /haiku/, /local-model/,
    /llama3\.2(?!.*70b)/, /gemini-1\.5-flash/,
  ];
  // Known frontier models. Checked second, so an id matching both buckets
  // resolves to flash.
  const frontierPatterns = [
    /opus/, /gpt-4(\.\d)?$/, /o1\b/, /o3\b/, /gemini-2\.5-pro/, /sonnet/,
  ];
  if (flashPatterns.some(p => p.test(modelId))) return "flash";
  if (frontierPatterns.some(p => p.test(modelId))) return "frontier";
  // Anything unrecognized gets the default prompt.
  return "balanced";
}
```

The classifier is heuristic — known to mis-identify in edge cases (custom OpenAI-compatible endpoints, novel model names). Provide a WebAgentOptions.promptVariant?: "auto" | "frontier" | "balanced" | "flash" override so callers can pin the variant when the auto-detect is wrong.

C. Wire the variant selection

In initializeSystemPromptAndTask (webAgent.ts:1641-1672):

```typescript
const variant = this.options.promptVariant === "auto" || !this.options.promptVariant
  ? classifyModel(this.providerConfig)
  : this.options.promptVariant;

const systemPrompt = variant === "flash"
  ? buildActionLoopSystemPromptFlash(hasGuardrails, hasWebSearch, hasTabstack, hasStartingUrl, hasInteractive)
  : buildActionLoopSystemPrompt(hasGuardrails, hasWebSearch, hasTabstack, hasStartingUrl, hasInteractive);
```

(For now, only flash and the default differ. A future "frontier" variant could add Browser Use's <reasoning_rules>-style structured-thinking scaffolding.)

D. Surface the chosen variant in telemetry

Add promptVariant to the TASK_SETUP event payload so logs and eval-judge consumption can correlate variant choice with outcomes.
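A sketch of the payload change (the payload shape and builder name are assumptions, not taken from events.ts):

```typescript
// Assumed shapes; the real TASK_SETUP payload lives in events.ts.
type ModelClass = "frontier" | "balanced" | "flash";

interface TaskSetupPayload {
  taskId: string;
  modelId: string;
  promptVariant: ModelClass; // new field
}

// Hypothetical helper illustrating where the variant would be threaded in.
export function buildTaskSetupPayload(
  taskId: string,
  modelId: string,
  promptVariant: ModelClass,
): TaskSetupPayload {
  return { taskId, modelId, promptVariant };
}
```

With the variant in the payload, eval runs can group task outcomes by prompt variant without re-deriving the classification from the model id.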

Implementation notes

  • The model-class detector is necessarily heuristic. Erring on the side of balanced (the default prompt) is safest — flash variant should only fire when we're confident.
  • Test extensively against custom OpenAI-compatible endpoints (provider: openai-compatible): users running local models through them may set arbitrary modelId values. The override config option handles this case.
  • The flash variant should still include the interactive-mode block when applicable — that's a privacy/security concern, not a "best practice."
  • Snapshot tests on the existing default prompt should not change (both the frontier and balanced classifications render the default prompt for now, so known models keep their current output).
  • Verify on a small benchmark: does the flash variant on a flash model match or beat the default prompt's task-completion rate? If it underperforms, the variant needs tuning, not just stripping.

Acceptance criteria

  • buildActionLoopSystemPromptFlash exists in prompts.ts and produces a noticeably shorter rendered output.
  • classifyModel correctly identifies common flash models (Haiku, gpt-4.1-mini, Gemini Flash) in tests.
  • WebAgentOptions.promptVariant override works ("flash", "balanced", "frontier", "auto").
  • TASK_SETUP event includes the chosen variant.
  • Benchmark on at least one small eval set: flash-on-flash-model matches or beats default-on-flash-model.
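The classifier criterion could be covered by a table-driven test along these lines (a trimmed copy of the section B classifier is inlined here so the sketch runs standalone; in the repo it would be imported from provider.ts):

```typescript
// Inline, trimmed copy of the sketch classifier from section B.
type ModelClass = "frontier" | "balanced" | "flash";

function classifyModelId(modelId: string): ModelClass {
  const id = modelId.toLowerCase();
  const flashPatterns = [
    /gpt-4\.1-mini/, /gpt-4o-mini/, /flash/, /haiku/, /local-model/,
    /llama3\.2(?!.*70b)/,
  ];
  const frontierPatterns = [/opus/, /o1\b/, /o3\b/, /gemini-2\.5-pro/, /sonnet/];
  if (flashPatterns.some(p => p.test(id))) return "flash";
  if (frontierPatterns.some(p => p.test(id))) return "frontier";
  return "balanced";
}

// Table-driven expectations matching the acceptance criterion.
const cases: Array<[string, ModelClass]> = [
  ["claude-3-5-haiku-latest", "flash"],
  ["gpt-4.1-mini", "flash"],
  ["gemini-2.5-flash", "flash"],
  ["claude-opus-4", "frontier"],
  ["gemini-2.5-pro", "frontier"],
  ["some-unknown-model", "balanced"],
];

for (const [id, expected] of cases) {
  const got = classifyModelId(id);
  if (got !== expected) throw new Error(`${id}: expected ${expected}, got ${got}`);
}
```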

Effort estimate

1-2 days. The prompt-stripping work is fast; benchmarking takes a separate eval run.

Related issues

Pairs naturally with the pre-done verification checklist issue — the checklist should live in the default prompt variant; the flash variant gets a condensed form. Pairs with the prompt-caching issue — flash variant has a shorter cacheable prefix, but caching still helps.

Files likely affected

  • packages/core/src/prompts.ts (new buildActionLoopSystemPromptFlash)
  • packages/core/src/provider.ts (new classifyModel, ModelClass type)
  • packages/core/src/webAgent.ts (initializeSystemPromptAndTask, options surface)
  • packages/core/src/types/ (WebAgentOptions type)
  • packages/core/src/events.ts (TASK_SETUP payload)
  • packages/core/test/webAgent.test.ts

Metadata

Labels: enhancement (New feature or request)