feat(agent): bcli agent mode — interactive chat REPL over BC verbs (Part 4)#26

Open
igor-ctrl wants to merge 6 commits into main from feat/agent-mode
igor-ctrl commented Jun 14, 2026

Owner

Summary

Adds bcli agent mode — a Claude Code / Codex-style interactive chat REPL where an LLM drives bcli's own verbs (get/query/post/batch/…) as tools. This is Part 4 of the agent-evolution roadmap; Parts 0–3 (ContextBundle, packs, bcli ask, site) shipped to make this a wiring exercise.

Typing bare bcli on a TTY launches the chat TUI; subcommands are unchanged and non-TTY invocations still print help (so scripts/agents that pipe bcli are unaffected).
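
The entry rule above can be sketched as a pure decision function. This is a minimal illustration, not the code in app.py: the name `choose_entry` and its return strings are hypothetical, and it assumes "on a TTY" means both stdin and stdout are terminals (the commit notes say "dual-TTY").

```python
from typing import Optional


def choose_entry(invoked_subcommand: Optional[str], stdin_tty: bool, stdout_tty: bool) -> str:
    """Decide what a bcli invocation should do (illustrative only)."""
    if invoked_subcommand is not None:
        return "subcommand"  # ordinary verbs behave as before
    if stdin_tty and stdout_tty:
        return "repl"  # bare bcli on an interactive terminal opens the chat
    return "help"  # piped or scripted callers still get help text
```

A real entry point would presumably pass `sys.stdin.isatty()` and `sys.stdout.isatty()`, and import the REPL only on the `"repl"` branch.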

Architecture

  • AgentSessionBackend protocol emitting an AgentEvent stream (text_delta / tool_call_started / tool_result / awaiting_approval / turn_complete / error), consumed by one Textual TUI — so all backends feel identical. Factory mirrors bcli.ask (_BUILTIN_BACKENDS + module:Class escape hatch + NullAgent fallback).
  • Three backends, all in v1:
    • pydantic-ai — BYOK in-process loop (provider:model strings: Anthropic / OpenAI / local Ollama / OpenAI-compatible base_url).
    • claude-agent-sdk — rides the user's installed Claude Code (in-process @tools, tools=[] strips coding built-ins, can_use_tool permission callback).
    • codex: openai-codex SDK, reusing the existing bcli_mcp server as the tool surface (no new tool code).
  • Single tool definition from bcli describe --format json (the same source bcli_mcp uses), projected three ways.
  • Write safety enforced inside the tool implementations (src/bcli/agent/tools/_impl.py): disable_writes, caution == "high", and production targets emit awaiting_approval, resolved by the Textual approval dialog or /yes. Plan-mode drafts a batch.yaml for review → confirm → batch run.
  • First-run setup wizard (also bcli agent init), consent gate for subscription-auth backends (never the default; persisted with timestamp), per-profile BC.md memory (read-only in v1).
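
To make the event-stream contract concrete, here is a minimal sketch of a backend protocol and a toy backend. The event kinds are the ones listed above; the `run_turn` method name, the `payload` field, and `EchoBackend` are assumptions made for illustration, not the PR's actual definitions.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Protocol

# Event kinds named in the PR description.
EVENT_KINDS = {
    "text_delta", "tool_call_started", "tool_result",
    "awaiting_approval", "turn_complete", "error",
}


@dataclass(frozen=True)
class AgentEvent:
    kind: str
    payload: dict[str, Any] = field(default_factory=dict)  # hypothetical field

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind!r}")


class AgentSessionBackend(Protocol):
    # Hypothetical method name; the real protocol may differ.
    def run_turn(self, prompt: str) -> AsyncIterator[AgentEvent]: ...


class EchoBackend:
    """Toy backend: streams one text chunk, then ends the turn."""

    async def run_turn(self, prompt: str) -> AsyncIterator[AgentEvent]:
        yield AgentEvent("text_delta", {"text": f"echo: {prompt}"})
        yield AgentEvent("turn_complete")


async def collect_kinds(backend: AgentSessionBackend, prompt: str) -> list[str]:
    """Consume one turn the way a renderer would, keeping only event kinds."""
    return [event.kind async for event in backend.run_turn(prompt)]
```

Because a consumer only ever sees `AgentEvent` values, swapping one backend for another should not change rendering code, which is the property the description claims for the TUI.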

SDK/CLI split preserved: bcli.agent never imports bcli_cli; the agent/Textual/pydantic-ai stack is lazy-loaded so ordinary subcommands import none of it.
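
One way to check a lazy-loading claim like this is to run an import in a fresh interpreter and inspect `sys.modules` afterwards. The helper below is a sketch of that technique; `leaked_modules` is a hypothetical name and the PR's actual regression test may be written differently.

```python
import json
import subprocess
import sys


def leaked_modules(statement: str, forbidden: list[str]) -> list[str]:
    """Run `statement` in a clean interpreter; return the forbidden modules it loaded."""
    probe = (
        "import sys, json\n"
        f"{statement}\n"
        f"forbidden = {forbidden!r}\n"
        # A module counts as loaded if it or any of its submodules is present.
        "loaded = [m for m in forbidden\n"
        "          if any(k == m or k.startswith(m + '.') for k in sys.modules)]\n"
        "print(json.dumps(loaded))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True, text=True, check=True,
    )
    # Take the last line in case the statement itself printed something.
    return json.loads(result.stdout.strip().splitlines()[-1])
```

Against this project, the statement would be whatever import an ordinary subcommand triggers, with `["textual", "pydantic_ai", "bcli.agent"]` as the forbidden list.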

New optional extras

[agent-local] (pydantic-ai), [agent-claude-code], [agent-codex], and [agent] meta-extra (adds textual). Backends behind extras fall back to NullAgent + a one-shot warning when the SDK isn't installed.
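
The fallback behaviour can be sketched as follows. This is an illustration under stated assumptions: the registry entry points at a deliberately missing placeholder module, and `build_backend`, the `module:Class` parsing, and the warning text are hypothetical rather than copied from bcli.agent.

```python
import importlib
import warnings


class NullAgent:
    """Fallback backend used when the requested one cannot be loaded."""


# Hypothetical registry: short name -> "module:Class" spec.
# The target module is a placeholder that is intentionally not installed.
_BUILTIN_BACKENDS = {"pydantic-ai": "bcli_example_missing_sdk:Backend"}

_warned: set[str] = set()  # backends we have already warned about


def build_backend(spec: str) -> object:
    """Resolve a builtin name or a module:Class spec, falling back to NullAgent."""
    target = _BUILTIN_BACKENDS.get(spec, spec)
    module_name, sep, class_name = target.partition(":")
    try:
        if not sep:
            raise ImportError(f"unknown backend {spec!r}")
        cls = getattr(importlib.import_module(module_name), class_name)
    except (ImportError, AttributeError, ValueError):
        if spec not in _warned:  # warn once per backend, not on every call
            _warned.add(spec)
            warnings.warn(f"backend {spec!r} is unavailable; using NullAgent", stacklevel=2)
        return NullAgent()
    return cls()
```

Keeping the warning one-shot matters in a REPL, where the factory may be called again on every `/model` switch.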

Test plan

  • uv run pytest tests/: 1030 passed, 5 skipped (test_agent 61, test_repl 22; no network; pydantic-ai via TestModel, claude-agent-sdk/openai-codex faked in sys.modules).
  • uv run ruff check src/ — clean.
  • Bare bcli piped (non-TTY) prints help, does not launch the REPL (regression-tested).
  • Ordinary subcommand import leaks zero textual/pydantic_ai/bcli.agent/repl modules.
  • Manual (needs real keys / installed binaries): live BYOK turn (Anthropic or local Ollama); Claude Code subscription path + consent prompt; Codex path — including verifying the openai-codex _notification_to_event mapping against a real stream (SDK is beta 0.1.0b3; its API differs from the original plan and the mapping is defensive). See IMPLEMENTATION-SUMMARY.md and docs/agent.md.
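
The "faked in sys.modules" approach mentioned in the test plan can be sketched with a small context manager. `fake_module` and `StubClient` are hypothetical helpers written for this example; the stub does not mirror any real SDK behaviour.

```python
import contextlib
import importlib
import sys
import types
from typing import Any, Iterator


@contextlib.contextmanager
def fake_module(name: str, **attrs: Any) -> Iterator[types.ModuleType]:
    """Temporarily register a stand-in module under `name`."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    missing = object()
    previous = sys.modules.get(name, missing)
    sys.modules[name] = module
    try:
        yield module
    finally:
        # Restore whatever was there before so other tests are unaffected.
        if previous is missing:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = previous


class StubClient:
    """Stand-in class; it does not imitate the real client."""
```

A test could then wrap a backend import in `with fake_module("claude_agent_sdk", ClaudeSDKClient=StubClient):` so the import succeeds without the package being installed.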

Notable deviation

The Codex SDK is openai-codex (import openai_codex, beta 0.1.0b3) with a high-level AsyncCodex().thread_start → thread.turn → stream() API — not the import codex JSON-RPC surface the plan assumed. Implementation targets the real API; [tool.uv] prerelease = "allow" lets the lock resolve the pinned prerelease while core deps stay stable.

…ckends/memory)

Incomplete checkpoint from interrupted run: no tests, extras, or CLI
wiring yet. Imports clean. Continuation will complete Parts 1-4.
…adless run

Finishes the interrupted Part-1 skeleton (4a71910) into a tested,
wired, shippable BYOK agent engine.

- fix(pydantic-ai): import AgentRunResultEvent from pydantic_ai, not
  pydantic_ai.messages (the WIP import would raise ImportError at turn time).
- wire 'bcli agent' (run + init) into the Typer app; bare-bcli REPL
  entry lands in Part 2.
- repl package __init__ (lazy launch_repl) + console setup wizard
  (_wizard.py): backend detection, [agent] section assembly, keychain
  key storage, subscription-consent hook. Reachable via 'bcli agent
  init' and (Part 2) first-run.
- pyproject extras: agent-local (pydantic-ai-slim[anthropic,openai]
  >=1.107,<2), agent-claude-code, agent-codex, agent meta-extra
  (+textual>=8.2); added to dev.
- tests/test_agent (51): factory dispatch + Null fallback, ToolRegistry
  tier/plan-mode + describe round-trip + bcli_mcp parity, write-safety
  gate matrix (disable_writes/caution=high/production, decline->refusal,
  auto-approve, fail-closed) + draft_batch, read handlers, PydanticAI
  backend event-stream shape via TestModel (no network), wizard logic,
  consent gate, headless 'agent run' + plan-mode resolution.
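
The write-safety matrix listed above can be illustrated with a small decision function. This is a sketch, not the logic in tools/_impl.py: the `ToolCall` and `Target` types and the `gate` signature are hypothetical, and it assumes that only write tools are gated and that auto-approve bypasses the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    name: str
    is_write: bool
    caution: str = "low"


@dataclass(frozen=True)
class Target:
    disable_writes: bool = False
    production: bool = False


def needs_approval(call: ToolCall, target: Target) -> bool:
    """A write needs approval if any of the three conditions holds (assumed rule)."""
    if not call.is_write:
        return False
    return target.disable_writes or call.caution == "high" or target.production


def gate(
    call: ToolCall,
    target: Target,
    approver: Optional[Callable[[ToolCall], bool]] = None,
    auto_approve: bool = False,
) -> str:
    """Return "run" or "refuse" for a tool call."""
    if not needs_approval(call, target) or auto_approve:
        return "run"
    if approver is None:
        return "refuse"  # fail closed: nobody is available to approve
    try:
        approved = approver(call)
    except Exception:
        return "refuse"  # fail closed: the approval path itself broke
    return "run" if approved is True else "refuse"
```

Putting the decision in one function shared by every tool handler is what lets different backends inherit the same behaviour.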

ruff clean; full suite green (one pre-existing env-pollution failure in
test_context unrelated to this change).
Bare `bcli` on a TTY now opens an interactive agent chat; non-TTY
still prints help (regression-tested) so scripted callers are untouched.

- app.py: no_args_is_help=False + invoke_without_command callback; branch
  on ctx.invoked_subcommand is None — dual-TTY → lazy-import repl, else
  help. Agent stack never imported for ordinary subcommands.
- repl/_app.py: ChatApp (Textual) — scrolling transcript, MarkdownStream
  streaming, ToolCallPanel cards, StatusBar, approval modal; turns run in
  an exclusive worker consuming AgentEvents; long-lived AsyncBCClient +
  AgentRuntime; first-run wizard via run_repl.
- _widgets.py: StatusBar / ToolCallPanel / ApprovalScreen (y/n + buttons,
  resolves the runtime gate future).
- _commands.py: slash parser (/model /profile /company /plan /yes
  /context /clear /help /exit + aliases), pure + testable.
- _plan_mode.py: drafted batch YAML → temp file → gated 'bcli batch run'
  (dry-run then real), same path bcli extract uses.
- tests/test_repl (22): bare-entry non-TTY/TTY regression, slash parsing,
  plan-mode argv + round-trip, wizard config-write, Textual run_test()
  pilots feeding canned AgentEvent streams (text/tool/approval).
- fix(test): pin the text-only pydantic-ai test to call_tools=[] so it
  no longer shells out to a real 'bcli batch' subprocess (that side
  effect was polluting test_context's last-error read).
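
A pure slash-command parser of the kind described for _commands.py might look like the sketch below. The command names come from the list above; the alias table, the `SlashCommand` type, and the error handling are assumptions for illustration.

```python
import shlex
from typing import NamedTuple, Optional

COMMANDS = {"model", "profile", "company", "plan", "yes", "context", "clear", "help", "exit"}
ALIASES = {"quit": "exit", "q": "exit", "?": "help"}  # hypothetical aliases


class SlashCommand(NamedTuple):
    name: str
    args: list[str]


def parse_slash(line: str) -> Optional[SlashCommand]:
    """Return a SlashCommand, or None when the line is ordinary chat input."""
    text = line.strip()
    if not text.startswith("/"):
        return None
    parts = shlex.split(text[1:])  # raises ValueError on unbalanced quotes
    if not parts:
        raise ValueError("empty slash command")
    head = parts[0].lower()
    name = ALIASES.get(head, head)
    if name not in COMMANDS:
        raise ValueError(f"unknown command: /{parts[0]}")
    return SlashCommand(name, parts[1:])
```

Because the function touches no UI state, it can be unit-tested without starting a Textual app.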

ruff clean; full suite green (1020 passed, 5 skipped).
Drives the user's installed Claude Code as a harness-owned loop, exposing
bcli's verbs through an in-process SDK MCP server built from the SAME
tools/_impl.py handlers the pydantic-ai backend uses — write safety stays
in one place.

- backends/_claude_sdk.py: ClaudeCodeBackend over ClaudeSDKClient.
  system_prompt carries bcli's prompt; allowed_tools is restricted to
  mcp__bcli__* (built-in coding tools never allowed); can_use_tool is a
  coarse fence (allow bcli tools, deny everything else) on top of the
  per-handler write gate. Handles the documented Python quirk: streaming
  AsyncIterable prompt + dummy PreToolUse hook returning {continue_:True}
  so can_use_tool fires. Translates AssistantMessage/TextBlock/
  ToolUseBlock/ToolResultBlock/ResultMessage → uniform AgentEvents.
- consent flow (_consent.py, already present): claude-code on a
  subscription login requires literal 'yes', persisted via tomlkit; API
  keys never prompt — covered by test_consent.
- [agent-claude-code] extra already declared in Part 1.
- tests/test_agent/test_claude_sdk_backend.py (5): fake claude_agent_sdk
  injected into sys.modules (package not installed) — factory build,
  event translation, can_use_tool fence, dummy hook, bcli-only
  allowed_tools.
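
The coarse fence described above reduces to a prefix check. The sketch below returns plain dicts to stay SDK-independent; a real can_use_tool callback would return whatever result objects the SDK expects, and `fence_decision` is a hypothetical name.

```python
BCLI_TOOL_PREFIX = "mcp__bcli__"  # prefix named in the commit message


def fence_decision(tool_name: str) -> dict:
    """Allow bcli's MCP tools and deny every other tool."""
    has_suffix = len(tool_name) > len(BCLI_TOOL_PREFIX)
    if tool_name.startswith(BCLI_TOOL_PREFIX) and has_suffix:
        return {"behavior": "allow"}
    return {"behavior": "deny", "message": f"{tool_name!r} is not a bcli tool"}
```

This check is deliberately blunt: it decides only which tools may run at all, while the decision about whether a specific write may proceed stays in the per-handler gate.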

ruff clean; agent suite 56 passed.
Drives the user's installed Codex as a harness-owned loop. Codex is an
MCP client, so it consumes bcli's existing bcli_mcp stdio server — zero
new tool code; the write gate runs one layer down in the bcli subprocess
(confirm_write_or_exit + disable_writes) reinforced by codex approval_mode.

DEVIATION (documented): the plan assumed an 'import codex' JSON-RPC
thread/turn/item surface. The actually-published package is openai-codex
(import openai_codex, beta 0.1.0b3): AsyncCodex().thread_start(...) ->
thread.turn(input) -> AsyncTurnHandle.stream() yielding notifications +
a TurnResult. This backend targets that real API; inspected the live
PyPI metadata + GitHub api-reference.md to confirm signatures.

- backends/_codex.py: CodexBackend over AsyncCodex. to_mcp_config()
  registers bcli_mcp (bcli-mcp script or 'python -m bcli_mcp') with
  BCLI_PROFILE env. base_instructions carries bcli's system prompt;
  approval_mode escalates to on_request under production/plan mode
  (defensive enum probing for the beta). Notifications mapped
  best-effort to AgentEvents (assistant text -> text_delta, tool/mcp/
  command items -> tool_call_started); final answer from TurnResult.
- pyproject: [agent-codex] = openai-codex>=0.1.0b3; [tool.uv] prerelease
  = allow so the universal lock resolves the beta's pinned prerelease
  runtime (openai-codex-cli-bin) — core deps still pin stable.
- tests/test_agent/test_codex_backend.py (5): fake openai_codex in
  sys.modules — to_mcp_config, factory build, notification mapping +
  final answer, thread_start config/instructions, production approval
  escalation.
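
The launcher choice described for to_mcp_config() (prefer the bcli-mcp script, otherwise run the module) can be sketched like this. The returned dict shape and the injectable `which` parameter are assumptions for illustration; the real function's signature and the config schema Codex expects may differ.

```python
import shutil
import sys
from typing import Callable, Optional


def to_mcp_config(
    profile: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Build a stdio MCP server entry for bcli_mcp (hypothetical shape)."""
    script = which("bcli-mcp")
    if script:
        command, args = script, []  # installed console script
    else:
        command, args = sys.executable, ["-m", "bcli_mcp"]  # module fallback
    return {
        "bcli": {
            "command": command,
            "args": args,
            "env": {"BCLI_PROFILE": profile},
        }
    }
```

Passing `which` as a parameter keeps the function deterministic in tests, since it no longer depends on what is on the PATH.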

ruff clean; full suite 1030 passed, 5 skipped.
- docs/agent.md: end-to-end guide (quick start, three backends + config,
  credentials + subscription consent gate, write safety + plan mode,
  chat commands, BC.md memory, headless run, engine/renderer
  architecture, live smoke-test checklist).
- docs/command-reference.md: 'agent' section (bare bcli, agent run,
  agent init).
- README: Agent Mode row in the docs table.
- IMPLEMENTATION-SUMMARY.md: rewritten for the agent-mode plan (was a
  stale unrelated summary) — per-part build log, commit list, test
  counts (test_agent 61, test_repl 22, full suite 1030 passed),
  deviations, and manual follow-ups.