diff --git a/.github/workflows/check-llms-files.yml b/.github/workflows/check-llms-files.yml new file mode 100644 index 000000000..4b0148cb6 --- /dev/null +++ b/.github/workflows/check-llms-files.yml @@ -0,0 +1,61 @@ +name: Sync llms context files + +on: + schedule: + # Run weekly at 3 AM UTC on Mondays + - cron: '0 3 * * 1' + workflow_dispatch: + +permissions: + contents: write + pull-requests: write + +concurrency: + group: sync-llms-files + cancel-in-progress: true + +jobs: + sync-llms-files: + runs-on: ubuntu-latest + steps: + - name: Checkout docs repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Regenerate llms.txt and llms-full.txt + run: python3 scripts/generate-llms-files.py + + - name: Check for changes + id: detect_changes + shell: bash + run: | + if [[ -n "$(git status --porcelain)" ]]; then + echo "changes=true" >> "$GITHUB_OUTPUT" + else + echo "changes=false" >> "$GITHUB_OUTPUT" + fi + + - name: Create Pull Request + if: steps.detect_changes.outputs.changes == 'true' + uses: peter-evans/create-pull-request@v7 + with: + token: ${{ github.token }} + branch-token: ${{ github.token }} + commit-message: "docs: sync llms context files" + branch: sync-llms-files + branch-suffix: timestamp + delete-branch: true + title: "docs: sync llms context files" + body: | + ## Summary of changes + + This PR updates `llms.txt` and `llms-full.txt` using `scripts/generate-llms-files.py`. + + ### Changes Made + - Regenerated llms context files from the latest documentation + - This is an automated sync from the `sync-llms-files` workflow + + ### Checklist + - [x] I have read and reviewed the documentation changes to the best of my ability. + - [x] If the change is significant, I have run the documentation site locally and confirmed it renders as expected. diff --git a/AGENTS.md b/AGENTS.md index 022e2e0e7..528f9c1e6 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -25,6 +25,30 @@ The site is built with **Mintlify** and deployed automatically by Mintlify on pu - `.agents/skills/` — prompt extensions for agents editing this repo (legacy: `.openhands/skills/`; formerly `microagents`) - `tests/` — pytest checks for docs consistency (notably LLM pricing docs) + +## llms.txt / llms-full.txt (V1-only) + +Mintlify auto-generates `/llms.txt` and `/llms-full.txt`, but this repo **overrides** them by committing +`llms.txt` and `llms-full.txt` at the repo root. + +We do this so LLMs get **V1-only** context while legacy V0 pages remain available for humans. + +- Generator script: `scripts/generate-llms-files.py` +- Sync workflow: `.github/workflows/check-llms-files.yml` runs weekly (and on demand) to open a PR when the files drift. +- Regenerate (recommended): + ```bash + make llms + ``` + Or directly: + ```bash + python3 scripts/generate-llms-files.py + ``` +- Local verify (optional): + ```bash + make llms-check + ``` +- Exclusions: `openhands/usage/v0/` and any `V0*`-prefixed page files. + ## Local development ### Preview the site diff --git a/Makefile b/Makefile new file mode 100644 index 000000000..765994454 --- /dev/null +++ b/Makefile @@ -0,0 +1,12 @@ +.PHONY: llms llms-check + +# Regenerate the Mintlify llms context files (V1-only override). +# +# See: scripts/generate-llms-files.py +llms: + python3 scripts/generate-llms-files.py + +# Regenerate and fail if llms files changed (useful for local verification). 
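+# This mirrors the drift detection step in .github/workflows/check-llms-files.yml.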
+llms-check: + python3 scripts/generate-llms-files.py + git diff --exit-code llms.txt llms-full.txt diff --git a/llms-full.txt b/llms-full.txt new file mode 100644 index 000000000..10aea661b --- /dev/null +++ b/llms-full.txt @@ -0,0 +1,34371 @@ +# OpenHands Docs + +> Consolidated documentation context for LLMs (V1-only). Legacy V0 docs pages are intentionally excluded. + +## OpenHands Software Agent SDK + +### Software Agent SDK +Source: https://docs.openhands.dev/sdk.md + +The OpenHands Software Agent SDK is a set of Python and REST APIs for building **agents that work with code**. + +You can use the OpenHands Software Agent SDK for: + +- One-off tasks, like building a README for your repo +- Routine maintenance tasks, like updating dependencies +- Major tasks that involve multiple agents, like refactors and rewrites + +You can even use the SDK to build new developer experiences—it’s the engine behind the [OpenHands CLI](/openhands/usage/cli/quick-start) and [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + +Get started with some examples or keep reading to learn more. + +## Features + + + + A unified Python API that enables you to run agents locally or in the cloud, define custom agent behaviors, and create custom tools. + + + Ready-to-use tools for executing Bash commands, editing files, browsing the web, integrating with MCP, and more. + + + A production-ready server that runs agents anywhere, including Docker and Kubernetes, while connecting seamlessly to the Python API. + + + +## Why OpenHands Software Agent SDK? + +### Emphasis on coding + +While other agent SDKs (e.g. [LangChain](https://python.langchain.com/docs/tutorials/agents/)) are focused on more general use cases, like delivering chat-based support or automating back-office tasks, OpenHands is purpose-built for software engineering. + +While some folks do use OpenHands to solve more general tasks (code is a powerful tool!), most of us use OpenHands to work with code. + +### State-of-the-Art Performance + +OpenHands is a top performer across a wide variety of benchmarks, including SWE-bench, SWT-bench, and multi-SWE-bench. The SDK includes a number of state-of-the-art agentic features developed by our research team, including: + +- Task planning and decomposition +- Automatic context compression +- Security analysis +- Strong agent-computer interfaces + +OpenHands has attracted researchers from a wide variety of academic institutions, and is [becoming the preferred harness](https://x.com/Alibaba_Qwen/status/1947766835023335516) for evaluating LLMs on coding tasks. + +### Free and Open Source + +OpenHands is also the leading open source framework for coding agents. It’s MIT-licensed, and can work with any LLM—including big proprietary LLMs like Claude and OpenAI, as well as open source LLMs like Qwen and Devstral. + +Other SDKs (e.g. [Claude Code](https://github.com/anthropics/claude-agent-sdk-python)) are proprietary and lock you into a particular model. Given how quickly models are evolving, it’s best to stay model-agnostic! + +## Get Started + + + + Install the SDK, run your first agent, and explore the guides. + + + +## Learn the SDK + + + + Understand the SDK's architecture: agents, tools, workspaces, and more. + + + Explore the complete SDK API and source code. + + + +## Build with Examples + + + + Build local agents with custom tools and capabilities. + + + Run agents on remote servers with Docker sandboxing. + + + Automate repository tasks with agent-powered workflows. 
+ + +## Community + + + + Connect with the OpenHands community on Slack. + + + Contribute to the SDK or report issues on GitHub. + + + +### openhands.sdk.agent +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent.md + +### class Agent + +Bases: `CriticMixin`, [`AgentBase`](#class-agentbase) + +Main agent implementation for OpenHands. + +The Agent class provides the core functionality for running AI agents that can +interact with tools, process messages, and execute actions. It inherits from +AgentBase and implements the agent execution logic. Critic-related functionality +is provided by CriticMixin. + +#### Example + +```pycon +>>> from openhands.sdk import LLM, Agent, Tool +>>> from pydantic import SecretStr +>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) +>>> tools = [Tool(name="TerminalTool"), Tool(name="FileEditorTool")] +>>> agent = Agent(llm=llm, tools=tools) +``` + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### Methods + +#### init_state() + +Initialize conversation state. + +Invariants enforced by this method: +- If a SystemPromptEvent is already present, it must be within the first 3 + events (index 0 or 1 in practice; index 2 is included in the scan window + to detect a user message appearing before the system prompt). +- A user MessageEvent should not appear before the SystemPromptEvent. + +These invariants keep event ordering predictable for downstream components +(condenser, UI, etc.) and also prevent accidentally materializing the full +event history during initialization. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### step() + +Take a step in the conversation. + +Typically this involves: +1. Making an LLM call +2. Executing the tool +3. Updating the conversation state with + LLM calls (role=”assistant”) and tool results (role=”tool”) + +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return; Conversation will kick off the next step + +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. + +NOTE: state will be mutated in-place. + +### class AgentBase + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Abstract base class for OpenHands agents. + +Agents are stateless and should be fully defined by their configuration. +This base class provides the common interface and functionality that all +agent implementations must follow. + + +#### Properties + +- `agent_context`: AgentContext | None +- `condenser`: CondenserBase | None +- `critic`: CriticBase | None +- `dynamic_context`: str | None + Get the dynamic per-conversation context. + This returns the context that varies between conversations, such as: + - Repository information and skills + - Runtime information (hosts, working directory) + - User-specific secrets and settings + - Conversation instructions + This content should NOT be included in the cached system prompt to enable + cross-conversation cache sharing. Instead, it is sent as a second content + block (without a cache marker) inside the system message. + * Returns: + The dynamic context string, or None if no context is configured. 
+- `filter_tools_regex`: str | None +- `include_default_tools`: list[str] +- `llm`: LLM +- `mcp_config`: dict[str, Any] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str + Returns the name of the Agent. +- `prompt_dir`: str + Returns the directory where this class’s module file is located. +- `security_policy_filename`: str +- `static_system_message`: str + Compute the static portion of the system message. + This returns only the base system prompt template without any dynamic + per-conversation context. This static portion can be cached and reused + across conversations for better prompt caching efficiency. + * Returns: + The rendered system prompt template without dynamic context. +- `system_message`: str + Return the combined system message (static + dynamic). +- `system_prompt_filename`: str +- `system_prompt_kwargs`: dict[str, object] +- `tools`: list[Tool] +- `tools_map`: dict[str, ToolDefinition] + Get the initialized tools map. + * Raises: + `RuntimeError` – If the agent has not been initialized. + +#### Methods + +#### get_all_llms() + +Recursively yield unique base-class LLM objects reachable from self. + +- Returns actual object references (not copies). +- De-dupes by id(LLM). +- Cycle-safe via a visited set for all traversed objects. +- Only yields objects whose type is exactly LLM (no subclasses). +- Does not handle dataclasses. + +#### init_state() + +Initialize the empty conversation state to prepare the agent for user +messages. + +Typically this involves adding a system message. + +NOTE: state will be mutated in-place. + +#### model_dump_succint() + +Like model_dump, but excludes None fields by default. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### abstractmethod step() + +Take a step in the conversation. + +Typically this involves: +1. Making an LLM call +2. Executing the tool +3. Updating the conversation state with + LLM calls (role=”assistant”) and tool results (role=”tool”) + +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return; Conversation will kick off the next step + +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. + +NOTE: state will be mutated in-place. + +#### Deprecated +Deprecated since version 1.11.0: Use [`static_system_message`](#class-static_system_message) for the cacheable system prompt and +[`dynamic_context`](#class-dynamic_context) for per-conversation content. This separation +enables cross-conversation prompt caching. Will be removed in 1.16.0. + +#### WARNING +Using this property DISABLES cross-conversation prompt caching because +it combines static and dynamic content into a single string. Use +[`static_system_message`](#class-static_system_message) and [`dynamic_context`](#class-dynamic_context) separately +to enable caching. + +#### Deprecated +Deprecated since version 1.11.0: This will be removed in 1.16.0. Use static_system_message for the cacheable system prompt and dynamic_context for per-conversation content. Using system_message DISABLES cross-conversation prompt caching because it combines static and dynamic content into a single string.
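+ +For illustration, a minimal sketch (assuming an already-constructed `Agent` instance named `agent`) of reading the two parts separately instead of the deprecated combined property: + +```pycon +>>> static = agent.static_system_message  # cacheable across conversations +>>> dynamic = agent.dynamic_context  # per-conversation context, may be None +``` +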
+ +#### verify() + +Verify that we can resume this agent from persisted state. + +We do not merge configuration between persisted and runtime Agent +instances. Instead, we verify compatibility requirements and then +continue with the runtime-provided Agent. + +Compatibility requirements: +- Agent class/type must match. +- Tools must match exactly (same tool names). + +Tools are part of the system prompt and cannot be changed mid-conversation. +To use different tools, start a new conversation or use conversation forking +(see [https://github.com/OpenHands/OpenHands/issues/8560](https://github.com/OpenHands/OpenHands/issues/8560)). + +All other configuration (LLM, agent_context, condenser, etc.) can be +freely changed between sessions. + +* Parameters: + * `persisted` – The agent loaded from persisted state. + * `events` – Unused, kept for API compatibility. +* Returns: + This runtime agent (self) if verification passes. +* Raises: + `ValueError` – If agent class or tools don’t match. + +### openhands.sdk.conversation +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation.md + +### class BaseConversation + +Bases: `ABC` + +Abstract base class for conversation implementations. + +This class defines the interface that all conversation implementations must follow. +Conversations manage the interaction between users and agents, handling message +exchange, execution control, and state management. + + +#### Properties + +- `confirmation_policy_active`: bool +- `conversation_stats`: ConversationStats +- `id`: UUID +- `is_confirmation_mode_active`: bool + Check if confirmation mode is active. + Returns True if BOTH conditions are met: + 1. The conversation state has a security analyzer set (not None) + 2. The confirmation policy is active +- `state`: ConversationStateProtocol + +#### Methods + +#### __init__() + +Initialize the base conversation with span tracking. + +#### abstractmethod ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### abstractmethod close() + +#### static compose_callbacks() + +Compose multiple callbacks into a single callback function. + +* Parameters: + `callbacks` – An iterable of callback functions +* Returns: + A single callback function that calls all provided callbacks + +#### abstractmethod condense() + +Force condensation of the conversation history. + +This method uses the existing condensation request pattern to trigger +condensation. It adds a CondensationRequest event to the conversation +and forces the agent to take a single step to process it. + +The condensation will be applied immediately and will modify the conversation +state by adding a condensation event to the history. + +* Raises: + `ValueError` – If no condenser is configured or the condenser doesn’t + handle condensation requests. + +#### abstractmethod execute_tool() + +Execute a tool directly without going through the agent loop. + +This method allows executing tools before or outside of the normal +conversation.run() flow. 
It handles agent initialization automatically, +so tools can be executed before the first run() call. + +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. + +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop + +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor + +#### abstractmethod generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. + +#### static get_persistence_dir() + +Get the persistence directory for the conversation. + +* Parameters: + * `persistence_base_dir` – Base directory for persistence. Can be a string + path or Path object. + * `conversation_id` – Unique conversation ID. +* Returns: + String path to the conversation-specific persistence directory. + Always returns a normalized string path even if a Path was provided. + +#### abstractmethod pause() + +#### abstractmethod reject_pending_actions() + +#### abstractmethod run() + +Execute the agent to process messages and perform actions. + +This method runs the agent until it finishes processing the current +message or reaches the maximum iteration limit. + +#### abstractmethod send_message() + +Send a message to the agent. + +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. + +#### abstractmethod set_confirmation_policy() + +Set the confirmation policy for the conversation. + +#### abstractmethod set_security_analyzer() + +Set the security analyzer for the conversation. + +#### abstractmethod update_secrets() + +### class Conversation + +Bases: `object` + +Factory class for creating conversation instances with OpenHands agents. + +This factory automatically creates either a LocalConversation or RemoteConversation +based on the workspace type provided. LocalConversation runs the agent locally, +while RemoteConversation connects to a remote agent server. + +* Returns: + LocalConversation if workspace is local, RemoteConversation if workspace + is remote. + +#### Example + +```pycon +>>> from openhands.sdk import LLM, Agent, Conversation +>>> from openhands.sdk.plugin import PluginSource +>>> from pydantic import SecretStr +>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) +>>> agent = Agent(llm=llm, tools=[]) +>>> conversation = Conversation( +... agent=agent, +... workspace="./workspace", +... plugins=[PluginSource(source="github:org/security-plugin", ref="v1.0")], +... 
) +>>> conversation.send_message("Hello!") +>>> conversation.run() +``` + +### class ConversationExecutionStatus + +Bases: `str`, `Enum` + +Enum representing the current execution state of the conversation. + +#### Methods + +#### DELETING = 'deleting' + +#### ERROR = 'error' + +#### FINISHED = 'finished' + +#### IDLE = 'idle' + +#### PAUSED = 'paused' + +#### RUNNING = 'running' + +#### STUCK = 'stuck' + +#### WAITING_FOR_CONFIRMATION = 'waiting_for_confirmation' + +#### is_terminal() + +Check if this status represents a terminal state. + +Terminal states indicate the run has completed and the agent is no longer +actively processing. These are: FINISHED, ERROR, STUCK. + +Note: IDLE is NOT a terminal state - it’s the initial state of a conversation +before any run has started. Including IDLE would cause false positives when +the WebSocket delivers the initial state update during connection. + +* Returns: + True if this is a terminal status, False otherwise. + +### class ConversationState + +Bases: `OpenHandsModel` + + +#### Properties + +- `activated_knowledge_skills`: list[str] +- `agent`: AgentBase +- `agent_state`: dict[str, Any] +- `blocked_actions`: dict[str, str] +- `blocked_messages`: dict[str, str] +- `confirmation_policy`: ConfirmationPolicyBase +- `env_observation_persistence_dir`: str | None + Directory for persisting environment observation files. +- `events`: [EventLog](#class-eventlog) +- `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus) +- `id`: UUID +- `max_iterations`: int +- `persistence_dir`: str | None +- `secret_registry`: [SecretRegistry](#class-secretregistry) +- `security_analyzer`: SecurityAnalyzerBase | None +- `stats`: ConversationStats +- `stuck_detection`: bool +- `workspace`: BaseWorkspace + +#### Methods + +#### acquire() + +Acquire the lock. + +* Parameters: + * `blocking` – If True, block until lock is acquired. If False, return + immediately. + * `timeout` – Maximum time to wait for lock (ignored if blocking=False). + -1 means wait indefinitely. +* Returns: + True if lock was acquired, False otherwise. + +#### block_action() + +Persistently record a hook-blocked action. + +#### block_message() + +Persistently record a hook-blocked user message. + +#### classmethod create() + +Create a new conversation state or resume from persistence. + +This factory method handles both new conversation creation and resumption +from persisted state. + +New conversation: +The provided Agent is used directly. Pydantic validation happens via the +cls() constructor. + +Restored conversation: +The provided Agent is validated against the persisted agent using +agent.load(). Tools must match (they may have been used in conversation +history), but all other configuration can be freely changed: LLM, +agent_context, condenser, system prompts, etc. + +* Parameters: + * `id` – Unique conversation identifier + * `agent` – The Agent to use (tools must match persisted on restore) + * `workspace` – Working directory for agent operations + * `persistence_dir` – Directory for persisting state and events + * `max_iterations` – Maximum iterations per run + * `stuck_detection` – Whether to enable stuck detection + * `cipher` – Optional cipher for encrypting/decrypting secrets in + persisted state. If provided, secrets are encrypted when + saving and decrypted when loading. If not provided, secrets + are redacted (lost) on serialization. 
+* Returns: + ConversationState ready for use +* Raises: + * `ValueError` – If conversation ID or tools mismatch on restore + * `ValidationError` – If agent or other fields fail Pydantic validation + +#### static get_unmatched_actions() + +Find actions in the event history that don’t have matching observations. + +This method identifies ActionEvents that don’t have corresponding +ObservationEvents or UserRejectObservations, which typically indicates +actions that are pending confirmation or execution. + +* Parameters: + `events` – List of events to search through +* Returns: + List of ActionEvent objects that don’t have corresponding observations, + in chronological order + +#### locked() + +Return True if the lock is currently held by any thread. + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### owned() + +Return True if the lock is currently held by the calling thread. + +#### pop_blocked_action() + +Remove and return a hook-blocked action reason, if present. + +#### pop_blocked_message() + +Remove and return a hook-blocked message reason, if present. + +#### release() + +Release the lock. + +* Raises: + `RuntimeError` – If the current thread doesn’t own the lock. + +#### set_on_state_change() + +Set a callback to be called when state changes. + +* Parameters: + `callback` – A function that takes an Event (ConversationStateUpdateEvent) + or None to remove the callback + +### class ConversationVisualizerBase + +Bases: `ABC` + +Base class for conversation visualizers. + +This abstract base class defines the interface that all conversation visualizers +must implement. Visualizers can be created before the Conversation is initialized +and will be configured with the conversation state automatically. + +The typical usage pattern: +1. Create a visualizer instance: viz = MyVisualizer() +2. Pass it to Conversation: conv = Conversation(agent, visualizer=viz) +3. Conversation automatically calls viz.initialize(state) to attach the state + +You can also pass the uninstantiated class if you don’t need extra args +for initialization, and Conversation will create it: +conv = Conversation(agent, visualizer=MyVisualizer) + +Conversation will then call MyVisualizer() followed by initialize(state). + + +#### Properties + +- `conversation_stats`: ConversationStats | None + Get conversation stats from the state. + +#### Methods + +#### __init__() + +Initialize the visualizer base. + +#### create_sub_visualizer() + +Create a visualizer for a sub-agent during delegation. + +Override this method to support sub-agent visualization in multi-agent +delegation scenarios. The sub-visualizer will be used to display events +from the spawned sub-agent. + +By default, returns None which means sub-agents will not have visualization. +Subclasses that support delegation (like DelegationVisualizer) should +override this method to create appropriate sub-visualizers.
+ +* Parameters: + `agent_id` – The identifier of the sub-agent being spawned +* Returns: + A visualizer instance for the sub-agent, or None if sub-agent + visualization is not supported + +#### final initialize() + +Initialize the visualizer with conversation state. + +This method is called by Conversation after the state is created, +allowing the visualizer to access conversation stats and other +state information. + +Subclasses should not override this method, to ensure the state is set. + +* Parameters: + `state` – The conversation state object + +#### abstractmethod on_event() + +Handle a conversation event. + +This method is called for each event in the conversation and should +implement the visualization logic. + +* Parameters: + `event` – The event to visualize + +### class DefaultConversationVisualizer + +Bases: [`ConversationVisualizerBase`](#class-conversationvisualizerbase) + +Handles visualization of conversation events with Rich formatting. + +Provides Rich-formatted output with semantic dividers and complete content display. + +#### Methods + +#### __init__() + +Initialize the visualizer. + +* Parameters: + * `highlight_regex` – Dictionary mapping regex patterns to Rich color styles + for highlighting keywords in the visualizer. + For example: (configuration object) + * `skip_user_messages` – If True, skip displaying user messages. Useful for + scenarios where user input is not relevant to show. + +#### on_event() + +Main event handler that displays events with Rich formatting. + +### class EventLog + +Bases: [`EventsListBase`](#class-eventslistbase) + +Persistent event log with locking for concurrent writes. + +This class provides thread-safe and process-safe event storage using +the FileStore’s locking mechanism. Events are persisted to disk and +can be accessed by index or event ID. + +#### Methods + +#### NOTE +For LocalFileStore, file locking via flock() does NOT work reliably +on NFS mounts or network filesystems. Users deploying with shared +storage should use alternative coordination mechanisms. + +#### __init__() + +#### append() + +Append an event with locking for thread/process safety. + +* Raises: + * `TimeoutError` – If the lock cannot be acquired within LOCK_TIMEOUT_SECONDS. + * `ValueError` – If an event with the same ID already exists. + +#### get_id() + +Return the event_id for a given index. + +#### get_index() + +Return the integer index for a given event_id. + +### class EventsListBase + +Bases: `Sequence`[`Event`], `ABC` + +Abstract base class for event lists that can be appended to. + +This provides a common interface for both local EventLog and remote +RemoteEventsList implementations, avoiding circular imports in protocols. + +#### Methods + +#### abstractmethod append() + +Add a new event to the list. + +### class LocalConversation + +Bases: [`BaseConversation`](#class-baseconversation) + + +#### Properties + +- `agent`: AgentBase +- `delete_on_close`: bool = True +- `id`: UUID + Get the unique ID of the conversation. +- `llm_registry`: LLMRegistry +- `max_iteration_per_run`: int +- `resolved_plugins`: list[ResolvedPluginSource] | None + Get the resolved plugin sources after plugins are loaded. + Returns None if plugins haven’t been loaded yet, or if no plugins + were specified. Use this for persistence to ensure conversation + resume uses the exact same plugin versions. +- `state`: [ConversationState](#class-conversationstate) + Get the conversation state. + It returns a protocol that has a subset of ConversationState methods + and properties. 
We will have the ability to access the same properties + of ConversationState on a remote conversation object. + But we won’t be able to access methods that mutate the state. +- `stuck_detector`: [StuckDetector](#class-stuckdetector) | None + Get the stuck detector instance if enabled. +- `workspace`: LocalWorkspace + +#### Methods + +#### __init__() + +Initialize the conversation. + +* Parameters: + * `agent` – The agent to use for the conversation. + * `workspace` – Working directory for agent operations and tool execution. + Can be a string path, Path object, or LocalWorkspace instance. + * `plugins` – Optional list of plugins to load. Each plugin is specified + with a source (github:owner/repo, git URL, or local path), + optional ref (branch/tag/commit), and optional repo_path for + monorepos. Plugins are loaded in order with these merge + semantics: skills override by name (last wins), MCP config + override by key (last wins), hooks concatenate (all run). + * `persistence_dir` – Directory for persisting conversation state and events. + Can be a string path or Path object. + * `conversation_id` – Optional ID for the conversation. If provided, will + be used to identify the conversation. The user might want to + suffix their persistent filestore with this ID. + * `callbacks` – Optional list of callback functions to handle events + * `token_callbacks` – Optional list of callbacks invoked for streaming deltas + * `hook_config` – Optional hook configuration to auto-wire session hooks. + If plugins are loaded, their hooks are combined with this config. + * `max_iteration_per_run` – Maximum number of iterations per run + * `visualizer` – + + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `stuck_detection` – Whether to enable stuck detection + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. + * `cipher` – Optional cipher for encrypting/decrypting secrets in persisted + state. If provided, secrets are encrypted when saving and + decrypted when loading. If not provided, secrets are redacted + (lost) on serialization. + +#### ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### close() + +Close the conversation and clean up all tool executors. + +#### condense() + +Synchronously force condense the conversation history. + +If the agent is currently running, condense() will wait for the +ongoing step to finish before proceeding. + +Raises ValueError if no compatible condenser exists. + +#### property conversation_stats + +#### execute_tool() + +Execute a tool directly without going through the agent loop. 
+ +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. + +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. + +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop + +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor + +#### generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses self.agent.llm. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. + +#### pause() + +Pause agent execution. + +This method can be called from any thread to request that the agent +pause execution. The pause will take effect at the next iteration +of the run loop (between agent steps). + +Note: If called during an LLM completion, the pause will not take +effect until the current LLM call completes. + +#### reject_pending_actions() + +Reject all pending actions from the agent. + +This is a non-invasive method to reject actions between run() calls. +Also clears the agent_waiting_for_confirmation flag. + +#### run() + +Runs the conversation until the agent finishes. + +In confirmation mode: +- First call: creates actions but doesn’t execute them, stops and waits +- Second call: executes pending actions (implicit confirmation) + +In normal mode: +- Creates and executes actions immediately + +Can be paused between steps + +#### send_message() + +Send a message to the agent. + +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. + +#### set_confirmation_policy() + +Set the confirmation policy and store it in conversation state. + +#### set_security_analyzer() + +Set the security analyzer for the conversation. + +#### update_secrets() + +Add secrets to the conversation. + +* Parameters: + `secrets` – Dictionary mapping secret keys to values or no-arg callables. + SecretValue = str | Callable[[], str]. Callables are invoked lazily + when a command references the secret key. + +### class RemoteConversation + +Bases: [`BaseConversation`](#class-baseconversation) + + +#### Properties + +- `agent`: AgentBase +- `delete_on_close`: bool = False +- `id`: UUID +- `max_iteration_per_run`: int +- `state`: RemoteState + Access to remote conversation state. +- `workspace`: RemoteWorkspace + +#### Methods + +#### __init__() + +Remote conversation proxy that talks to an agent server. 
+ +* Parameters: + * `agent` – Agent configuration (will be sent to the server) + * `workspace` – The working directory for agent operations and tool execution. + * `plugins` – Optional list of plugins to load on the server. Each plugin + is a PluginSource specifying source, ref, and repo_path. + * `conversation_id` – Optional existing conversation id to attach to + * `callbacks` – Optional callbacks to receive events (not yet streamed) + * `max_iteration_per_run` – Max iterations configured on server + * `stuck_detection` – Whether to enable stuck detection on server + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. + * `hook_config` – Optional hook configuration for session hooks + * `visualizer` – + + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `secrets` – Optional secrets to initialize the conversation with + +#### ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### close() + +Close the conversation and clean up resources. + +Note: We don’t close self._client here because it’s shared with the workspace. +The workspace owns the client and will close it during its own cleanup. +Closing it here would prevent the workspace from making cleanup API calls. + +#### condense() + +Force condensation of the conversation history. + +This method sends a condensation request to the remote agent server. +The server will use the existing condensation request pattern to trigger +condensation if a condenser is configured and handles condensation requests. + +The condensation will be applied on the server side and will modify the +conversation state by adding a condensation event to the history. + +* Raises: + `HTTPError` – If the server returns an error (e.g., no condenser configured). + +#### property conversation_stats + +#### execute_tool() + +Execute a tool directly without going through the agent loop. + +Note: This method is not yet supported for RemoteConversation. +Tool execution for remote conversations happens on the server side +during the normal agent loop. + +* Parameters: + * `tool_name` – The name of the tool to execute + * `action` – The action to pass to the tool executor +* Raises: + `NotImplementedError` – Always, as this feature is not yet supported + for remote conversations. + +#### generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If provided, its usage_id + will be sent to the server. If not provided, uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. 
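+ +As a usage sketch (assuming an existing `conversation` instance, local or remote; the length cap shown is arbitrary), title generation can be invoked directly: + +```pycon +>>> title = conversation.generate_title(max_length=60)  # uses the agent's LLM by default +``` +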
+#### pause() + +#### reject_pending_actions() + +#### run() + +Trigger a run on the server. + +* Parameters: + * `blocking` – If True (default), wait for the run to complete by polling + the server. If False, return immediately after triggering the run. + * `poll_interval` – Time in seconds between status polls (only used when + blocking=True). Default is 1.0 second. + * `timeout` – Maximum time in seconds to wait for the run to complete + (only used when blocking=True). Default is 3600 seconds. +* Raises: + `ConversationRunError` – If the run fails or times out. + +#### send_message() + +Send a message to the agent. + +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. + +#### set_confirmation_policy() + +Set the confirmation policy for the conversation. + +#### set_security_analyzer() + +Set the security analyzer for the remote conversation. + +#### property stuck_detector + +Stuck detector for compatibility. +Not implemented for remote conversations. + +#### update_secrets() + +### class SecretRegistry + +Bases: `OpenHandsModel` + +Manages secrets and injects them into bash commands when needed. + +The secret registry stores a mapping of secret keys to SecretSources +that retrieve the actual secret values. When a bash command is about to be +executed, it scans the command for any secret keys and injects the corresponding +environment variables. + +Secret sources will redact or encrypt their sensitive values as appropriate when +serializing, depending on the content of the serialization context. If a context is present +and contains a ‘cipher’ object, this is used for encryption. If it contains a +boolean ‘expose_secrets’ flag set to True, secrets are dumped in plain text. +Otherwise, secrets are redacted. + +Additionally, it tracks the latest exported values to enable consistent masking +even when callable secrets fail on subsequent calls. + + +#### Properties + +- `secret_sources`: dict[str, SecretSource] + +#### Methods + +#### find_secrets_in_text() + +Find all secret keys mentioned in the given text. + +* Parameters: + `text` – The text to search for secret keys +* Returns: + Set of secret keys found in the text + +#### get_secrets_as_env_vars() + +Get secrets that should be exported as environment variables for a command. + +* Parameters: + `command` – The bash command to check for secret references +* Returns: + Dictionary of environment variables to export (key -> value) + +#### mask_secrets_in_output() + +Mask secret values in the given text. + +This method uses both the current exported values and attempts to get +fresh values from callables to ensure comprehensive masking. + +* Parameters: + `text` – The text to mask secrets in +* Returns: + Text with secret values replaced by `` + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### update_secrets() + +Add or update secrets in the manager. 
+* Parameters: + `secrets` – Dictionary mapping secret keys to either string values + or callable functions that return string values + +### class StuckDetector + +Bases: `object` + +Detects when an agent is stuck in repetitive or unproductive patterns. + +This detector analyzes the conversation history to identify various stuck patterns: +1. Repeating action-observation cycles +2. Repeating action-error cycles +3. Agent monologue (repeated messages without user input) +4. Repeating alternating action-observation patterns +5. Context window errors indicating memory issues + + +#### Properties + +- `action_error_threshold`: int +- `action_observation_threshold`: int +- `alternating_pattern_threshold`: int +- `monologue_threshold`: int +- `state`: [ConversationState](#class-conversationstate) +- `thresholds`: StuckDetectionThresholds + +#### Methods + +#### __init__() + +#### is_stuck() + +Check if the agent is currently stuck. + +Note: To avoid materializing potentially large file-backed event histories, +only the last MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION events are analyzed. +If a user message exists within this window, only events after it are checked. +Otherwise, all events in the window are analyzed. + +### openhands.sdk.event +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event.md + +### class ActionEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + + +#### Properties + +- `action`: Action | None +- `critic_result`: CriticResult | None +- `llm_response_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `source`: SourceType +- `summary`: str | None +- `summary_event`: [CondensationSummaryEvent](#class-condensationsummaryevent) + Generates a CondensationSummaryEvent. + Since summary events are not part of the main event store and are generated + dynamically, this property ensures the created event has a unique and consistent + ID based on the condensation event’s ID. + * Raises: + `ValueError` – If no summary is present. +- `summary_offset`: int | None +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. + +#### Methods + +#### apply() + +Applies the condensation to a list of events. + +This method removes events that are marked to be forgotten and returns a new +list of events. If the summary metadata is present (both summary and offset), +the corresponding CondensationSummaryEvent will be inserted at the specified +offset _after_ the forgotten events have been removed. + +### class CondensationRequest + +Bases: [`Event`](#class-event) + +This action is used to request a condensation of the conversation history. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. + +#### Methods + +#### action + +The action type, namely ActionType.CONDENSATION_REQUEST. + +* Type: + str + +### class CondensationSummaryEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +This event represents a summary generated by a condenser. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str + The summary text. + +#### Methods + +#### to_llm_message() + +### class ConversationStateUpdateEvent + +Bases: [`Event`](#class-event) + +Event that contains conversation state updates. + +This event is sent via websocket whenever the conversation state changes, +allowing remote clients to stay in sync without making REST API calls. + +All fields are serialized versions of the corresponding ConversationState fields +to ensure compatibility with websocket transmission. + + +#### Properties + +- `key`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `value`: Any + +#### Methods + +#### classmethod from_conversation_state() + +Create a state update event from a ConversationState object. + +This creates an event containing a snapshot of important state fields. + +* Parameters: + * `state` – The ConversationState to serialize + * `conversation_id` – The conversation ID for the event +* Returns: + A ConversationStateUpdateEvent with serialized state data + +#### classmethod validate_key() + +#### classmethod validate_value() + +### class Event + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Base class for all events. 
+ +#### Properties + +- `id`: str +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `timestamp`: str +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. + +### class LLMCompletionLogEvent + +Bases: [`Event`](#class-event) + +Event containing LLM completion log data. + +When an LLM is configured with log_completions=True in a remote conversation, +this event streams the completion log data back to the client through WebSocket +instead of writing it to a file inside the Docker container. + + +#### Properties + +- `filename`: str +- `log_data`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `model_name`: str +- `source`: Literal['agent', 'user', 'environment'] +- `usage_id`: str + +### class LLMConvertibleEvent + +Bases: [`Event`](#class-event), `ABC` + +Base class for events that can be converted to LLM messages. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### Methods + +#### static events_to_messages() + +Convert event stream to LLM message stream, handling multi-action batches + +#### abstractmethod to_llm_message() + +### class MessageEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +Message from either agent or user. + +This was originally the “MessageAction”, but it is not supposed to be a tool call. + + +#### Properties + +- `activated_skills`: list[str] +- `critic_result`: CriticResult | None +- `extended_content`: list[TextContent] +- `llm_message`: Message +- `llm_response_id`: str | None +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `reasoning_content`: str +- `sender`: str | None +- `source`: Literal['agent', 'user', 'environment'] +- `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock] + Return the Anthropic thinking blocks from the LLM message. +- `visualize`: Text + Return Rich Text representation of this message event. + +#### Methods + +#### to_llm_message() + +### class ObservationBaseEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +Base class for anything sent as a response to a tool call. + +Examples include tool execution results, errors, and user rejections. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `tool_call_id`: str +- `tool_name`: str + +### class ObservationEvent + +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + + +#### Properties + +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `observation`: Observation +- `visualize`: Text + Return Rich Text representation of this observation event. 
+ +#### Methods + +#### to_llm_message() + +### class PauseEvent + +Bases: [`Event`](#class-event) + +Event indicating that the agent execution was paused by user request. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this pause event. +### class SystemPromptEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +System prompt added by the agent. + +The system prompt can optionally include dynamic context that varies between +conversations. When `dynamic_context` is provided, it is included as a +second content block in the same system message. Cache markers are NOT +applied here - they are applied by `LLM._apply_prompt_caching()` when +caching is enabled, ensuring provider-specific cache control is only added +when appropriate. + + +#### Properties + +- `dynamic_context`: TextContent | None +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `system_prompt`: TextContent +- `tools`: list[ToolDefinition] +- `visualize`: Text + Return Rich Text representation of this system prompt event. + +#### Methods + +#### system_prompt + +The static system prompt text (cacheable across conversations) + +* Type: + openhands.sdk.llm.message.TextContent + +#### tools + +List of available tools + +* Type: + list[openhands.sdk.tool.tool.ToolDefinition] + +#### dynamic_context + +Optional per-conversation context (hosts, repo info, etc.) +Sent as a second TextContent block inside the system message. + +* Type: + openhands.sdk.llm.message.TextContent | None + +#### to_llm_message() + +Convert to a single system LLM message. + +When `dynamic_context` is present the message contains two content +blocks: the static prompt followed by the dynamic context. Cache markers +are NOT applied here - they are applied by `LLM._apply_prompt_caching()` +when caching is enabled, which marks the static block (index 0) and leaves +the dynamic block (index 1) unmarked for cross-conversation cache sharing. + +### class TokenEvent + +Bases: [`Event`](#class-event) + +Event from VLLM representing token IDs used in LLM interaction. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `prompt_token_ids`: list[int] +- `response_token_ids`: list[int] +- `source`: Literal['agent', 'user', 'environment'] +### class UserRejectObservation + +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + +Observation when an action is rejected by user or hook. + +This event is emitted when: +- User rejects an action during confirmation mode (rejection_source=”user”) +- A PreToolUse hook blocks an action (rejection_source=”hook”) + + +#### Properties + +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `rejection_reason`: str +- `rejection_source`: Literal['user', 'hook'] +- `visualize`: Text + Return Rich Text representation of this user rejection event. 
+ +#### Methods + +#### to_llm_message() + +### openhands.sdk.llm +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm.md + +### class CredentialStore + +Bases: `object` + +Store and retrieve OAuth credentials for LLM providers. + + +#### Properties + +- `credentials_dir`: Path + Get the credentials directory, creating it if necessary. + +#### Methods + +#### __init__() + +Initialize the credential store. + +* Parameters: + `credentials_dir` – Optional custom directory for storing credentials. + Defaults to ~/.local/share/openhands/auth/ + +#### delete() + +Delete stored credentials for a vendor. + +* Parameters: + `vendor` – The vendor/provider name +* Returns: + True if credentials were deleted, False if they didn’t exist + +#### get() + +Get stored credentials for a vendor. + +* Parameters: + `vendor` – The vendor/provider name (e.g., ‘openai’) +* Returns: + OAuthCredentials if found and valid, None otherwise + +#### save() + +Save credentials for a vendor. + +* Parameters: + `credentials` – The OAuth credentials to save + +#### update_tokens() + +Update tokens for an existing credential. + +* Parameters: + * `vendor` – The vendor/provider name + * `access_token` – New access token + * `refresh_token` – New refresh token (if provided) + * `expires_in` – Token expiry in seconds +* Returns: + Updated credentials, or None if no existing credentials found + +### class ImageContent + +Bases: `BaseContent` + + +#### Properties + +- `image_urls`: list[str] +- `type`: Literal['image'] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### to_llm_dict() + +Convert to LLM API format. + +### class LLM + +Bases: `BaseModel`, `RetryMixin`, `NonNativeToolCallingMixin` + +Language model interface for OpenHands agents. + +The LLM class provides a unified interface for interacting with various +language models through the litellm library. It handles model configuration, +API authentication, +retry logic, and tool calling capabilities. + +#### Example + +```pycon +>>> from openhands.sdk import LLM +>>> from pydantic import SecretStr +>>> llm = LLM( +... model="claude-sonnet-4-20250514", +... api_key=SecretStr("your-api-key"), +... usage_id="my-agent" +... ) +>>> # Use with agent or conversation +``` + + +#### Properties + +- `api_key`: str | SecretStr | None +- `api_version`: str | None +- `aws_access_key_id`: str | SecretStr | None +- `aws_region_name`: str | None +- `aws_secret_access_key`: str | SecretStr | None +- `base_url`: str | None +- `caching_prompt`: bool +- `custom_tokenizer`: str | None +- `disable_stop_word`: bool | None +- `disable_vision`: bool | None +- `drop_params`: bool +- `enable_encrypted_reasoning`: bool +- `extended_thinking_budget`: int | None +- `extra_headers`: dict[str, str] | None +- `force_string_serializer`: bool | None +- `input_cost_per_token`: float | None +- `is_subscription`: bool + Check if this LLM uses subscription-based authentication. + Returns True when the LLM was created via LLM.subscription_login(), + which uses the ChatGPT subscription Codex backend rather than the + standard OpenAI API. + * Returns: + True if using subscription-based transport, False otherwise. 
  * Return type:
    bool
- `litellm_extra_body`: dict[str, Any]
- `log_completions`: bool
- `log_completions_folder`: str
- `max_input_tokens`: int | None
- `max_message_chars`: int
- `max_output_tokens`: int | None
- `metrics`: [Metrics](#class-metrics)
  Get usage metrics for this LLM instance.
  * Returns:
    Metrics object containing token usage, costs, and other statistics.
- `model`: str
- `model_canonical_name`: str | None
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `model_info`: dict | None
  Returns the model info dictionary.
- `modify_params`: bool
- `native_tool_calling`: bool
- `num_retries`: int
- `ollama_base_url`: str | None
- `openrouter_app_name`: str
- `openrouter_site_url`: str
- `output_cost_per_token`: float | None
- `prompt_cache_retention`: str | None
- `reasoning_effort`: Literal['low', 'medium', 'high', 'xhigh', 'none'] | None
- `reasoning_summary`: Literal['auto', 'concise', 'detailed'] | None
- `retry_listener`: SkipJsonSchema[Callable[[int, int, BaseException | None], None] | None]
- `retry_max_wait`: int
- `retry_min_wait`: int
- `retry_multiplier`: float
- `safety_settings`: list[dict[str, str]] | None
- `seed`: int | None
- `stream`: bool
- `telemetry`: Telemetry
  Get telemetry handler for this LLM instance.
  * Returns:
    Telemetry object for managing logging and metrics callbacks.
- `temperature`: float | None
- `timeout`: int | None
- `top_k`: float | None
- `top_p`: float | None
- `usage_id`: str

#### Methods

#### completion()

Generate a completion from the language model.

This is the primary method for getting responses from the model via the Completion API.
It handles message formatting, tool calling, and response processing.

* Parameters:
  * `messages` – List of conversation messages
  * `tools` – Optional list of tools available to the model
  * `_return_metrics` – Whether to return usage metrics
  * `add_security_risk_prediction` – Add security_risk field to tool schemas
  * `on_token` – Optional callback for streaming tokens
  * `**kwargs` – Additional arguments passed to the LLM API
* Returns:
  LLMResponse containing the model’s response and metadata.

#### NOTE
Summary field is always added to tool schemas for transparency and
explainability of agent actions.

* Raises:
  `ValueError` – If streaming is requested (not supported).

#### format_messages_for_llm()

Formats Message objects for LLM consumption.

#### format_messages_for_responses()

Prepare (instructions, input[]) for the OpenAI Responses API.

- Skips prompt caching flags and string serializer concerns
- Uses Message.to_responses_value to get either instructions (system)
  or input items (others)
- Concatenates system instructions into a single instructions string
- For subscription mode, system prompts are prepended to user content

#### get_token_count()

#### is_caching_prompt_active()

Check if prompt caching is supported and enabled for the current model.

* Returns:
  True if prompt caching is supported and enabled for the given model.
* Return type:
  bool

#### classmethod load_from_env()

#### classmethod load_from_json()

#### model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.
* Parameters:
  * `self` – The BaseModel instance.
  * `context` – The context.

#### reset_metrics()

Reset metrics and telemetry to fresh instances.

This is used by the LLMRegistry to ensure each registered LLM has
independent metrics, preventing metrics from being shared between
LLMs that were created via model_copy().

When an LLM is copied (e.g., to create a condenser LLM from an agent LLM),
Pydantic’s model_copy() does a shallow copy of private attributes by default,
causing the original and copied LLM to share the same Metrics object.
This method allows the registry to fix this by resetting metrics to None,
which will be lazily recreated when accessed.

#### responses()

Alternative invocation path using the OpenAI Responses API via LiteLLM.

Maps Message[] -> (instructions, input[]) and returns LLMResponse.

* Parameters:
  * `messages` – List of conversation messages
  * `tools` – Optional list of tools available to the model
  * `include` – Optional list of fields to include in response
  * `store` – Whether to store the conversation
  * `_return_metrics` – Whether to return usage metrics
  * `add_security_risk_prediction` – Add security_risk field to tool schemas
  * `on_token` – Optional callback for streaming deltas
  * `**kwargs` – Additional arguments passed to the API

#### NOTE
Summary field is always added to tool schemas for transparency and
explainability of agent actions.

#### restore_metrics()

#### classmethod subscription_login()

Authenticate with a subscription service and return an LLM instance.

This method provides subscription-based access to LLM models that are
available through chat subscriptions (e.g., ChatGPT Plus/Pro) rather
than API credits. It handles credential caching, token refresh, and
the OAuth login flow.

Currently supported vendors:
- “openai”: ChatGPT Plus/Pro subscription for Codex models

Supported OpenAI models:
- gpt-5.1-codex-max
- gpt-5.1-codex-mini
- gpt-5.2
- gpt-5.2-codex

* Parameters:
  * `vendor` – The vendor/provider. Currently only “openai” is supported.
  * `model` – The model to use. Must be supported by the vendor’s
    subscription service.
  * `force_login` – If True, always perform a fresh login even if valid
    credentials exist.
  * `open_browser` – Whether to automatically open the browser for the
    OAuth login flow.
  * `**llm_kwargs` – Additional arguments to pass to the LLM constructor.
* Returns:
  An LLM instance configured for subscription-based access.
* Raises:
  * `ValueError` – If the vendor or model is not supported.
  * `RuntimeError` – If authentication fails.

#### uses_responses_api()

Whether this model uses the OpenAI Responses API path.

#### vision_is_active()

### class LLMProfileStore

Bases: `object`

Standalone utility for persisting LLM configurations.

#### Methods

#### __init__()

Initialize the profile store.

* Parameters:
  `base_dir` – Path to the directory where the profiles are stored.
  If None is provided, the default directory is used, i.e.,
  ~/.openhands/profiles.

#### delete()

Delete an existing profile.

If the profile is not present in the profile directory, it does nothing.

* Parameters:
  `name` – Name of the profile to delete.
* Raises:
  `TimeoutError` – If the lock cannot be acquired.

#### list()

Returns a list of all profiles stored.

* Returns:
  List of profile filenames (e.g., [“default.json”, “gpt4.json”]).

#### load()

Load an LLM instance from the given profile name.
+ +* Parameters: + `name` – Name of the profile to load. +* Returns: + An LLM instance constructed from the profile configuration. +* Raises: + * `FileNotFoundError` – If the profile name does not exist. + * `ValueError` – If the profile file is corrupted or invalid. + * `TimeoutError` – If the lock cannot be acquired. + +#### save() + +Save a profile to the profile directory. + +Note that if a profile name already exists, it will be overwritten. + +* Parameters: + * `name` – Name of the profile to save. + * `llm` – LLM instance to save + * `include_secrets` – Whether to include the profile secrets. Defaults to False. +* Raises: + `TimeoutError` – If the lock cannot be acquired. + +### class LLMRegistry + +Bases: `object` + +A minimal LLM registry for managing LLM instances by usage ID. + +This registry provides a simple way to manage multiple LLM instances, +avoiding the need to recreate LLMs with the same configuration. + +The registry also ensures that each registered LLM has independent metrics, +preventing metrics from being shared between LLMs that were created via +model_copy(). This is important for scenarios like creating a condenser LLM +from an agent LLM, where each should track its own usage independently. + + +#### Properties + +- `registry_id`: str +- `retry_listener`: Callable[[int, int], None] | None +- `subscriber`: Callable[[[RegistryEvent](#class-registryevent)], None] | None +- `usage_to_llm`: MappingProxyType + Access the internal usage-ID-to-LLM mapping (read-only view). + +#### Methods + +#### __init__() + +Initialize the LLM registry. + +* Parameters: + `retry_listener` – Optional callback for retry events. + +#### add() + +Add an LLM instance to the registry. + +This method ensures that the LLM has independent metrics before +registering it. If the LLM’s metrics are shared with another +registered LLM (e.g., due to model_copy()), fresh metrics will +be created automatically. + +* Parameters: + `llm` – The LLM instance to register. +* Raises: + `ValueError` – If llm.usage_id already exists in the registry. + +#### get() + +Get an LLM instance from the registry. + +* Parameters: + `usage_id` – Unique identifier for the LLM usage slot. +* Returns: + The LLM instance. +* Raises: + `KeyError` – If usage_id is not found in the registry. + +#### list_usage_ids() + +List all registered usage IDs. + +#### notify() + +Notify subscribers of registry events. + +* Parameters: + `event` – The registry event to notify about. + +#### subscribe() + +Subscribe to registry events. + +* Parameters: + `callback` – Function to call when LLMs are created or updated. + +### class LLMResponse + +Bases: `BaseModel` + +Result of an LLM completion request. + +This type provides a clean interface for LLM completion results, exposing +only OpenHands-native types to consumers while preserving access to the +raw LiteLLM response for internal use. + + +#### Properties + +- `id`: str + Get the response ID from the underlying LLM response. + This property provides a clean interface to access the response ID, + supporting both completion mode (ModelResponse) and response API modes + (ResponsesAPIResponse). + * Returns: + The response ID from the LLM response +- `message`: [Message](#class-message) +- `metrics`: [MetricsSnapshot](#class-metricssnapshot) +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `raw_response`: ModelResponse | ResponsesAPIResponse + +#### Methods + +#### message + +The completion message converted to OpenHands Message type + +* Type: + [openhands.sdk.llm.message.Message](#class-message) + +#### metrics + +Snapshot of metrics from the completion request + +* Type: + [openhands.sdk.llm.utils.metrics.MetricsSnapshot](#class-metricssnapshot) + +#### raw_response + +The original LiteLLM response (ModelResponse or +ResponsesAPIResponse) for internal use + +* Type: + litellm.types.utils.ModelResponse | litellm.types.llms.openai.ResponsesAPIResponse + +### class Message + +Bases: `BaseModel` + + +#### Properties + +- `contains_image`: bool +- `content`: Sequence[[TextContent](#class-textcontent) | [ImageContent](#class-imagecontent)] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str | None +- `reasoning_content`: str | None +- `responses_reasoning_item`: [ReasoningItemModel](#class-reasoningitemmodel) | None +- `role`: Literal['user', 'system', 'assistant', 'tool'] +- `thinking_blocks`: Sequence[[ThinkingBlock](#class-thinkingblock) | [RedactedThinkingBlock](#class-redactedthinkingblock)] +- `tool_call_id`: str | None +- `tool_calls`: list[[MessageToolCall](#class-messagetoolcall)] | None + +#### Methods + +#### classmethod from_llm_chat_message() + +Convert a LiteLLMMessage (Chat Completions) to our Message class. + +Provider-agnostic mapping for reasoning: +- Prefer message.reasoning_content if present (LiteLLM normalized field) +- Extract thinking_blocks from content array (Anthropic-specific) + +#### classmethod from_llm_responses_output() + +Convert OpenAI Responses API output items into a single assistant Message. + +Policy (non-stream): +- Collect assistant text by concatenating output_text parts from message items +- Normalize function_call items to MessageToolCall list + +#### to_chat_dict() + +Serialize message for OpenAI Chat Completions. + +* Parameters: + * `cache_enabled` – Whether prompt caching is active. + * `vision_enabled` – Whether vision/image processing is enabled. + * `function_calling_enabled` – Whether native function calling is enabled. + * `force_string_serializer` – Force string serializer instead of list format. + * `send_reasoning_content` – Whether to include reasoning_content in output. + +Chooses the appropriate content serializer and then injects threading keys: +- Assistant tool call turn: role == “assistant” and self.tool_calls +- Tool result turn: role == “tool” and self.tool_call_id (with name) + +#### to_responses_dict() + +Serialize message for OpenAI Responses (input parameter). + +Produces a list of “input” items for the Responses API: +- system: returns [], system content is expected in ‘instructions’ +- user: one ‘message’ item with content parts -> input_text / input_image +(when vision enabled) +- assistant: emits prior assistant content as input_text, +and function_call items for tool_calls +- tool: emits function_call_output items (one per TextContent) +with matching call_id + +#### to_responses_value() + +Return serialized form. + +Either an instructions string (for system) or input items (for other roles). + +### class MessageToolCall + +Bases: `BaseModel` + +Transport-agnostic tool call representation. + +One canonical id is used for linking across actions/observations and +for Responses function_call_output call_id. 

#### Properties

- `arguments`: str
- `id`: str
- `name`: str
- `origin`: Literal['completion', 'responses']

#### Methods

#### classmethod from_chat_tool_call()

Create a MessageToolCall from a Chat Completions tool call.

#### classmethod from_responses_function_call()

Create a MessageToolCall from a typed OpenAI Responses function_call item.

Note: OpenAI Responses function_call.arguments is already a JSON string.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### to_chat_dict()

Serialize to OpenAI Chat Completions tool_calls format.

#### to_responses_dict()

Serialize to OpenAI Responses ‘function_call’ input item format.

### class Metrics

Bases: [`MetricsSnapshot`](#class-metricssnapshot)


#### Properties

- `costs`: list[Cost]
- `response_latencies`: list[ResponseLatency]
- `token_usages`: list[TokenUsage]

#### Methods

#### add_cost()

#### add_response_latency()

#### add_token_usage()

Add a single usage record.

#### deep_copy()

Create a deep copy of the Metrics object.

#### diff()

Calculate the difference between current metrics and a baseline.

This is useful for tracking metrics for specific operations like delegates.

* Parameters:
  `baseline` – A metrics object representing the baseline state
* Returns:
  A new Metrics object containing only the differences since the baseline

#### get()

Return the metrics in a dictionary.

#### get_snapshot()

Get a snapshot of the current metrics without the detailed lists.

#### initialize_accumulated_token_usage()

#### log()

Log the metrics.

#### merge()

Merge ‘other’ metrics into this one.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### classmethod validate_accumulated_cost()

### class MetricsSnapshot

Bases: `BaseModel`

A snapshot of metrics at a point in time.

Does not include lists of individual costs, latencies, or token usages.


#### Properties

- `accumulated_cost`: float
- `accumulated_token_usage`: TokenUsage | None
- `max_budget_per_task`: float | None
- `model_name`: str

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class OAuthCredentials

Bases: `BaseModel`

OAuth credentials for subscription-based LLM access.


#### Properties

- `access_token`: str
- `expires_at`: int
- `refresh_token`: str
- `type`: Literal['oauth']
- `vendor`: str

#### Methods

#### is_expired()

Check if the access token is expired.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class OpenAISubscriptionAuth

Bases: `object`

Handle OAuth authentication for OpenAI ChatGPT subscription access.


#### Properties

- `vendor`: str
  Get the vendor name.

#### Methods

#### __init__()

Initialize the OpenAI subscription auth handler.

* Parameters:
  * `credential_store` – Optional custom credential store.
  * `oauth_port` – Port for the local OAuth callback server.

#### create_llm()

Create an LLM instance configured for Codex subscription access.

* Parameters:
  * `model` – The model to use (must be in OPENAI_CODEX_MODELS).
  * `credentials` – OAuth credentials to use. If None, uses stored credentials.
  * `instructions` – Optional instructions for the Codex model.
  * `**llm_kwargs` – Additional arguments to pass to the LLM constructor.
* Returns:
  An LLM instance configured for Codex access.
* Raises:
  `ValueError` – If the model is not supported or no credentials available.

#### get_credentials()

Get stored credentials if they exist.

#### has_valid_credentials()

Check if valid (non-expired) credentials exist.

#### async login()

Perform the OAuth login flow.

This starts a local HTTP server to handle the OAuth callback,
opens the browser for user authentication, and waits for the
callback with the authorization code.

* Parameters:
  `open_browser` – Whether to automatically open the browser.
* Returns:
  The obtained OAuth credentials.
* Raises:
  `RuntimeError` – If the OAuth flow fails or times out.

#### logout()

Remove stored credentials.

* Returns:
  True if credentials were removed, False if none existed.

#### async refresh_if_needed()

Refresh credentials if they are expired.

* Returns:
  Updated credentials, or None if no credentials exist.
* Raises:
  `RuntimeError` – If token refresh fails.

### class ReasoningItemModel

Bases: `BaseModel`

OpenAI Responses reasoning item (non-stream, subset we consume).

Do not log or render encrypted_content.


#### Properties

- `content`: list[str] | None
- `encrypted_content`: str | None
- `id`: str | None
- `status`: str | None
- `summary`: list[str]

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class RedactedThinkingBlock

Bases: `BaseModel`

Redacted thinking block for previous responses without extended thinking.

This is used as a placeholder for assistant messages that were generated
before extended thinking was enabled.


#### Properties

- `data`: str
- `type`: Literal['redacted_thinking']

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class RegistryEvent

Bases: `BaseModel`


#### Properties

- `llm`: [LLM](#class-llm)
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
### class RouterLLM

Bases: [`LLM`](#class-llm)

Base class for multiple LLMs acting as a unified LLM.

This class provides a foundation for implementing model routing by
inheriting from LLM, allowing routers to work with multiple underlying
LLM models while presenting a unified LLM interface to consumers.

Key features:
- Works with multiple LLMs configured via llms_for_routing
- Delegates all other operations/properties to the selected LLM
- Provides routing interface through select_llm() method


#### Properties

- `active_llm`: [LLM](#class-llm) | None
- `llms_for_routing`: dict[str, [LLM](#class-llm)]
- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `router_name`: str

#### Methods

#### completion()

This method intercepts completion calls and routes them to the appropriate
underlying LLM based on the routing logic implemented in select_llm().
* Parameters:
  * `messages` – List of conversation messages
  * `tools` – Optional list of tools available to the model
  * `return_metrics` – Whether to return usage metrics
  * `add_security_risk_prediction` – Add security_risk field to tool schemas
  * `on_token` – Optional callback for streaming tokens
  * `**kwargs` – Additional arguments passed to the LLM API

#### NOTE
Summary field is always added to tool schemas for transparency and
explainability of agent actions.

#### model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

* Parameters:
  * `self` – The BaseModel instance.
  * `context` – The context.

#### abstractmethod select_llm()

Select which LLM to use based on messages and events.

This method implements the core routing logic for the RouterLLM.
Subclasses should analyze the provided messages to determine which
LLM from llms_for_routing is most appropriate for handling the request.

* Parameters:
  `messages` – List of messages in the conversation that can be used
  to inform the routing decision.
* Returns:
  The key/name of the LLM to use from the llms_for_routing dictionary.

#### classmethod set_placeholder_model()

Guarantee model exists before LLM base validation runs.

#### classmethod validate_llms_not_empty()

### class TextContent

Bases: `BaseContent`


#### Properties

- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `text`: str
- `type`: Literal['text']

#### Methods

#### to_llm_dict()

Convert to LLM API format.

### class ThinkingBlock

Bases: `BaseModel`

Anthropic thinking block for extended thinking feature.

This represents the raw thinking blocks returned by Anthropic models
when extended thinking is enabled. These blocks must be preserved
and passed back to the API for tool use scenarios.


#### Properties

- `signature`: str | None
- `thinking`: str
- `type`: Literal['thinking']

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### openhands.sdk.security
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security.md

### class AlwaysConfirm

Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase)

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### should_confirm()

Determine if an action with the given risk level requires confirmation.

This method defines the core logic for determining whether user confirmation
is required before executing an action based on its security risk level.

* Parameters:
  `risk` – The security risk level of the action to be evaluated.
  Defaults to SecurityRisk.UNKNOWN if not specified.
* Returns:
  True if the action requires user confirmation before execution,
  False if the action can proceed without confirmation.
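A minimal sketch of how the built-in policies behave (assuming these names are importable from `openhands.sdk.security`, as this page documents):

```pycon
>>> from openhands.sdk.security import AlwaysConfirm, NeverConfirm, SecurityRisk
>>> AlwaysConfirm().should_confirm(SecurityRisk.LOW)   # every action needs approval
True
>>> NeverConfirm().should_confirm(SecurityRisk.HIGH)   # nothing needs approval
False
```

For threshold-based behavior, see `ConfirmRisky` below, which confirms actions based on a configurable risk threshold.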

### class ConfirmRisky

Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase)


#### Properties

- `confirm_unknown`: bool
- `threshold`: [SecurityRisk](#class-securityrisk)

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### should_confirm()

Determine if an action with the given risk level requires confirmation.

This method defines the core logic for determining whether user confirmation
is required before executing an action based on its security risk level.

* Parameters:
  `risk` – The security risk level of the action to be evaluated.
  Defaults to SecurityRisk.UNKNOWN if not specified.
* Returns:
  True if the action requires user confirmation before execution,
  False if the action can proceed without confirmation.

#### classmethod validate_threshold()

### class ConfirmationPolicyBase

Bases: `DiscriminatedUnionMixin`, `ABC`

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### abstractmethod should_confirm()

Determine if an action with the given risk level requires confirmation.

This method defines the core logic for determining whether user confirmation
is required before executing an action based on its security risk level.

* Parameters:
  `risk` – The security risk level of the action to be evaluated.
  Defaults to SecurityRisk.UNKNOWN if not specified.
* Returns:
  True if the action requires user confirmation before execution,
  False if the action can proceed without confirmation.

### class GraySwanAnalyzer

Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase)

Security analyzer using GraySwan’s Cygnal API for AI safety monitoring.

This analyzer sends conversation history and pending actions to the GraySwan
Cygnal API for security analysis. The API returns a violation score which is
mapped to SecurityRisk levels.

Environment variables:
- `GRAYSWAN_API_KEY`: Required API key for GraySwan authentication
- `GRAYSWAN_POLICY_ID`: Optional policy ID for a custom GraySwan policy

#### Example

```pycon
>>> from openhands.sdk.security.grayswan import GraySwanAnalyzer
>>> analyzer = GraySwanAnalyzer()
>>> risk = analyzer.security_risk(action_event)
```


#### Properties

- `api_key`: SecretStr | None
- `api_url`: str
- `history_limit`: int
- `low_threshold`: float
- `max_message_chars`: int
- `medium_threshold`: float
- `policy_id`: str | None
- `timeout`: float

#### Methods

#### close()

Clean up resources.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### model_post_init()

Initialize the analyzer after model creation.

#### security_risk()

Analyze an action for security risks using the GraySwan API.

This method converts the conversation history and the pending action
to OpenAI message format and sends them to the GraySwan Cygnal API
for security analysis.

* Parameters:
  `action` – The ActionEvent to analyze
* Returns:
  SecurityRisk level based on GraySwan analysis

#### set_events()

Set the events for context when analyzing actions.

* Parameters:
  `events` – Sequence of events to use as context for security analysis

#### validate_thresholds()

Validate that thresholds are properly ordered.
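The `low_threshold` and `medium_threshold` properties presumably bound the LOW/MEDIUM and MEDIUM/HIGH bands of the Cygnal violation score. A hedged configuration sketch (field and method names come from the reference above; the values and surrounding variables are illustrative):

```python
# Hypothetical, stricter GraySwan configuration: lower thresholds flag more actions.
# Requires GRAYSWAN_API_KEY in the environment (see the class docstring above).
analyzer = GraySwanAnalyzer(
    low_threshold=0.1,     # assumed LOW/MEDIUM boundary for the violation score
    medium_threshold=0.4,  # assumed MEDIUM/HIGH boundary; must stay ordered
    history_limit=20,      # how many past events are sent as context
)
analyzer.set_events(conversation_events)     # illustrative: context gathered elsewhere
risk = analyzer.security_risk(action_event)  # returns a SecurityRisk level
analyzer.close()                             # clean up resources when done
```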
+ +### class LLMSecurityAnalyzer + +Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) + +LLM-based security analyzer. + +This analyzer respects the security_risk attribute that can be set by the LLM +when generating actions, similar to OpenHands’ LLMRiskAnalyzer. + +It provides a lightweight security analysis approach that leverages the LLM’s +understanding of action context and potential risks. + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### security_risk() + +Evaluate security risk based on LLM-provided assessment. + +This method checks if the action has a security_risk attribute set by the LLM +and returns it. The LLM may not always provide this attribute but it defaults to +UNKNOWN if not explicitly set. + +### class NeverConfirm + +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. + +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. + +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. + +### class SecurityAnalyzerBase + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Abstract base class for security analyzers. + +Security analyzers evaluate the risk of actions before they are executed +and can influence the conversation flow based on security policies. + +This is adapted from OpenHands SecurityAnalyzer but designed to work +with the agent-sdk’s conversation-based architecture. + +#### Methods + +#### analyze_event() + +Analyze an event for security risks. + +This is a convenience method that checks if the event is an action +and calls security_risk() if it is. Non-action events return None. + +* Parameters: + `event` – The event to analyze +* Returns: + ActionSecurityRisk if event is an action, None otherwise + +#### analyze_pending_actions() + +Analyze all pending actions in a conversation. + +This method gets all unmatched actions from the conversation state +and analyzes each one for security risks. + +* Parameters: + `conversation` – The conversation to analyze +* Returns: + List of tuples containing (action, risk_level) for each pending action + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### abstractmethod security_risk() + +Evaluate the security risk of an ActionEvent. + +This is the core method that analyzes an ActionEvent and returns its risk level. +Implementations should examine the action’s content, context, and potential +impact to determine the appropriate risk level. + +* Parameters: + `action` – The ActionEvent to analyze for security risks +* Returns: + ActionSecurityRisk enum indicating the risk level + +#### should_require_confirmation() + +Determine if an action should require user confirmation. + +This implements the default confirmation logic based on risk level +and confirmation mode settings. 
* Parameters:
  * `risk` – The security risk level of the action
  * `confirmation_mode` – Whether confirmation mode is enabled
* Returns:
  True if confirmation is required, False otherwise

### class SecurityRisk

Bases: `str`, `Enum`

Security risk levels for actions.

Based on OpenHands security risk levels but adapted for agent-sdk.
Integer values allow for easy comparison and ordering.


#### Properties

- `description`: str
  Get a human-readable description of the risk level.
- `visualize`: Text
  Return Rich Text representation of this risk level.

#### Methods

#### HIGH = 'HIGH'

#### LOW = 'LOW'

#### MEDIUM = 'MEDIUM'

#### UNKNOWN = 'UNKNOWN'

#### get_color()

Get the color for displaying this risk level in Rich text.

#### is_riskier()

Check if this risk level is riskier than another.

Risk levels follow the natural ordering: LOW is less risky than MEDIUM, which is
less risky than HIGH. UNKNOWN is not comparable to any other level.

To make this act like a standard well-ordered domain, we reflexively consider
risk levels to be riskier than themselves. That is:

    for risk_level in list(SecurityRisk):
        assert risk_level.is_riskier(risk_level)

    # More concretely:
    assert SecurityRisk.HIGH.is_riskier(SecurityRisk.HIGH)
    assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM)
    assert SecurityRisk.LOW.is_riskier(SecurityRisk.LOW)

This can be disabled by setting the reflexive parameter to False.

* Parameters:
  * `other` ([SecurityRisk](#class-securityrisk)) – The other risk level to compare against.
  * `reflexive` (bool) – Whether the relationship is reflexive.
* Raises:
  `ValueError` – If either risk level is UNKNOWN.

### openhands.sdk.tool
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool.md

### class Action

Bases: `Schema`, `ABC`

Base schema for input action.


#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `visualize`: Text
  Return Rich Text representation of this action.
  This method can be overridden by subclasses to customize visualization.
  The base implementation displays all action fields systematically.
### class ExecutableTool

Bases: `Protocol`

Protocol for tools that are guaranteed to have a non-None executor.

This eliminates the need for runtime None checks and type narrowing
when working with tools that are known to be executable.


#### Properties

- `executor`: [ToolExecutor](#class-toolexecutor)[Any, Any]
- `name`: str

#### Methods

#### __init__()

### class FinishTool

Bases: `ToolDefinition[FinishAction, FinishObservation]`

Tool for signaling the completion of a task or conversation.


#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### Methods

#### classmethod create()

Create FinishTool instance.

* Parameters:
  * `conv_state` – Optional conversation state (not used by FinishTool).
  * `**params` – Additional parameters (none supported).
* Returns:
  A sequence containing a single FinishTool instance.
* Raises:
  `ValueError` – If any parameters are provided.

#### name = 'finish'

### class Observation

Bases: `Schema`, `ABC`

Base schema for output observation.

#### Properties

- `ERROR_MESSAGE_HEADER`: ClassVar[str] = '[An error occurred during execution.]\n'
- `content`: list[TextContent | ImageContent]
- `is_error`: bool
- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `text`: str
  Extract all text content from the observation.
  * Returns:
    Concatenated text from all TextContent items in content.
- `to_llm_content`: Sequence[TextContent | ImageContent]
  Default content formatting for converting observation to LLM readable content.
  Subclasses can override to provide richer content (e.g., images, diffs).
- `visualize`: Text
  Return Rich Text representation of this observation.
  Subclasses can override for custom visualization; by default we show the
  same text that would be sent to the LLM.

#### Methods

#### classmethod from_text()

Utility to create an Observation from a simple text string.

* Parameters:
  * `text` – The text content to include in the observation.
  * `is_error` – Whether this observation represents an error.
  * `**kwargs` – Additional fields for the observation subclass.
* Returns:
  An Observation instance with the text wrapped in a TextContent.

### class ThinkTool

Bases: `ToolDefinition[ThinkAction, ThinkObservation]`

Tool for logging thoughts without making changes.


#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### Methods

#### classmethod create()

Create ThinkTool instance.

* Parameters:
  * `conv_state` – Optional conversation state (not used by ThinkTool).
  * `**params` – Additional parameters (none supported).
* Returns:
  A sequence containing a single ThinkTool instance.
* Raises:
  `ValueError` – If any parameters are provided.

#### name = 'think'

### class Tool

Bases: `BaseModel`

Defines a tool to be initialized for the agent.

This is only used in agent-sdk as a type schema for server use.


#### Properties

- `name`: str
- `params`: dict[str, Any]

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### classmethod validate_name()

Validate that name is not empty.

#### classmethod validate_params()

Convert None params to empty dict.

### class ToolAnnotations

Bases: `BaseModel`

Annotations to provide hints about the tool’s behavior.

Based on the Model Context Protocol (MCP) spec:
[https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838)


#### Properties

- `destructiveHint`: bool
- `idempotentHint`: bool
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `openWorldHint`: bool
- `readOnlyHint`: bool
- `title`: str | None
### class ToolDefinition

Bases: `DiscriminatedUnionMixin`, `ABC`, `Generic`

Base class for all tool implementations.

This class serves as a base for the discriminated union of all tool types.
All tools must inherit from this class and implement the .create() method for
proper initialization with executors and parameters.

Features:
- Normalize input/output schemas (class or dict) into both model+schema.
- Validate inputs before execute.
- Coerce outputs only if an output model is defined; else return vanilla JSON.
- Export MCP tool description.

#### Examples

Simple tool with no parameters:

```python
class FinishTool(ToolDefinition[FinishAction, FinishObservation]):
    @classmethod
    def create(cls, conv_state=None, **params):
        return [cls(name="finish", ..., executor=FinishExecutor())]
```

Complex tool with initialization parameters:

```python
class TerminalTool(ToolDefinition[TerminalAction, TerminalObservation]):
    @classmethod
    def create(cls, conv_state, **params):
        executor = TerminalExecutor(
            working_dir=conv_state.workspace.working_dir,
            **params,
        )
        return [cls(name="terminal", ..., executor=executor)]
```


#### Properties

- `action_type`: type[[Action](#class-action)]
- `annotations`: [ToolAnnotations](#class-toolannotations) | None
- `description`: str
- `executor`: Annotated[[ToolExecutor](#class-toolexecutor) | None, SkipJsonSchema()]
- `meta`: dict[str, Any] | None
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `name`: ClassVar[str] = ''
- `observation_type`: type[[Observation](#class-observation)] | None
- `title`: str

#### Methods

#### action_from_arguments()

Create an action from parsed arguments.

This method can be overridden by subclasses to provide custom logic
for creating actions from arguments (e.g., for MCP tools).

* Parameters:
  `arguments` – The parsed arguments from the tool call.
* Returns:
  The action instance created from the arguments.

#### as_executable()

Return this tool as an ExecutableTool, ensuring it has an executor.

This method eliminates the need for runtime None checks by guaranteeing
that the returned tool has a non-None executor.

* Returns:
  This tool instance, typed as ExecutableTool.
* Raises:
  `NotImplementedError` – If the tool has no executor.

#### abstractmethod classmethod create()

Create a sequence of Tool instances.

This method must be implemented by all subclasses to provide custom
initialization logic, typically initializing the executor with parameters
from conv_state and other optional parameters.

* Parameters:
  * `*args` – Variable positional arguments (typically conv_state as first arg).
  * `**kwargs` – Optional parameters for tool initialization.
* Returns:
  A sequence of Tool instances. Even single tools are returned as a sequence
  to provide a consistent interface and eliminate union return types.

#### classmethod resolve_kind()

Resolve a kind string to its corresponding tool class.

* Parameters:
  `kind` – The name of the tool class to resolve
* Returns:
  The tool class corresponding to the kind
* Raises:
  `ValueError` – If the kind is unknown

#### set_executor()

Create a new Tool instance with the given executor.

#### to_mcp_tool()

Convert a Tool to an MCP tool definition.

Allow overriding input/output schemas (usually by subclasses).

* Parameters:
  * `input_schema` – Optionally override the input schema.
  * `output_schema` – Optionally override the output schema.

#### to_openai_tool()

Convert a Tool to an OpenAI tool.

* Parameters:
  * `add_security_risk_prediction` – Whether to add a security_risk field
    to the action schema for the LLM to predict. This is useful for
    tools that may have safety risks, so the LLM can reason about
    the risk level before calling the tool.
  * `action_type` – Optionally override the action_type to use for the schema.
    This is useful for MCPTool to use a dynamically created action type
    based on the tool’s input schema.

#### NOTE
Summary field is always added to the schema for transparency and
explainability of agent actions.

#### to_responses_tool()

Convert a Tool to a Responses API function tool (LiteLLM typed).

For Responses API, function tools expect top-level keys:
(JSON configuration object)

* Parameters:
  * `add_security_risk_prediction` – Whether to add a security_risk field
  * `action_type` – Optional override for the action type

#### NOTE
Summary field is always added to the schema for transparency and
explainability of agent actions.

### class ToolExecutor

Bases: `ABC`, `Generic`

Executor function type for a Tool.

#### Methods

#### close()

Close the executor and clean up resources.

Default implementation does nothing. Subclasses should override
this method to perform cleanup (e.g., closing connections,
terminating processes, etc.).

### openhands.sdk.utils
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils.md

Utility functions for the OpenHands SDK.

### deprecated()

Return a decorator that deprecates a callable with explicit metadata.

Use this helper when you can annotate a function, method, or property with
@deprecated(…). It transparently forwards to `deprecation.deprecated()`
while filling in the SDK’s current version metadata unless custom values are
supplied. A usage sketch appears at the end of this section.

### maybe_truncate()

Truncate the middle of content if it exceeds the specified length.

Keeps the head and tail of the content to preserve context at both ends.
Optionally saves the full content to a file for later investigation.

* Parameters:
  * `content` – The text content to potentially truncate
  * `truncate_after` – Maximum length before truncation. If None, no truncation occurs
  * `truncate_notice` – Notice to insert in the middle when content is truncated
  * `save_dir` – Working directory to save the full content file in
  * `tool_prefix` – Prefix for the saved file (e.g., “bash”, “browser”, “editor”)
* Returns:
  Original content if under limit, or truncated content with head and tail
  preserved and a reference to the saved file if applicable

### sanitize_openhands_mentions()

Sanitize @OpenHands mentions in text to prevent self-mention loops.

This function inserts a zero-width joiner (ZWJ) after the @ symbol in
@OpenHands mentions, making them non-clickable in GitHub comments while
preserving readability. The original case of the mention is preserved.

* Parameters:
  `text` – The text to sanitize
* Returns:
  Text with sanitized @OpenHands mentions (e.g., “@OpenHands” -> “@‍OpenHands”)

### Examples

```pycon
>>> sanitize_openhands_mentions("Thanks @OpenHands for the help!")
'Thanks @\u200dOpenHands for the help!'
>>> sanitize_openhands_mentions("Check @openhands and @OPENHANDS")
'Check @\u200dopenhands and @\u200dOPENHANDS'
>>> sanitize_openhands_mentions("No mention here")
'No mention here'
```

### sanitized_env()

Return a copy of env with sanitized values.

PyInstaller-based binaries rewrite `LD_LIBRARY_PATH` so their vendored
libraries win. This function restores the original value so that subprocesses
will not use them.

### warn_deprecated()

Emit a deprecation warning for dynamic access to a legacy feature.

Prefer this helper when a decorator is not practical—e.g. attribute accessors,
data migrations, or other runtime paths that must conditionally warn. Provide
explicit version metadata so the SDK reports consistent messages and upgrades
to `deprecation.UnsupportedWarning` after the removal threshold.
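A minimal decorator sketch (hedged: `legacy_run` and the version values are hypothetical; the keyword arguments are those of the underlying `deprecation.deprecated()` helper that this wrapper forwards to):

```python
from openhands.sdk.utils import deprecated

@deprecated(
    deprecated_in="1.0.0",          # version where the deprecation begins
    removed_in="2.0.0",             # version where the callable is removed
    details="Use run() instead.",   # guidance shown in the warning message
)
def legacy_run():
    """Old entry point kept only for backwards compatibility."""
    ...
```

If the version arguments are omitted, the helper fills in the SDK’s current version metadata as described above.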

### openhands.sdk.workspace
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace.md

### class BaseWorkspace

Bases: `DiscriminatedUnionMixin`, `ABC`

Abstract base class for workspace implementations.

Workspaces provide a sandboxed environment where agents can execute commands,
read/write files, and perform other operations. All workspace implementations
support the context manager protocol for safe resource management.

#### Example

```pycon
>>> with workspace:
...     result = workspace.execute_command("echo 'hello'")
...     content = workspace.read_file("example.txt")
```


#### Properties

- `working_dir`: str
  The working directory for agent operations and tool execution. Accepts both
  string paths and Path objects; Path objects are automatically converted to
  strings.

#### Methods

#### abstractmethod execute_command()

Execute a bash command on the system.

* Parameters:
  * `command` – The bash command to execute
  * `cwd` – Working directory for the command (optional)
  * `timeout` – Timeout in seconds (defaults to 30.0)
* Returns:
  Result containing stdout, stderr, exit_code, and other metadata
* Return type:
  [CommandResult](#class-commandresult)
* Raises:
  `Exception` – If command execution fails

#### abstractmethod file_download()

Download a file from the system.

* Parameters:
  * `source_path` – Path to the source file on the system
  * `destination_path` – Path where the file should be downloaded
* Returns:
  Result containing success status and metadata
* Return type:
  [FileOperationResult](#class-fileoperationresult)
* Raises:
  `Exception` – If file download fails

#### abstractmethod file_upload()

Upload a file to the system.

* Parameters:
  * `source_path` – Path to the source file
  * `destination_path` – Path where the file should be uploaded
* Returns:
  Result containing success status and metadata
* Return type:
  [FileOperationResult](#class-fileoperationresult)
* Raises:
  `Exception` – If file upload fails

#### abstractmethod git_changes()

Get the git changes for the repository at the path given.

* Parameters:
  `path` – Path to the git repository
* Returns:
  List of changes
* Return type:
  list[GitChange]
* Raises:
  `Exception` – If path is not a git repository or getting changes failed

#### abstractmethod git_diff()

Get the git diff for the file at the path given.

* Parameters:
  `path` – Path to the file
* Returns:
  Git diff
* Return type:
  GitDiff
* Raises:
  `Exception` – If path is not a git repository or getting diff failed

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### pause()

Pause the workspace to conserve resources.

For local workspaces, this is a no-op.
For container-based workspaces, this pauses the container.

* Raises:
  `NotImplementedError` – If the workspace type does not support pausing.

#### resume()

Resume a paused workspace.

For local workspaces, this is a no-op.
For container-based workspaces, this resumes the container.

* Raises:
  `NotImplementedError` – If the workspace type does not support resuming.

### class CommandResult

Bases: `BaseModel`

Result of executing a command in the workspace.

#### Properties

- `command`: str
- `exit_code`: int
- `stderr`: str
- `stdout`: str
- `timeout_occurred`: bool

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class FileOperationResult

Bases: `BaseModel`

Result of a file upload or download operation.


#### Properties

- `destination_path`: str
- `error`: str | None
- `file_size`: int | None
- `source_path`: str
- `success`: bool

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class LocalWorkspace

Bases: [`BaseWorkspace`](#class-baseworkspace)

Local workspace implementation that operates on the host filesystem.

LocalWorkspace provides direct access to the local filesystem and command execution
environment. It’s suitable for development and testing scenarios where the agent
should operate directly on the host system.

#### Example

```pycon
>>> workspace = LocalWorkspace(working_dir="/path/to/project")
>>> with workspace:
...     result = workspace.execute_command("ls -la")
...     content = workspace.read_file("README.md")
```

#### Methods

#### __init__()

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be
validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

#### execute_command()

Execute a bash command locally.

Uses the shared shell execution utility to run commands with proper
timeout handling, output streaming, and error management.

* Parameters:
  * `command` – The bash command to execute
  * `cwd` – Working directory (optional)
  * `timeout` – Timeout in seconds
* Returns:
  Result with stdout, stderr, exit_code, command, and timeout_occurred
* Return type:
  [CommandResult](#class-commandresult)

#### file_download()

Download (copy) a file locally.

For local systems, file download is implemented as a file copy operation
using shutil.copy2 to preserve metadata.

* Parameters:
  * `source_path` – Path to the source file
  * `destination_path` – Path where the file should be copied
* Returns:
  Result with success status and file information
* Return type:
  [FileOperationResult](#class-fileoperationresult)

#### file_upload()

Upload (copy) a file locally.

For local systems, file upload is implemented as a file copy operation
using shutil.copy2 to preserve metadata.

* Parameters:
  * `source_path` – Path to the source file
  * `destination_path` – Path where the file should be copied
* Returns:
  Result with success status and file information
* Return type:
  [FileOperationResult](#class-fileoperationresult)

#### git_changes()

Get the git changes for the repository at the path given.

* Parameters:
  `path` – Path to the git repository
* Returns:
  List of changes
* Return type:
  list[GitChange]
* Raises:
  `Exception` – If path is not a git repository or getting changes failed

#### git_diff()

Get the git diff for the file at the path given.
+ +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### pause() + +Pause the workspace (no-op for local workspaces). + +Local workspaces have nothing to pause since they operate directly +on the host filesystem. + +#### resume() + +Resume the workspace (no-op for local workspaces). + +Local workspaces have nothing to resume since they operate directly +on the host filesystem. + +### class RemoteWorkspace + +Bases: `RemoteWorkspaceMixin`, [`BaseWorkspace`](#class-baseworkspace) + +Remote workspace implementation that connects to an OpenHands agent server. + +RemoteWorkspace provides access to a sandboxed environment running on a remote +OpenHands agent server. This is the recommended approach for production deployments +as it provides better isolation and security. + +#### Example + +```pycon +>>> workspace = RemoteWorkspace( +... host="https://agent-server.example.com", +... working_dir="/workspace" +... ) +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` + + +#### Properties + +- `alive`: bool + Check if the remote workspace is alive by querying the health endpoint. + * Returns: + True if the health endpoint returns a successful response, False otherwise. +- `client`: Client + +#### Methods + +#### execute_command() + +Execute a bash command on the remote system. + +This method starts a bash command via the remote agent server API, +then polls for the output until the command completes. + +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds +* Returns: + Result with stdout, stderr, exit_code, and other metadata +* Return type: + [CommandResult](#class-commandresult) + +#### file_download() + +Download a file from the remote system. + +Requests the file from the remote system via HTTP API and saves it locally. + +* Parameters: + * `source_path` – Path to the source file on remote system + * `destination_path` – Path where the file should be saved locally +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### file_upload() + +Upload a file to the remote system. + +Reads the local file and sends it to the remote system via HTTP API. + +* Parameters: + * `source_path` – Path to the local source file + * `destination_path` – Path where the file should be uploaded on remote system +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### git_changes() + +Get the git changes for the repository at the path given. + +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed + +#### git_diff() + +Get the git diff for the file at the path given. + +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 

#### model_post_init()

Override this method to perform additional initialization after `__init__` and `model_construct`.
This is useful if you want to do some validation that requires the entire model to be initialized.

#### reset_client()

Reset the HTTP client to force re-initialization.

This is useful when connection parameters (host, api_key) have changed
and the client needs to be recreated with new values.

### class Workspace

Bases: `object`

Factory entrypoint that returns a LocalWorkspace or RemoteWorkspace.

Usage:

- `Workspace(working_dir=…)` -> LocalWorkspace
- `Workspace(working_dir=…, host="http://…")` -> RemoteWorkspace

### Agent
Source: https://docs.openhands.dev/sdk/arch/agent.md

The **Agent** component implements the core reasoning-action loop that drives autonomous task execution. It orchestrates LLM queries, tool execution, and context management through a stateless, event-driven architecture.

**Source:** [`openhands-sdk/openhands/sdk/agent/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/agent)

## Core Responsibilities

The Agent system has four primary responsibilities:

1. **Reasoning-Action Loop** - Query LLM to generate next actions based on conversation history
2. **Tool Orchestration** - Select and execute tools, handle results and errors
3. **Context Management** - Apply [skills](/sdk/guides/skill), manage conversation history via [condensers](/sdk/guides/context-condenser)
4. **Security Validation** - Analyze proposed actions for safety before execution via [security analyzer](/sdk/guides/security)

## Architecture

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 50}} }%%
flowchart TB
  subgraph Input[" "]
    Events["Event History"]
    Context["Agent Context
Skills + Prompts"] + end + + subgraph Core["Agent Core"] + Condense["Condenser
History compression"] + Reason["LLM Query
Generate actions"] + Security["Security Analyzer
Risk assessment"] + end + + subgraph Execution[" "] + Tools["Tool Executor
Action → Observation"] + Results["Observation Events"] + end + + Events --> Condense + Context -.->|Skills| Reason + Condense --> Reason + Reason --> Security + Security --> Tools + Tools --> Results + Results -.->|Feedback| Events + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Reason primary + class Condense,Security secondary + class Tools tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Agent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py)** | Main implementation | Stateless reasoning-action loop executor | +| **[`AgentBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py)** | Abstract base class | Defines agent interface and initialization | +| **[`AgentContext`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/agent_context.py)** | Context container | Manages skills, prompts, and metadata | +| **[`Condenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/)** | History compression | Reduces context when token limits approached | +| **[`SecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/)** | Safety validation | Evaluates action risk before execution | + +## Reasoning-Action Loop + +The agent operates through a **single-step execution model** where each `step()` call processes one reasoning cycle: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 10, "rankSpacing": 10}} }%% +flowchart TB + Start["step() called"] + Pending{"Pending
actions?"} + ExecutePending["Execute pending actions"] + + HasCondenser{"Has
condenser?"} + Condense["Call condenser.condense()"] + CondenseResult{"Result
type?"} + EmitCondensation["Emit Condensation event"] + UseView["Use View events"] + UseRaw["Use raw events"] + + Query["Query LLM with messages"] + ContextExceeded{"Context
window
exceeded?"} + EmitRequest["Emit CondensationRequest"] + + Parse{"Response
type?"} + CreateActions["Create ActionEvents"] + CreateMessage["Create MessageEvent"] + + Confirmation{"Need
confirmation?"} + SetWaiting["Set WAITING_FOR_CONFIRMATION"] + + Execute["Execute actions"] + Observe["Create ObservationEvents"] + + Return["Return"] + + Start --> Pending + Pending -->|Yes| ExecutePending --> Return + Pending -->|No| HasCondenser + + HasCondenser -->|Yes| Condense + HasCondenser -->|No| UseRaw + Condense --> CondenseResult + CondenseResult -->|Condensation| EmitCondensation --> Return + CondenseResult -->|View| UseView --> Query + UseRaw --> Query + + Query --> ContextExceeded + ContextExceeded -->|Yes| EmitRequest --> Return + ContextExceeded -->|No| Parse + + Parse -->|Tool calls| CreateActions + Parse -->|Message| CreateMessage --> Return + + CreateActions --> Confirmation + Confirmation -->|Yes| SetWaiting --> Return + Confirmation -->|No| Execute + + Execute --> Observe + Observe --> Return + + style Query fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Condense fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Confirmation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Step Execution Flow:** + +1. **Pending Actions:** If actions awaiting confirmation exist, execute them and return +2. **Condensation:** If condenser exists: + - Call `condenser.condense()` with current event view + - If returns `View`: use condensed events for LLM query (continue in same step) + - If returns `Condensation`: emit event and return (will be processed next step) +3. **LLM Query:** Query LLM with messages from event history + - If context window exceeded: emit `CondensationRequest` and return +4. **Response Parsing:** Parse LLM response into events + - Tool calls → create `ActionEvent`(s) + - Text message → create `MessageEvent` and return +5. **Confirmation Check:** If actions need user approval: + - Set conversation status to `WAITING_FOR_CONFIRMATION` and return +6. **Action Execution:** Execute tools and create `ObservationEvent`(s) + +**Key Characteristics:** +- **Stateless:** Agent holds no mutable state between steps +- **Event-Driven:** Reads from event history, writes new events +- **Interruptible:** Each step is atomic and can be paused/resumed + +## Agent Context + +The agent applies `AgentContext` which includes **skills** and **prompts** to shape LLM behavior: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Context["AgentContext"] + + subgraph Skills["Skills"] + Repo["repo
Always active"] + Knowledge["knowledge
Trigger-based"] + end + SystemAug["System prompt prefix/suffix
Per-conversation"] + System["Prompt template
Per-conversation"] + + subgraph Application["Applied to LLM"] + SysPrompt["System Prompt"] + UserMsg["User Messages"] + end + + Context --> Skills + Context --> SystemAug + Repo --> SysPrompt + Knowledge -.->|When triggered| UserMsg + System --> SysPrompt + SystemAug --> SysPrompt + + style Context fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Repo fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Knowledge fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +| Skill Type | Activation | Use Case | +|------------|------------|----------| +| **repo** | Always included | Project-specific context, conventions | +| **knowledge** | Trigger words/patterns | Domain knowledge, special behaviors | + +Review [this guide](/sdk/guides/skill) for details on creating and applying agent context and skills. + + +## Tool Execution + +Tools follow a **strict action-observation pattern**: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + LLM["LLM generates tool_call"] + Convert["Convert to ActionEvent"] + + Decision{"Confirmation
mode?"} + Defer["Store as pending"] + + Execute["Execute tool"] + Success{"Success?"} + + Obs["ObservationEvent
with result"] + Error["ObservationEvent
with error"] + + LLM --> Convert + Convert --> Decision + + Decision -->|Yes| Defer + Decision -->|No| Execute + + Execute --> Success + Success -->|Yes| Obs + Success -->|No| Error + + style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Execution Modes:** + +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | + +**Security Integration:** + +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation + +## Component Relationships + +### How Agent Interacts + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Conv["Conversation"] + LLM["LLM"] + Tools["Tools"] + Context["AgentContext"] + + Conv -->|.step calls| Agent + Agent -->|Reads events| Conv + Agent -->|Query| LLM + Agent -->|Execute| Tools + Context -.->|Skills and Context| Agent + Agent -.->|New events| Conv + + style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Agent**: Orchestrates step execution, provides event history +- **Agent → LLM**: Queries for next actions, receives tool calls or messages +- **Agent → Tools**: Executes actions, receives observations +- **AgentContext → Agent**: Injects skills and prompts into LLM queries + + +## See Also + +- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle +- **[Tool System](/sdk/arch/tool-system)** - Tool definition and execution patterns +- **[Events](/sdk/arch/events)** - Event types and structures +- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns +- **[LLM](/sdk/arch/llm)** - Language model abstraction + +### Agent Server Package +Source: https://docs.openhands.dev/sdk/arch/agent-server.md + +The Agent Server package (`openhands.agent_server`) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms. + +**Source**: [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) + +## Purpose + +The Agent Server enables: +- **Remote execution**: Clients interact with agents via HTTP API +- **Multi-user isolation**: Each user gets isolated workspace +- **Container orchestration**: Manages Docker containers for workspaces +- **Centralized management**: Monitor and control all agents +- **Scalability**: Horizontal scaling with multiple servers + +## Architecture Overview + +```mermaid +graph TB + Client[Web/Mobile Client] -->|HTTPS| API[FastAPI Server] + + API --> Auth[Authentication] + API --> Router[API Router] + + Router --> WS[Workspace Manager] + Router --> Conv[Conversation Handler] + + WS --> Docker[Docker Manager] + Docker --> C1[Container 1
User A] + Docker --> C2[Container 2
User B] + Docker --> C3[Container 3
User C] + + Conv --> Agent[Software Agent SDK] + Agent --> C1 + Agent --> C2 + Agent --> C3 + + style Client fill:#e1f5fe + style API fill:#fff3e0 + style WS fill:#e8f5e8 + style Docker fill:#f3e5f5 + style Agent fill:#fce4ec +``` + +### Key Components + +**1. FastAPI Server** +- HTTP REST API endpoints +- Authentication and authorization +- Request validation +- WebSocket support for streaming + +**2. Workspace Manager** +- Creates and manages Docker containers +- Isolates workspaces per user +- Handles container lifecycle +- Manages resource limits + +**3. Conversation Handler** +- Routes requests to appropriate workspace +- Manages conversation state +- Handles concurrent requests +- Supports streaming responses + +**4. Docker Manager** +- Interfaces with Docker daemon +- Builds and pulls images +- Creates and destroys containers +- Monitors container health + +## Design Decisions + +### Why HTTP API? + +Alternative approaches considered: +- **gRPC**: More efficient but harder for web clients +- **WebSockets only**: Good for streaming but not RESTful +- **HTTP + WebSockets**: Best of both worlds + +**Decision**: HTTP REST for operations, WebSockets for streaming +- ✅ Works from any client (web, mobile, CLI) +- ✅ Easy to debug (curl, Postman) +- ✅ Standard authentication (API keys, OAuth) +- ✅ Streaming where needed + +### Why Container Per User? + +Alternative approaches: +- **Shared container**: Multiple users in one container +- **Container per session**: New container each conversation +- **Container per user**: One container per user (chosen) + +**Decision**: Container per user +- ✅ Strong isolation between users +- ✅ Persistent workspace across sessions +- ✅ Better resource management +- ⚠️ More containers, but worth it for isolation + +### Why FastAPI? + +Alternative frameworks: +- **Flask**: Simpler but less type-safe +- **Django**: Too heavyweight +- **FastAPI**: Modern, fast, type-safe (chosen) + +**Decision**: FastAPI +- ✅ Automatic API documentation (OpenAPI) +- ✅ Type validation with Pydantic +- ✅ Async support for performance +- ✅ WebSocket support built-in + +## API Design + +### Key Endpoints + +**Workspace Management** +``` +POST /workspaces Create new workspace +GET /workspaces/{id} Get workspace info +DELETE /workspaces/{id} Delete workspace +POST /workspaces/{id}/execute Execute command +``` + +**Conversation Management** +``` +POST /conversations Create conversation +GET /conversations/{id} Get conversation +POST /conversations/{id}/messages Send message +GET /conversations/{id}/stream Stream responses (WebSocket) +``` + +**Health & Monitoring** +``` +GET /health Server health check +GET /metrics Prometheus metrics +``` + +### Authentication + +**API Key Authentication** +```bash +curl -H "Authorization: Bearer YOUR_API_KEY" \ + https://agent-server.example.com/conversations +``` + +**Per-user workspace isolation** +- API key → user ID mapping +- Each user gets separate workspace +- Users can't access each other's workspaces + +### Streaming Responses + +**WebSocket for real-time updates** +```python +async with websocket_connect(url) as ws: + # Send message + await ws.send_json({"message": "Hello"}) + + # Receive events + async for event in ws: + if event["type"] == "message": + print(event["content"]) +``` + +**Why streaming?** +- Real-time feedback to users +- Show agent thinking process +- Better UX for long-running tasks + +## Deployment Models + +### 1. 
Local Development + +Run server locally for testing: +```bash +# Start server +openhands-agent-server --port 8000 + +# Or with Docker +docker run -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest +``` + +**Use case**: Development and testing + +### 2. Single-Server Deployment + +Deploy on one server (VPS, EC2, etc.): +```bash +# Install +pip install openhands-agent-server + +# Run with systemd/supervisor +openhands-agent-server \ + --host 0.0.0.0 \ + --port 8000 \ + --workers 4 +``` + +**Use case**: Small deployments, prototypes, MVPs + +### 3. Multi-Server Deployment + +Scale horizontally with load balancer: +``` + Load Balancer + | + +-------------+-------------+ + | | | + Server 1 Server 2 Server 3 + (Agents) (Agents) (Agents) + | | | + +-------------+-------------+ + | + Shared State Store + (Database, Redis, etc.) +``` + +**Use case**: Production SaaS, high traffic, need redundancy + +### 4. Kubernetes Deployment + +Container orchestration with Kubernetes: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: agent-server +spec: + replicas: 3 + template: + spec: + containers: + - name: agent-server + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - containerPort: 8000 +``` + +**Use case**: Enterprise deployments, auto-scaling, high availability + +## Resource Management + +### Container Limits + +Set per-workspace resource limits: +```python +# In server configuration +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "2g", # 2GB RAM + "cpus": "2", # 2 CPU cores + "disk": "10g" # 10GB disk + }, + "timeout": 300, # 5 min timeout +} +``` + +**Why limit resources?** +- Prevent one user from consuming all resources +- Fair usage across users +- Protect server from runaway processes +- Cost control + +### Cleanup & Garbage Collection + +**Container lifecycle**: +- Containers created on first use +- Kept alive between requests (warm) +- Cleaned up after inactivity timeout +- Force cleanup on server shutdown + +**Storage management**: +- Old workspaces deleted automatically +- Disk usage monitored +- Alerts when approaching limits + +## Security Considerations + +### Multi-Tenant Isolation + +**Container isolation**: +- Each user gets separate container +- Containers can't communicate +- Network isolation (optional) +- File system isolation + +**API isolation**: +- API keys mapped to users +- Users can only access their workspaces +- Server validates all permissions + +### Input Validation + +**Server validates**: +- API request schemas +- Command injection attempts +- Path traversal attempts +- File size limits + +**Defense in depth**: +- API validation +- Container validation +- Docker security features +- OS-level security + +### Network Security + +**Best practices**: +- HTTPS only (TLS certificates) +- Firewall rules (only port 443/8000) +- Rate limiting +- DDoS protection + +**Container networking**: +```python +# Disable network for workspace +WORKSPACE_CONFIG = { + "network_mode": "none" # No network access +} + +# Or allow specific hosts +WORKSPACE_CONFIG = { + "allowed_hosts": ["api.example.com"] +} +``` + +## Monitoring & Observability + +### Health Checks + +```bash +# Simple health check +curl https://agent-server.example.com/health + +# Response +{ + "status": "healthy", + "docker": "connected", + "workspaces": 15, + "uptime": 86400 +} +``` + +### Metrics + +**Prometheus metrics**: +- Request count and latency +- Active workspaces +- Container resource usage +- Error rates + +**Logging**: +- 
Structured JSON logs +- Per-request tracing +- Workspace events +- Error tracking + +### Alerting + +**Alert on**: +- Server down +- High error rate +- Resource exhaustion +- Container failures + +## Client SDK + +Python SDK for interacting with Agent Server: + +```python +from openhands.client import AgentServerClient + +client = AgentServerClient( + url="https://agent-server.example.com", + api_key="your-api-key" +) + +# Create conversation +conversation = client.create_conversation() + +# Send message +response = client.send_message( + conversation_id=conversation.id, + message="Hello, agent!" +) + +# Stream responses +for event in client.stream_conversation(conversation.id): + if event.type == "message": + print(event.content) +``` + +**Client handles**: +- Authentication +- Request/response serialization +- Error handling +- Streaming +- Retries + +## Cost Considerations + +### Server Costs + +**Compute**: CPU and memory for containers +- Each active workspace = 1 container +- Typically 1-2 GB RAM per workspace +- 0.5-1 CPU core per workspace + +**Storage**: Workspace files and conversation state +- ~1-10 GB per workspace (depends on usage) +- Conversation history in database + +**Network**: API requests and responses +- Minimal (mostly text) +- Streaming adds bandwidth + +### Cost Optimization + +**1. Idle timeout**: Shutdown containers after inactivity +```python +WORKSPACE_CONFIG = { + "idle_timeout": 3600 # 1 hour +} +``` + +**2. Resource limits**: Don't over-provision +```python +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "1g", # Smaller limit + "cpus": "0.5" # Fractional CPU + } +} +``` + +**3. Shared resources**: Use single server for multiple low-traffic apps + +**4. Auto-scaling**: Scale servers based on demand + +## When to Use Agent Server + +### Use Agent Server When: + +✅ **Multi-user system**: Web app with many users +✅ **Remote clients**: Mobile app, web frontend +✅ **Centralized management**: Need to monitor all agents +✅ **Workspace isolation**: Users shouldn't interfere +✅ **SaaS product**: Building agent-as-a-service +✅ **Scaling**: Need to handle concurrent users + +**Examples**: +- Chatbot platforms +- Code assistant web apps +- Agent marketplaces +- Enterprise agent deployments + +### Use Standalone SDK When: + +✅ **Single-user**: Personal tool or script +✅ **Local execution**: Running on your machine +✅ **Full control**: Need programmatic access +✅ **Simpler deployment**: No server management +✅ **Lower latency**: No network overhead + +**Examples**: +- CLI tools +- Automation scripts +- Local development +- Desktop applications + +### Hybrid Approach + +Use SDK locally but RemoteAPIWorkspace for execution: +- Agent logic in your Python code +- Execution happens on remote server +- Best of both worlds + +## Building Custom Agent Server + +The server is extensible for custom needs: + +**Custom authentication**: +```python +from openhands.agent_server import AgentServer + +class CustomAgentServer(AgentServer): + async def authenticate(self, request): + # Custom auth logic + return await oauth_verify(request) +``` + +**Custom workspace configuration**: +```python +server = AgentServer( + workspace_factory=lambda user: DockerWorkspace( + image=f"custom-image-{user.tier}", + resource_limits=user.resource_limits + ) +) +``` + +**Custom middleware**: +```python +@server.middleware +async def logging_middleware(request, call_next): + # Custom logging + response = await call_next(request) + return response +``` + +## Next Steps + +### For Usage Examples + +- 
[Local Agent Server](/sdk/guides/agent-server/local-server) - Run locally +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) - Docker setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) - Remote API +- [Remote Agent Server Overview](/sdk/guides/agent-server/overview) - All options + +### For Related Architecture + +- [Workspace Architecture](/sdk/arch/workspace) - RemoteAPIWorkspace details +- [SDK Architecture](/sdk/arch/sdk) - Core framework +- [Architecture Overview](/sdk/arch/overview) - System design + +### For Implementation Details + +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) - Server source +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples + +### Condenser +Source: https://docs.openhands.dev/sdk/arch/condenser.md + +The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). + +**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + +## Core Responsibilities + +The Condenser system has four primary responsibilities: + +1. **History Compression** - Reduce event lists to fit within context windows +2. **Threshold Detection** - Determine when condensation should trigger +3. **Summary Generation** - Create meaningful summaries via LLM or heuristics +4. **View Management** - Transform event history into LLM-ready views + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["CondenserBase
Abstract base"] + end + + subgraph Implementations["Concrete Implementations"] + NoOp["NoOpCondenser
No compression"] + LLM["LLMSummarizingCondenser
LLM-based"] + Pipeline["PipelineCondenser
Multi-stage"] + end + + subgraph Process["Condensation Process"] + View["View
Event history"] + Check["should_condense()?"] + Condense["get_condensation()"] + Result["View | Condensation"] + end + + subgraph Output["Condensation Output"] + CondEvent["Condensation Event
Summary metadata"] + NewView["Condensed View
Reduced tokens"] + end + + Base --> NoOp + Base --> LLM + Base --> Pipeline + + View --> Check + Check -->|Yes| Condense + Check -->|No| Result + Condense --> CondEvent + CondEvent --> NewView + NewView --> Result + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class LLM,Pipeline secondary + class Check,Condense tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`CondenserBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Abstract interface | Defines `condense()` contract | +| **[`RollingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Rolling window base | Implements threshold-based triggering | +| **[`LLMSummarizingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py)** | LLM summarization | Uses LLM to generate summaries | +| **[`NoOpCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/no_op_condenser.py)** | No-op implementation | Returns view unchanged | +| **[`PipelineCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/pipeline_condenser.py)** | Multi-stage pipeline | Chains multiple condensers | +| **[`View`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)** | Event view | Represents history for LLM | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation event | Metadata about compression | + +## Condenser Types + +### NoOpCondenser + +Pass-through condenser that performs no compression: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["View"] + NoOp["NoOpCondenser"] + Same["Same View"] + + View --> NoOp --> Same + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +### LLMSummarizingCondenser + +Uses an LLM to generate summaries of conversation history: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + View["Long View
120+ events"] + Check["Threshold
exceeded?"] + Summarize["LLM Summarization"] + Summary["Summary Text"] + Metadata["Condensation Event"] + AddToHistory["Add to History"] + NextStep["Next Step: View.from_events()"] + NewView["Condensed View"] + + View --> Check + Check -->|Yes| Summarize + Summarize --> Summary + Summary --> Metadata + Metadata --> AddToHistory + AddToHistory --> NextStep + NextStep --> NewView + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summarize fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style NewView fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Process:** +1. **Check Threshold:** Compare view size to configured limit (e.g., event count > `max_size`) +2. **Select Events:** Identify events to keep (first N + last M) and events to summarize (middle) +3. **LLM Call:** Generate summary of middle events using dedicated LLM +4. **Create Event:** Wrap summary in `Condensation` event with `forgotten_event_ids` +5. **Add to History:** Agent adds `Condensation` to event log and returns early +6. **Next Step:** `View.from_events()` filters forgotten events and inserts summary + +**Configuration:** +- **`max_size`:** Event count threshold before condensation triggers (default: 120) +- **`keep_first`:** Number of initial events to preserve verbatim (default: 4) +- **`llm`:** LLM instance for summarization (often cheaper model than reasoning LLM) + +### PipelineCondenser + +Chains multiple condensers in sequence: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["Original View"] + C1["Condenser 1"] + C2["Condenser 2"] + C3["Condenser 3"] + Final["Final View"] + + View --> C1 --> C2 --> C3 --> Final + + style C1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style C2 fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style C3 fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Use Case:** Multi-stage compression (e.g., remove old events, then summarize, then truncate) + +## Condensation Flow + +### Trigger Mechanisms + +Condensers can be triggered in two ways: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Automatic["Automatic Trigger"] + Agent1["Agent Step"] + Build1["View.from_events()"] + Check1["condenser.condense(view)"] + Trigger1["should_condense()?"] + end + + Agent1 --> Build1 --> Check1 --> Trigger1 + + style Check1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Automatic Trigger:** +- **When:** Threshold exceeded (e.g., event count > `max_size`) +- **Who:** Agent calls `condenser.condense()` each step +- **Purpose:** Proactively keep context within limits + + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Manual["Manual Trigger"] + Error["LLM Context Error"] + Request["CondensationRequest Event"] + NextStep["Next Agent Step"] + Trigger2["condense() detects request"] + end + + Error --> Request --> NextStep --> Trigger2 + + style Request fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` +**Manual Trigger:** +- **When:** `CondensationRequest` event added to history (via `view.unhandled_condensation_request`) +- **Who:** Agent (on LLM context window error) or application code +- **Purpose:** Force compression when context limit exceeded + +### Condensation Workflow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent calls condense(view)"] + + Decision{"should_condense?"} + + 
ReturnView["Return View
Agent proceeds"] + + Extract["Select Events to Keep/Forget"] + Generate["LLM Generates Summary"] + Create["Create Condensation Event"] + ReturnCond["Return Condensation"] + AddHistory["Agent adds to history"] + NextStep["Next Step: View.from_events()"] + FilterEvents["Filter forgotten events"] + InsertSummary["Insert summary at offset"] + NewView["New condensed view"] + + Start --> Decision + Decision -->|No| ReturnView + Decision -->|Yes| Extract + Extract --> Generate + Generate --> Create + Create --> ReturnCond + ReturnCond --> AddHistory + AddHistory --> NextStep + NextStep --> FilterEvents + FilterEvents --> InsertSummary + InsertSummary --> NewView + + style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Generate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Create fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Key Steps:** + +1. **Threshold Check:** `should_condense()` determines if condensation needed +2. **Event Selection:** Identify events to keep (head + tail) vs forget (middle) +3. **Summary Generation:** LLM creates compressed representation of forgotten events +4. **Condensation Creation:** Create `Condensation` event with `forgotten_event_ids` and summary +5. **Return to Agent:** Condenser returns `Condensation` (not `View`) +6. **History Update:** Agent adds `Condensation` to event log and exits step +7. **Next Step:** `View.from_events()` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)) processes Condensation to filter events and insert summary + +## View and Condensation + +### View Structure + +A `View` represents the conversation history as it will be sent to the LLM: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Full Event List
+ Condensation events"] + FromEvents["View.from_events()"] + Filter["Filter forgotten events"] + Insert["Insert summary"] + View["View
LLMConvertibleEvents"] + Convert["events_to_messages()"] + LLM["LLM Input"] + + Events --> FromEvents + FromEvents --> Filter + Filter --> Insert + Insert --> View + View --> Convert + Convert --> LLM + + style View fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style FromEvents fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**View Components:** +- **`events`:** List of `LLMConvertibleEvent` objects (filtered by Condensation) +- **`unhandled_condensation_request`:** Flag for pending manual condensation +- **`condensations`:** List of all Condensation events processed +- **Methods:** `from_events()` creates view from raw events, handling Condensation semantics + +### Condensation Event + +When condensation occurs, a `Condensation` event is created: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Old["Middle Events
~60 events"] + Summary["Summary Text
LLM-generated"] + Event["Condensation Event
forgotten_event_ids"] + Applied["View.from_events()"] + New["New View
~60 events + summary"] + + Old -.->|Summarized| Summary + Summary --> Event + Event --> Applied + Applied --> New + + style Event fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Condensation Fields:** +- **`forgotten_event_ids`:** List of event IDs to filter out +- **`summary`:** Compressed text representation of forgotten events +- **`summary_offset`:** Index where summary event should be inserted +- Inherits from `Event`: `id`, `timestamp`, `source` + +## Rolling Window Pattern + +`RollingCondenser` implements a common pattern for threshold-based condensation: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + View["Current View
120+ events"] + Check["Count Events"] + + Compare{"Count >
max_size?"} + + Keep["Keep All Events"] + + Split["Split Events"] + Head["Head
First 4 events"] + Middle["Middle
~56 events"] + Tail["Tail
~56 events"] + Summarize["LLM Summarizes Middle"] + Result["Head + Summary + Tail
~60 events total"] + + View --> Check + Check --> Compare + + Compare -->|Under| Keep + Compare -->|Over| Split + + Split --> Head + Split --> Middle + Split --> Tail + + Middle --> Summarize + Head --> Result + Summarize --> Result + Tail --> Result + + style Compare fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Split fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Summarize fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Rolling Window Strategy:** +1. **Keep Head:** Preserve first `keep_first` events (default: 4) - usually system prompts +2. **Keep Tail:** Preserve last `target_size - keep_first - 1` events - recent context +3. **Summarize Middle:** Compress events between head and tail into summary +4. **Target Size:** After condensation, view has `max_size // 2` events (default: 60) + +## Component Relationships + +### How Condenser Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Condenser["Condenser"] + State["Conversation State"] + Events["Event Log"] + + Agent -->|"View.from_events()"| State + State -->|View| Agent + Agent -->|"condense(view)"| Condenser + Condenser -->|"View | Condensation"| Agent + Agent -->|Adds Condensation| Events + + style Condenser fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → State**: Calls `View.from_events()` to get current view +- **Agent → Condenser**: Calls `condense(view)` each step if condenser registered +- **Condenser → Agent**: Returns `View` (proceed) or `Condensation` (defer) +- **Agent → Events**: Adds `Condensation` event to log when returned + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use condensers during reasoning +- **[Conversation Architecture](/sdk/arch/conversation)** - View generation and event management +- **[Events](/sdk/arch/events)** - Condensation event type and append-only log +- **[Context Condenser Guide](/sdk/guides/context-condenser)** - Configuring and using condensers + +### Conversation +Source: https://docs.openhands.dev/sdk/arch/conversation.md + +The **Conversation** component orchestrates agent execution through structured message flows and state management. It serves as the primary interface for interacting with agents, managing their lifecycle from initialization to completion. + +**Source:** [`openhands-sdk/openhands/sdk/conversation/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/conversation) + +## Core Responsibilities + +The Conversation system has four primary responsibilities: + +1. **Agent Lifecycle Management** - Initialize, run, pause, and terminate agents +2. **State Orchestration** - Maintain conversation history, events, and execution status +3. **Workspace Coordination** - Bridge agent operations with execution environments +4. **Runtime Services** - Provide persistence, monitoring, security, and visualization + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart LR + User["User Code"] + + subgraph Factory[" "] + Entry["Conversation()"] + end + + subgraph Implementations[" "] + Local["LocalConversation
Direct execution"] + Remote["RemoteConversation
Via agent-server API"] + end + + subgraph Core[" "] + State["ConversationState
• agent
workspace • stats • ..."] + EventLog["ConversationState.events
Event storage"] + end + + User --> Entry + Entry -.->|LocalWorkspace| Local + Entry -.->|RemoteWorkspace| Remote + + Local --> State + Remote --> State + + State --> EventLog + + classDef factory fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef impl fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef core fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef service fill:#e9f9ef,stroke:#2f855a,stroke-width:1.5px + + class Entry factory + class Local,Remote impl + class State,EventLog core + class Persist,Stuck,Viz,Secrets service +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)** | Unified entrypoint | Returns correct implementation based on workspace type | +| **[`LocalConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py)** | Local execution | Runs agent directly in process | +| **[`RemoteConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** | Remote execution | Delegates to agent-server via HTTP/WebSocket | +| **[`ConversationState`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | State container | Pydantic model with validation and serialization | +| **[`EventLog`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Event storage | Immutable append-only store with efficient queries | + +## Factory Pattern + +The [`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py) class automatically selects the correct implementation based on workspace type: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Input["Conversation(agent, workspace)"] + Check{Workspace Type?} + Local["LocalConversation
Agent runs in-process"] + Remote["RemoteConversation
Agent runs via API"] + + Input --> Check + Check -->|str or LocalWorkspace| Local + Check -->|RemoteWorkspace| Remote + + style Input fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Remote fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Dispatch Logic:** +- **Local:** String paths or `LocalWorkspace` → in-process execution +- **Remote:** `RemoteWorkspace` → agent-server via HTTP/WebSocket + +This abstraction enables switching deployment modes without code changes—just swap the workspace type. + +## State Management + +State updates follow a **two-path pattern** depending on the type of change: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["State Update Request"] + Lock["Acquire FIFO Lock"] + Decision{New Event?} + + StateOnly["Update State Fields
stats, status, metadata"] + EventPath["Append to Event Log
messages, actions, observations"] + + Callback["Trigger Callbacks"] + Release["Release Lock"] + + Start --> Lock + Lock --> Decision + Decision -->|No| StateOnly + Decision -->|Yes| EventPath + StateOnly --> Callback + EventPath --> Callback + Callback --> Release + + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px + style EventPath fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style StateOnly fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Two Update Patterns:** + +1. **State-Only Updates** - Modify fields without appending events (e.g., status changes, stat increments) +2. **Event-Based Updates** - Append to event log when new messages, actions, or observations occur + +**Thread Safety:** +- FIFO Lock ensures ordered, atomic updates +- Callbacks fire after successful commit +- Read operations never block writes + +## Execution Models + +The conversation system supports two execution models with identical APIs: + +### Local vs Remote Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Local["LocalConversation"] + L1["User sends message"] + L2["Agent executes in-process"] + L3["Direct tool calls"] + L4["Events via callbacks"] + L1 --> L2 --> L3 --> L4 + end + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Remote["RemoteConversation"] + R1["User sends message"] + R2["HTTP → Agent Server"] + R3["Isolated container execution"] + R4["WebSocket event stream"] + R1 --> R2 --> R3 --> R4 + end + style Remote fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +| Aspect | LocalConversation | RemoteConversation | +|--------|-------------------|-------------------| +| **Execution** | In-process | Remote container/server | +| **Communication** | Direct function calls | HTTP + WebSocket | +| **State Sync** | Immediate | Network serialized | +| **Use Case** | Development, CLI tools | Production, web apps | +| **Isolation** | Process-level | Container-level | + +**Key Insight:** Same API surface means switching between local and remote requires only changing workspace type—no code changes. 
+ +## Auxiliary Services + +The conversation system provides pluggable services that operate independently on the event stream: + +| Service | Purpose | Architecture Pattern | +|---------|---------|---------------------| +| **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | +| **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | +| **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | +| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | +| **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | + +**Design Principle:** Services read from the event log but never mutate state directly. This enables: +- Services can be enabled/disabled independently +- Easy to add new services without changing core orchestration +- Event stream acts as the integration point + +## Component Relationships + +### How Conversation Interacts + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Conv["Conversation"] + Agent["Agent"] + WS["Workspace"] + Tools["Tools"] + LLM["LLM"] + + Conv -->|Delegates to| Agent + Conv -->|Configures| WS + Agent -.->|Updates| Conv + Agent -->|Uses| Tools + Agent -->|Queries| LLM + + style Conv fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style WS fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Agent**: One-way orchestration, agent reports back via state updates +- **Conversation → Workspace**: Configuration only, workspace doesn't know about conversation +- **Agent → Conversation**: Indirect via state events + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - Agent reasoning loop design +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environment design +- **[Event System](/sdk/arch/events)** - Event types and flow +- **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples + +### Design Principles +Source: https://docs.openhands.dev/sdk/arch/design.md + +The **OpenHands Software Agent SDK** is part of the [OpenHands V1](https://openhands.dev/blog/the-path-to-openhands-v1) effort — a complete architectural rework based on lessons from **OpenHands V0**, one of the most widely adopted open-source coding agents. + +[Over the last eighteen months](https://openhands.dev/blog/one-year-of-openhands-a-journey-of-open-source-ai-development), OpenHands V0 evolved from a scrappy prototype into a widely used open-source coding agent. The project grew to tens of thousands of GitHub stars, hundreds of contributors, and multiple production deployments. That growth exposed architectural tensions — tight coupling between research and production, mandatory sandboxing, mutable state, and configuration sprawl — which informed the design principles of agent-sdk in V1. 

## Optional Isolation over Mandatory Sandboxing


**V0 Challenge:**
Every tool call in V0 executed in a sandboxed Docker container by default. While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other.
Moreover, with the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to user environments, V0's rigid isolation model became incompatible.


**V1 Principle:**
**Sandboxing should be opt-in, not universal.**
V1 unifies agent and tool execution within a single process by default, aligning with MCP's local-execution model.
When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity.

## Stateless by Default, One Source of Truth for State


**V0 Challenge:**
V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful.


**V1 Principle:**
**Keep everything stateless, with exactly one mutable state.**
All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction.
The only mutable entity is the [conversation state](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py), a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems.

## Clear Boundaries between Agent and Applications


**V0 Challenge:**
The same codebase powered the CLI, web interface, and integrations (e.g., GitHub and GitLab). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle.
Heavy research dependencies and benchmark integrations further bloated production builds.


**V1 Principle:**
**Maintain strict separation of concerns.**
V1 divides the system into stable, isolated layers: the [SDK (agent core)](/sdk/arch/overview#1-sdk-%E2%80%93-openhands-sdk), [tools (set of tools)](/sdk/arch/overview#2-tools-%E2%80%93-openhands-tools), [workspace (sandbox)](/sdk/arch/overview#3-workspace-%E2%80%93-openhands-workspace), and [agent server (server that runs inside sandbox)](/sdk/arch/overview#4-agent-server-%E2%80%93-openhands-agent-server).
Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently.


## Composable Components for Extensibility


**V0 Challenge:**
Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for each entry point. This rigidity limited experimentation and discouraged contributions.


**V1 Principle:**
**Everything should be composable and safe to extend.**
Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing.
Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation.
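
A rough sketch of what this composability looks like in code (the constructor fields, model name, and method calls are illustrative; check the SDK API reference for exact signatures):

```python
from openhands.sdk import LLM, Agent, Conversation

# Each component is an immutable, validated model; swapping one out
# (a different LLM, toolset, or prompt) does not require touching core code.
llm = LLM(model="anthropic/claude-sonnet-4-20250514")
agent = Agent(llm=llm, tools=[])  # declare capabilities here instead of branching logic
conversation = Conversation(agent, workspace="/path/to/project")
conversation.send_message("Summarize the repository structure.")
conversation.run()
```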
+ +### Events +Source: https://docs.openhands.dev/sdk/arch/events.md + +The **Event System** provides an immutable, type-safe event framework that drives agent execution and state management. Events form an append-only log that serves as both the agent's memory and the integration point for auxiliary services. + +**Source:** [`openhands-sdk/openhands/sdk/event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) + +## Core Responsibilities + +The Event System has four primary responsibilities: + +1. **Type Safety** - Enforce event schemas through Pydantic models +2. **LLM Integration** - Convert events to/from LLM message formats +3. **Append-Only Log** - Maintain immutable event history +4. **Service Integration** - Enable observers to react to event streams + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 80}} }%% +flowchart TB + Base["Event
Base class"] + LLMBase["LLMConvertibleEvent
Abstract base"] + + subgraph LLMTypes["LLM-Convertible Events
Visible to the LLM"] + Message["MessageEvent
User/assistant text"] + Action["ActionEvent
Tool calls"] + System["SystemPromptEvent
Initial system prompt"] + CondSummary["CondensationSummaryEvent
Condenser summary"] + + ObsBase["ObservationBaseEvent
Base for tool responses"] + Observation["ObservationEvent
Tool results"] + UserReject["UserRejectObservation
User rejected action"] + AgentError["AgentErrorEvent
Agent error"] + end + + subgraph Internals["Internal Events
NOT visible to the LLM"] + ConvState["ConversationStateUpdateEvent
State updates"] + CondReq["CondensationRequest
Request compression"] + Cond["Condensation
Compression result"] + Pause["PauseEvent
User pause"] + end + + Base --> LLMBase + Base --> Internals + LLMBase --> LLMTypes + ObsBase --> Observation + ObsBase --> UserReject + ObsBase --> AgentError + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base,LLMBase,Message,Action,SystemPromptEvent primary + class ObsBase,Observation,UserReject,AgentError secondary + class ConvState,CondReq,Cond,Pause tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Event`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | Base event class | Immutable Pydantic model with ID, timestamp, source | +| **[`LLMConvertibleEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | LLM-compatible events | Abstract class with `to_llm_message()` method | +| **[`MessageEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/message.py)** | Text messages | User or assistant conversational messages with skills | +| **[`ActionEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py)** | Tool calls | Agent tool invocations with thought, reasoning, security risk | +| **[`ObservationBaseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool response base | Base for all tool call responses | +| **[`ObservationEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool results | Successful tool execution outcomes | +| **[`UserRejectObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | User rejection | User rejected action in confirmation mode | +| **[`AgentErrorEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Agent errors | Errors from agent/scaffold (not model output) | +| **[`SystemPromptEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/system.py)** | System context | System prompt with tool schemas | +| **[`CondensationSummaryEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condenser summary | LLM-convertible summary of forgotten events | +| **[`ConversationStateUpdateEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py)** | State updates | Key-value conversation state changes | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation result | Events being forgotten with optional summary | +| **[`CondensationRequest`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Request compression | Trigger for conversation history compression | +| **[`PauseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/user_action.py)** | User pause | User requested pause of agent execution | + +## Event Types 
+ +### LLM-Convertible Events + +Events that participate in agent reasoning and can be converted to LLM messages: + + +| Event Type | Source | Content | LLM Role | +|------------|--------|---------|----------| +| **MessageEvent (user)** | user | Text, images | `user` | +| **MessageEvent (agent)** | agent | Text reasoning, skills | `assistant` | +| **ActionEvent** | agent | Tool call with thought, reasoning, security risk | `assistant` with `tool_calls` | +| **ObservationEvent** | environment | Tool execution result | `tool` | +| **UserRejectObservation** | environment | Rejection reason | `tool` | +| **AgentErrorEvent** | agent | Error details | `tool` | +| **SystemPromptEvent** | agent | System prompt with tool schemas | `system` | +| **CondensationSummaryEvent** | environment | Summary of forgotten events | `user` | + +The event system bridges agent events to LLM messages: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Event List"] + Filter["Filter LLMConvertibleEvent"] + Group["Group ActionEvents
by llm_response_id"]
+    Convert["Convert to Messages"]
+    LLM["LLM Input"]
+
+    Events --> Filter
+    Filter --> Group
+    Group --> Convert
+    Convert --> LLM
+
+    style Filter fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Group fill:#fff4df,stroke:#b7791f,stroke-width:2px
+    style Convert fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+```
+
+**Special Handling - Parallel Function Calling:**
+
+When multiple `ActionEvent`s share the same `llm_response_id` (parallel function calling):
+1. Group all ActionEvents by `llm_response_id`
+2. Combine into a single Message with multiple `tool_calls`
+3. Only the first event's `thought`, `reasoning_content`, and `thinking_blocks` are included
+4. All subsequent events in the batch have empty thought fields
+
+**Example:**
+```
+ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1)
+ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2)
+→ Combined into single Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2])
+```
+
+### Internal Events
+
+Events for metadata, control flow, and user actions (not sent to LLM):
+
+| Event Type | Source | Purpose | Key Fields |
+|------------|--------|---------|------------|
+| **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) |
+| **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded |
+| **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` |
+| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user |
+
+**Source Types:**
+- **user**: Event originated from user input
+- **agent**: Event generated by agent logic
+- **environment**: Event from system/framework/tools
+
+### `source` vs LLM `role`
+
+Events often carry **two different concepts** that are easy to confuse:
+
+- **`Event.source`**: where the event *originated* (`user`, `agent`, or `environment`). This is about attribution.
+- **LLM `role`** (e.g. `Message.role` / `MessageEvent.llm_message.role`): how the event should be represented to the LLM (`system`, `user`, `assistant`, `tool`). This is about LLM formatting.
+
+These fields are **intentionally independent**.
+
+Common examples include:
+
+- **Observations**: tool results are typically `source="environment"` and represented to the LLM with `role="tool"`.
+- **Synthetic framework messages**: the SDK may inject feedback or control messages (e.g. from hooks) as `source="environment"` while still using an LLM `role="user"` so the agent reads it as a user-facing instruction.
+
+**Do not infer event origin from LLM role.** If you need to distinguish real user input from synthetic/framework messages, rely on `Event.source` (and any explicit metadata fields on the event), not the LLM role.
+
+## Component Relationships
+
+### How Events Integrate
+ + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Event System"] + Agent["Agent"] + Conversation["Conversation"] + Tools["Tools"] + Services["Auxiliary Services"] + + Agent -->|Reads| Events + Agent -->|Writes| Events + Conversation -->|Manages| Events + Tools -->|Creates| Events + Events -.->|Stream| Services + + style Events fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → Events**: Reads history for context, writes actions/messages +- **Conversation → Events**: Owns and persists event log +- **Tools → Events**: Create ObservationEvents after execution +- **Services → Events**: Read-only observers for monitoring, visualization + +## Error Events: Agent vs Conversation + +Two distinct error events exist in the SDK, with different purpose and visibility: + +- AgentErrorEvent + - Type: ObservationBaseEvent (LLM-convertible) + - Scope: Error for a specific tool call (has tool_name and tool_call_id) + - Source: "agent" + - LLM visibility: Sent as a tool message so the model can react/recover + - Effect: Conversation continues; not a terminal state + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py + +- ConversationErrorEvent + - Type: Event (not LLM-convertible) + - Scope: Conversation-level runtime failure (no tool_name/tool_call_id) + - Source: typically "environment" + - LLM visibility: Not sent to the model + - Effect: Run loop transitions to ERROR and run() raises ConversationRunError; surface top-level error to client applications + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_error.py + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents read and write events +- **[Conversation Architecture](/sdk/arch/conversation)** - Event log management +- **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation +- **[Condenser](/sdk/arch/condenser)** - Event history compression + +### LLM +Source: https://docs.openhands.dev/sdk/arch/llm.md + +The **LLM** system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers. + +**Source:** [`openhands-sdk/openhands/sdk/llm/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/llm) + +## Core Responsibilities + +The LLM system has five primary responsibilities: + +1. **Provider Abstraction** - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers +2. **Request Pipeline** - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`) +3. **Configuration Management** - Load from environment, JSON, or programmatic configuration +4. **Telemetry & Cost** - Track usage, latency, and costs across providers +5. **Enhanced Reasoning** - Support for OpenAI Responses API with encrypted thinking and reasoning summaries + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 70}} }%% +flowchart TB + subgraph Configuration["Configuration Sources"] + Env["Environment Variables
LLM_MODEL, LLM_API_KEY"] + JSON["JSON Files
config/llm.json"] + Code["Programmatic
LLM(...)"] + end + + subgraph Core["Core LLM"] + Model["LLM Model
Pydantic configuration"] + Pipeline["Request Pipeline
Retry, timeout, telemetry"] + end + + subgraph Backend["LiteLLM Backend"] + Providers["100+ Providers
OpenAI, Anthropic, etc."]
+    end
+
+    subgraph Output["Telemetry"]
+        Usage["Token Usage"]
+        Cost["Cost Tracking"]
+        Latency["Latency Metrics"]
+    end
+
+    Env --> Model
+    JSON --> Model
+    Code --> Model
+
+    Model --> Pipeline
+    Pipeline --> Providers
+
+    Pipeline --> Usage
+    Pipeline --> Cost
+    Pipeline --> Latency
+
+    classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+    classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px
+
+    class Model primary
+    class Pipeline secondary
+    class Providers tertiary
+```
+
+### Key Components
+
+| Component | Purpose | Design |
+|-----------|---------|--------|
+| **[`LLM`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Configuration model | Pydantic model with provider settings |
+| **[`completion()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Chat Completions API | Handles retries, timeouts, streaming |
+| **[`responses()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Responses API | Enhanced reasoning with encrypted thinking |
+| **[`LiteLLM`](https://github.com/BerriAI/litellm)** | Provider adapter | Unified API for 100+ providers |
+| **Configuration Loaders** | Config hydration | `load_from_env()`, `load_from_json()` |
+| **Telemetry** | Usage tracking | Token counts, costs, latency |
+
+## Configuration
+
+See [`LLM` source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py) for the complete list of supported fields.
+
+### Programmatic Configuration
+
+Create LLM instances directly in code:
+
+```mermaid
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
+flowchart LR
+    Code["Python Code"]
+    LLM["LLM(model=...)"]
+    Agent["Agent"]
+
+    Code --> LLM
+    LLM --> Agent
+
+    style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+```
+
+**Example:**
+```python
+from pydantic import SecretStr
+from openhands.sdk import LLM
+
+llm = LLM(
+    model="anthropic/claude-sonnet-4.1",
+    api_key=SecretStr("sk-ant-123"),
+    temperature=0.1,
+    timeout=120,
+)
+```
+
+### Environment Variable Configuration
+
+Load from the environment using a naming convention:
+
+**Environment Variable Pattern:**
+- **Prefix:** All variables start with `LLM_`
+- **Mapping:** `LLM_FIELD` → `field` (lowercased)
+- **Types:** Auto-cast to int, float, bool, JSON, or SecretStr
+
+**Common Variables:**
+```bash
+export LLM_MODEL="anthropic/claude-sonnet-4.1"
+export LLM_API_KEY="sk-ant-123"
+export LLM_USAGE_ID="primary"
+export LLM_TIMEOUT="120"
+export LLM_NUM_RETRIES="5"
+```
+
+### JSON Configuration
+
+Serialize and load from JSON files:
+
+**Example:**
+```python
+# Save
+llm.model_dump_json(exclude_none=True, indent=2)
+
+# Load
+llm = LLM.load_from_json("config/llm.json")
+```
+
+**Security:** Secrets are redacted in serialized JSON (combine with environment variables for sensitive data).
+If you need to include secrets in JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`.
+
+
+## Request Pipeline
+
+### Completion Flow
+
+```mermaid
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 20}} }%%
+flowchart TB
+    Request["completion() or responses() call"]
+    Validate["Validate Config"]
+
+    Attempt["LiteLLM Request"]
+    Success{"Success?"}
+
+    Retry{"Retries
remaining?"} + Wait["Exponential Backoff"] + + Telemetry["Record Telemetry"] + Response["Return Response"] + Error["Raise Error"] + + Request --> Validate + Validate --> Attempt + Attempt --> Success + + Success -->|Yes| Telemetry + Success -->|No| Retry + + Retry -->|Yes| Wait + Retry -->|No| Error + + Wait --> Attempt + Telemetry --> Response + + style Attempt fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Retry fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Telemetry fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Pipeline Stages:** + +1. **Validation:** Check required fields (model, messages) +2. **Request:** Call LiteLLM with provider-specific formatting +3. **Retry Logic:** Exponential backoff on failures (configurable) +4. **Telemetry:** Record tokens, cost, latency +5. **Response:** Return completion or raise error + +### Responses API Support + +In addition to the standard chat completion API, the LLM system supports [OpenAI's Responses API](https://platform.openai.com/docs/api-reference/responses) as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries. + +#### Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Check{"Model supports
Responses API?"} + + subgraph Standard["Standard Path"] + ChatFormat["Format as
Chat Messages"] + ChatCall["litellm.completion()"] + end + + subgraph ResponsesPath["Responses Path"] + RespFormat["Format as
instructions + input[]"] + RespCall["litellm.responses()"] + end + + ChatResponse["ModelResponse"] + RespResponse["ResponsesAPIResponse"] + + Parse["Parse to Message"] + Return["LLMResponse"] + + Check -->|No| ChatFormat + Check -->|Yes| RespFormat + + ChatFormat --> ChatCall + RespFormat --> RespCall + + ChatCall --> ChatResponse + RespCall --> RespResponse + + ChatResponse --> Parse + RespResponse --> Parse + + Parse --> Return + + style RespFormat fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style RespCall fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +#### Supported Models + +Models that automatically use the Responses API path: + +| Pattern | Examples | Documentation | +|---------|----------|---------------| +| **gpt-5*** | `gpt-5`, `gpt-5-mini`, `gpt-5-codex` | OpenAI GPT-5 family | + +**Detection:** The SDK automatically detects if a model supports the Responses API using pattern matching in [`model_features.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/model_features.py). + + +## Provider Integration + +### LiteLLM Abstraction + +Software Agent SDK uses LiteLLM for provider abstraction: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + SDK["Software Agent SDK"] + LiteLLM["LiteLLM"] + + subgraph Providers["100+ Providers"] + OpenAI["OpenAI"] + Anthropic["Anthropic"] + Google["Google"] + Azure["Azure"] + Others["..."] + end + + SDK --> LiteLLM + LiteLLM --> OpenAI + LiteLLM --> Anthropic + LiteLLM --> Google + LiteLLM --> Azure + LiteLLM --> Others + + style LiteLLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style SDK fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Benefits:** +- **100+ Providers:** OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc. +- **Unified API:** Same interface regardless of provider +- **Format Translation:** Provider-specific request/response formatting +- **Error Handling:** Normalized error codes and messages + +### LLM Providers + +Provider integrations remain shared between the Software Agent SDK and the OpenHands Application. +The pages linked below live under the OpenHands app section but apply +verbatim to SDK applications because both layers wrap the same +`openhands.sdk.llm.LLM` interface. + +| Provider / scenario | Documentation | +| --- | --- | +| OpenHands hosted models | [/openhands/usage/llms/openhands-llms](/openhands/usage/llms/openhands-llms) | +| OpenAI | [/openhands/usage/llms/openai-llms](/openhands/usage/llms/openai-llms) | +| Azure OpenAI | [/openhands/usage/llms/azure-llms](/openhands/usage/llms/azure-llms) | +| Google Gemini / Vertex | [/openhands/usage/llms/google-llms](/openhands/usage/llms/google-llms) | +| Groq | [/openhands/usage/llms/groq](/openhands/usage/llms/groq) | +| OpenRouter | [/openhands/usage/llms/openrouter](/openhands/usage/llms/openrouter) | +| Moonshot | [/openhands/usage/llms/moonshot](/openhands/usage/llms/moonshot) | +| LiteLLM proxy | [/openhands/usage/llms/litellm-proxy](/openhands/usage/llms/litellm-proxy) | +| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | [/openhands/usage/llms/local-llms](/openhands/usage/llms/local-llms) | +| Custom LLM configurations | [/openhands/usage/llms/custom-llm-configs](/openhands/usage/llms/custom-llm-configs) | + +When you follow any of those guides while building with the SDK, create an +`LLM` object using the documented parameters (for example, API keys, base URLs, +or custom headers) and pass it into your agent or registry. 
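+
+For example, here is a minimal sketch of wiring a provider guide's parameters into an agent. The model string, base URL, and key below are placeholders, not real endpoints; substitute whatever the relevant guide documents:
+
+```python
+from pydantic import SecretStr
+from openhands.sdk import LLM, Agent
+
+# Placeholder values; replace with the parameters from the provider guide
+# you are following (API key, base URL, custom headers, etc.).
+llm = LLM(
+    model="litellm_proxy/anthropic/claude-sonnet-4.1",
+    base_url="https://llm-proxy.example.com",
+    api_key=SecretStr("your-api-key"),
+)
+
+agent = Agent(llm=llm, tools=[])
+```
+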
The OpenHands UI
+is simply a convenience layer on top of the same configuration model.
+
+
+## Telemetry and Cost Tracking
+
+### Telemetry Collection
+
+LLM requests automatically collect metrics:
+
+```mermaid
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
+flowchart LR
+    Request["LLM Request"]
+
+    subgraph Metrics
+        Tokens["Token Counts
Input/Output"] + Cost["Cost
USD"] + Latency["Latency
ms"] + end + + Events["Event Log"] + + Request --> Tokens + Request --> Cost + Request --> Latency + + Tokens --> Events + Cost --> Events + Latency --> Events + + style Metrics fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Tracked Metrics:** +- **Token Usage:** Input tokens, output tokens, total +- **Cost:** Per-request cost using configured rates +- **Latency:** Request duration in milliseconds +- **Errors:** Failure types and retry counts + +### Cost Configuration + +Configure per-token costs for custom models: + +```python +llm = LLM( + model="custom/my-model", + input_cost_per_token=0.00001, # $0.01 per 1K tokens + output_cost_per_token=0.00003, # $0.03 per 1K tokens +) +``` + +**Built-in Costs:** LiteLLM includes costs for major providers (updated regularly, [link](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)) + +**Custom Costs:** Override for: +- Internal models +- Custom pricing agreements +- Cost estimation for budgeting + +## Component Relationships + +### How LLM Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + LLM["LLM"] + Agent["Agent"] + Conversation["Conversation"] + Events["Events"] + Security["Security Analyzer"] + Condenser["Context Condenser"] + + Agent -->|Uses| LLM + LLM -->|Records| Events + Security -.->|Optional| LLM + Condenser -.->|Optional| LLM + Conversation -->|Provides context| Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → LLM**: Agent uses LLM for reasoning and tool calls +- **LLM → Events**: LLM requests/responses recorded as events +- **Security → LLM**: Optional security analyzer can use separate LLM +- **Condenser → LLM**: Optional context condenser can use separate LLM +- **Configuration**: LLM configured independently, passed to agent +- **Telemetry**: LLM metrics flow through event system to UI/logging + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use LLMs for reasoning and perform actions +- **[Events](/sdk/arch/events)** - LLM request/response event types +- **[Security](/sdk/arch/security)** - Optional LLM-based security analysis +- **[Provider Setup Guides](/openhands/usage/llms/openai-llms)** - Provider-specific configuration + +### MCP Integration +Source: https://docs.openhands.dev/sdk/arch/mcp.md + +The **MCP Integration** system enables agents to use external tools via the Model Context Protocol (MCP). It provides a bridge between MCP servers and the Software Agent SDK's tool system, supporting both synchronous and asynchronous execution. + +**Source:** [`openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) + +## Core Responsibilities + +The MCP Integration system has four primary responsibilities: + +1. **MCP Client Management** - Connect to and communicate with MCP servers +2. **Tool Discovery** - Enumerate available tools from MCP servers +3. **Schema Adaptation** - Convert MCP tool schemas to SDK tool definitions +4. **Execution Bridge** - Execute MCP tool calls from agent actions + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Client["MCP Client"] + Sync["MCPClient
Sync/Async bridge"] + Async["AsyncMCPClient
FastMCP base"] + end + + subgraph Bridge["Tool Bridge"] + Def["MCPToolDefinition
Schema conversion"] + Exec["MCPToolExecutor
Execution handler"] + end + + subgraph Integration["Agent Integration"] + Action["MCPToolAction
Dynamic model"] + Obs["MCPToolObservation
Result wrapper"] + end + + subgraph External["External"] + Server["MCP Server
stdio/HTTP"] + Tools["External Tools"] + end + + Sync --> Async + Async --> Server + + Server --> Def + Def --> Exec + + Exec --> Action + Action --> Server + Server --> Obs + + Server -.->|Spawns| Tools + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Sync,Async primary + class Def,Exec secondary + class Action,Obs tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | Client wrapper | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Tool metadata | Converts MCP schemas to SDK format | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP calls | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Dynamic action model | Runtime-generated Pydantic model | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results | + +## MCP Client + +### Sync/Async Bridge + +The SDK's `MCPClient` extends FastMCP's async client with synchronous wrappers: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Sync["Sync Code
Agent execution"] + Bridge["call_async_from_sync()"] + Executor["AsyncExecutor
Background loop"] + Async["Async MCP Call"] + Server["MCP Server"] + Result["Result"] + + Sync --> Bridge + Bridge --> Executor + Executor --> Async + Async --> Server + Server --> Result + Result --> Sync + + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Executor fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Async fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Bridge Pattern:** +- **Problem:** MCP protocol is async, but agent tools run synchronously +- **Solution:** Background event loop that executes async code from sync contexts +- **Benefit:** Agents use MCP tools without async/await in tool definitions + +**Client Features:** +- **Lifecycle Management:** `__enter__`/`__exit__` for context manager +- **Timeout Support:** Configurable timeouts for MCP operations +- **Error Handling:** Wraps MCP errors in observations +- **Connection Pooling:** Reuses connections across tool calls + +### MCP Server Configuration + +MCP servers are configured using the FastMCP format: + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } +} +``` + +**Configuration Fields:** +- **command:** Executable to spawn (e.g., `uvx`, `npx`, `node`) +- **args:** Arguments to pass to command +- **env:** Environment variables (optional) + +## Tool Discovery and Conversion + +### Discovery Flow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Config["MCP Config"] + Spawn["Spawn Server"] + List["List Tools"] + + subgraph Convert["Convert Each Tool"] + Schema["MCP Schema"] + Action["Generate Action Model"] + Def["Create ToolDefinition"] + end + + Register["Register in ToolRegistry"] + + Config --> Spawn + Spawn --> List + List --> Schema + + Schema --> Action + Action --> Def + Def --> Register + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Action fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Discovery Steps:** + +1. **Spawn Server:** Launch MCP server via stdio +2. **List Tools:** Call `tools/list` MCP endpoint +3. **Parse Schemas:** Extract tool names, descriptions, parameters +4. **Generate Models:** Dynamically create Pydantic models for actions +5. **Create Definitions:** Wrap in `ToolDefinition` objects +6. **Register:** Add to agent's tool registry + +### Schema Conversion + +MCP tool schemas are converted to SDK tool definitions: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP Tool Schema
JSON Schema"] + Parse["Parse Parameters"] + Model["Dynamic Pydantic Model
MCPToolAction"] + Def["ToolDefinition
SDK format"]
+
+    MCP --> Parse
+    Parse --> Model
+    Model --> Def
+
+    style Parse fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Model fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+```
+
+**Conversion Rules:**
+
+| MCP Schema | SDK Action Model |
+|------------|------------------|
+| **name** | Class name (PascalCase) |
+| **description** | Docstring |
+| **inputSchema** | Pydantic fields |
+| **required** | Field(required=True) |
+| **type** | Python type hints |
+
+**Example:**
+
+```python
+# MCP Schema
+{
+    "name": "fetch_url",
+    "description": "Fetch content from URL",
+    "inputSchema": {
+        "type": "object",
+        "properties": {
+            "url": {"type": "string"},
+            "timeout": {"type": "number"}
+        },
+        "required": ["url"]
+    }
+}
+
+# Generated Action Model
+class FetchUrl(MCPToolAction):
+    """Fetch content from URL"""
+    url: str
+    timeout: float | None = None
+```
+
+## Tool Execution
+
+### Execution Flow
+
+```mermaid
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%%
+flowchart TB
+    Agent["Agent generates action"]
+    Action["MCPToolAction"]
+    Executor["MCPToolExecutor"]
+
+    Convert["Convert to MCP format"]
+    Call["MCP call_tool"]
+    Server["MCP Server"]
+
+    Result["MCP Result"]
+    Obs["MCPToolObservation"]
+    Return["Return to Agent"]
+
+    Agent --> Action
+    Action --> Executor
+    Executor --> Convert
+    Convert --> Call
+    Call --> Server
+    Server --> Result
+    Result --> Obs
+    Obs --> Return
+
+    style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Call fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+    style Obs fill:#fff4df,stroke:#b7791f,stroke-width:2px
+```
+
+**Execution Steps:**
+
+1. **Action Creation:** LLM generates a tool call, parsed into an `MCPToolAction`
+2. **Executor Lookup:** Find `MCPToolExecutor` for tool name
+3. **Format Conversion:** Convert action fields to MCP arguments
+4. **MCP Call:** Execute `call_tool` via MCP client
+5. **Result Parsing:** Parse MCP result (text, images, resources)
+6. **Observation Creation:** Wrap in `MCPToolObservation`
+7.
**Error Handling:** Catch exceptions, return error observations + +### MCPToolExecutor + +Executors bridge SDK actions to MCP calls: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Executor["MCPToolExecutor"] + Client["MCP Client"] + Name["tool_name"] + + Executor -->|Uses| Client + Executor -->|Knows| Name + + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Client fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Executor Responsibilities:** +- **Client Management:** Hold reference to MCP client +- **Tool Identification:** Know which MCP tool to call +- **Argument Conversion:** Transform action fields to MCP format +- **Result Handling:** Parse MCP responses +- **Error Recovery:** Handle connection errors, timeouts, server failures + +## MCP Tool Lifecycle + +### From Configuration to Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Load["Load MCP Config"] + Start["Start Conversation"] + Spawn["Spawn MCP Servers"] + Discover["Discover Tools"] + Register["Register Tools"] + + Ready["Agent Ready"] + + Step["Agent Step"] + LLM["LLM Tool Call"] + Execute["Execute MCP Tool"] + Result["Return Observation"] + + End["End Conversation"] + Cleanup["Close MCP Clients"] + + Load --> Start + Start --> Spawn + Spawn --> Discover + Discover --> Register + Register --> Ready + + Ready --> Step + Step --> LLM + LLM --> Execute + Execute --> Result + Result --> Step + + Step --> End + End --> Cleanup + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Cleanup fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Lifecycle Phases:** + +| Phase | Operations | Components | +|-------|-----------|------------| +| **Initialization** | Spawn servers, discover tools | MCPClient, ToolRegistry | +| **Registration** | Create definitions, executors | MCPToolDefinition, MCPToolExecutor | +| **Execution** | Handle tool calls | Agent, MCPToolAction | +| **Cleanup** | Close connections, shutdown servers | MCPClient.sync_close() | + +## MCP Annotations + +MCP tools can include metadata hints for agents: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Tool["MCP Tool"] + + subgraph Annotations + ReadOnly["readOnlyHint"] + Destructive["destructiveHint"] + Progress["progressEnabled"] + end + + Security["Security Analysis"] + + Tool --> ReadOnly + Tool --> Destructive + Tool --> Progress + + ReadOnly --> Security + Destructive --> Security + + style Destructive fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Annotation Types:** + +| Annotation | Meaning | Use Case | +|------------|---------|----------| +| **readOnlyHint** | Tool doesn't modify state | Lower security risk | +| **destructiveHint** | Tool modifies/deletes data | Require confirmation | +| **progressEnabled** | Tool reports progress | Show progress UI | + +These annotations feed into the security analyzer for risk assessment. 
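+
+Tying the lifecycle together, here is a minimal sketch of exposing an MCP server's tools to an agent. It assumes the `Agent` constructor accepts the FastMCP-style `mcp_config` mapping shown earlier, as in the SDK's MCP guide, and that `send_message` accepts a plain string; see the guide for the exact interface:
+
+```python
+from openhands.sdk import LLM, Agent, Conversation
+
+agent = Agent(
+    llm=LLM(model="anthropic/claude-sonnet-4.1"),
+    tools=[],
+    # The server is spawned at conversation start, its tools are discovered
+    # and registered, and the client is closed when the conversation ends.
+    mcp_config={
+        "mcpServers": {
+            "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}
+        }
+    },
+)
+
+conversation = Conversation(agent=agent)
+conversation.send_message("Fetch https://example.com and summarize it.")
+conversation.run()
+```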
+ +## Component Relationships + +### How MCP Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP System"] + Skills["Skills"] + Tools["Tool Registry"] + Agent["Agent"] + Security["Security"] + + Skills -->|Configures| MCP + MCP -->|Registers| Tools + Agent -->|Uses| Tools + MCP -->|Provides hints| Security + + style MCP fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Skills fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Skills → MCP**: Repository skills can embed MCP configurations +- **MCP → Tools**: MCP tools registered alongside native tools +- **Agent → Tools**: Agents use MCP tools like any other tool +- **MCP → Security**: Annotations inform security risk assessment +- **Transparent Integration**: Agent doesn't distinguish MCP from native tools + +## Design Rationale + +**Async Bridge Pattern:** MCP protocol requires async, but synchronous tool execution simplifies agent implementation. Background event loop bridges the gap without exposing async complexity to tool users. + +**Dynamic Model Generation:** Creating Pydantic models at runtime from MCP schemas enables type-safe tool calls without manual model definitions. This supports arbitrary MCP servers without SDK code changes. + +**Unified Tool Interface:** Wrapping MCP tools in `ToolDefinition` makes them indistinguishable from native tools. Agents use the same interface regardless of tool source. + +**FastMCP Foundation:** Building on FastMCP (MCP SDK for Python) provides battle-tested client implementation, protocol compliance, and ongoing updates as MCP evolves. + +**Annotation Support:** Exposing MCP hints (readOnly, destructive) enables intelligent security analysis and user confirmation flows based on tool characteristics. + +**Lifecycle Management:** Automatic spawn/cleanup of MCP servers in conversation lifecycle ensures resources are properly managed without manual bookkeeping. + +## See Also + +- **[Tool System](/sdk/arch/tool-system)** - How MCP tools integrate with tool framework +- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills +- **[Security](/sdk/arch/security)** - How MCP annotations inform risk assessment +- **[MCP Guide](/sdk/guides/mcp)** - Using MCP tools in applications +- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library + +### Overview +Source: https://docs.openhands.dev/sdk/arch/overview.md + +The **OpenHands Software Agent SDK** provides a unified, type-safe framework for building and deploying AI agents—from local experiments to full production systems, focused on **statelessness**, **composability**, and **clear boundaries** between research and deployment. + +Check [this document](/sdk/arch/design) for the core design principles that guided its architecture. + +## Relationship with OpenHands Applications + +The Software Agent SDK serves as the **source of truth for agents** in OpenHands. The [OpenHands repository](https://github.com/OpenHands/OpenHands) provides interfaces—web app, CLI, and cloud—that consume the SDK APIs. This architecture ensures consistency and enables flexible integration patterns. +- **Software Agent SDK = foundation.** The SDK defines all core components: agents, LLMs, conversations, tools, workspaces, events, and security policies. 
+- **Interfaces reuse SDK objects.** The OpenHands GUI or CLI hydrate SDK components from persisted settings and orchestrate execution through SDK APIs. +- **Consistent configuration.** Whether you launch an agent programmatically or via the OpenHands GUI, the supported parameters and defaults come from the SDK. + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 50}} }%% +graph TB + subgraph Interfaces["OpenHands Interfaces"] + UI[OpenHands GUI
React frontend] + CLI[OpenHands CLI
Command-line interface] + Custom[Your Custom Client
Automations & workflows] + end + + SDK[Software Agent SDK
openhands.sdk + tools + workspace] + + subgraph External["External Services"] + LLM[LLM Providers
OpenAI, Anthropic, etc.] + Runtime[Runtime Services
Docker, Remote API, etc.] + end + + UI --> SDK + CLI --> SDK + Custom --> SDK + + SDK --> LLM + SDK --> Runtime + + classDef interface fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef sdk fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class UI,CLI,Custom interface + class SDK sdk + class LLM,Runtime external +``` + + +## Four-Package Architecture + +The agent-sdk is organized into four distinct Python packages: + +| Package | What It Does | When You Need It | +|---------|-------------|------------------| +| **openhands.sdk** | Core agent framework + base workspace classes | Always (required) | +| **openhands.tools** | Pre-built tools (bash, file editing, etc.) | Optional - provides common tools | +| **openhands.workspace** | Extended workspace implementations (Docker, remote) | Optional - extends SDK's base classes | +| **openhands.agent_server** | Multi-user API server | Optional - used by workspace implementations | + +### Two Deployment Modes + +The SDK supports two deployment architectures depending on your needs: + +#### Mode 1: Local Development + +**Installation:** Just install `openhands-sdk` + `openhands-tools` + +```bash +pip install openhands-sdk openhands-tools +``` + +**Architecture:** + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + SDK["openhands.sdk
Agent · LLM · Conversation
+ LocalWorkspace"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · GrepTool · …"]:::tools + + SDK -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:2px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:2px,rx:8,ry:8 +``` + +- `LocalWorkspace` included in SDK (no extra install) +- Everything runs in one process +- Perfect for prototyping and simple use cases +- Quick setup, no Docker required + +#### Mode 2: Production / Sandboxed + +**Installation:** Install all 4 packages + +```bash +pip install openhands-sdk openhands-tools openhands-workspace openhands-agent-server +``` + +**Architecture:** + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 30}} }%% +flowchart LR + + WSBase["openhands.sdk
Base Classes:
Workspace · Local · Remote"]:::sdk + + subgraph WS[" "] + direction LR + Docker["openhands.workspace DockerWorkspace
extends RemoteWorkspace"]:::ws + Remote["openhands.workspace RemoteAPIWorkspace
extends RemoteWorkspace"]:::ws + end + + Server["openhands.agent_server
FastAPI + WebSocket"]:::server + Agent["openhands.sdk
Agent · LLM · Conversation"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · …"]:::tools + + WSBase -.->|extended by| Docker + WSBase -.->|extended by| Remote + Docker -->|spawns container with| Server + Remote -->|connects via HTTP to| Server + Server -->|runs| Agent + Agent -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:1.1px,rx:8,ry:8 + classDef ws fill:#fff4df,stroke:#b7791f,color:#5b3410,stroke-width:1.1px,rx:8,ry:8 + classDef server fill:#f3e8ff,stroke:#7c3aed,color:#3b2370,stroke-width:1.1px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:1.1px,rx:8,ry:8 + + style WS stroke:#b7791f,stroke-width:1.5px,stroke-dasharray: 4 3,rx:8,ry:8,fill:none +``` + +- `RemoteWorkspace` auto-spawns agent-server in containers +- Sandboxed execution for security +- Multi-user deployments +- Distributed systems (e.g., Kubernetes) support + + +**Key Point:** Same agent code works in both modes—just swap the workspace type (`LocalWorkspace` → `DockerWorkspace` → `RemoteAPIWorkspace`). + + +### SDK Package (`openhands.sdk`) + +**Purpose:** Core components and base classes for OpenHands agent. + +**Key Components:** +- **[Agent](/sdk/arch/agent):** Implements the reasoning-action loop +- **[Conversation](/sdk/arch/conversation):** Manages conversation state and lifecycle +- **[LLM](/sdk/arch/llm):** Provider-agnostic language model interface with retry and telemetry +- **[Tool System](/sdk/arch/tool-system):** Typed base class definitions for action, observation, tool, and executor; includes MCP integration +- **[Events](/sdk/arch/events):** Typed event framework (e.g., action, observation, user messages, state update, etc.) +- **[Workspace](/sdk/arch/workspace):** Base classes (`Workspace`, `LocalWorkspace`, `RemoteWorkspace`) +- **[Skill](/sdk/arch/skill):** Reusable user-defined prompts with trigger-based activation +- **[Condenser](/sdk/arch/condenser):** Conversation history compression for token management +- **[Security](/sdk/arch/security):** Action risk assessment and validation before execution + +**Design:** Stateless, immutable components with type-safe Pydantic models. + +**Self-Contained:** Build and run agents with just `openhands-sdk` using `LocalWorkspace`. + +**Source:** [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) + +### Tools Package (`openhands.tools`) + + + +**Tool Independence:** Tools run alongside the agent in whatever environment workspace configures (local/container/remote). They don't run "through" workspace APIs. + + +**Purpose:** Pre-built tools following consistent patterns. + +**Design:** All tools follow Action/Observation/Executor pattern with built-in validation, error handling, and security. + + +For full list of tools, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) as the source of truth. + + + +### Workspace Package (`openhands.workspace`) + +**Purpose:** Workspace implementations extending SDK base classes. + +**Key Components:** Docker Workspace, Remote API Workspace, and more. + +**Design:** All workspace implementations extend `RemoteWorkspace` from SDK, adding container lifecycle or API client functionality. + +**Use Cases:** Sandboxed execution, multi-user deployments, production environments. + + +For full list of implemented workspaces, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace). 
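+
+To make the "Key Point" above concrete, here is a minimal sketch of the workspace swap. Import paths and constructor parameters are assumptions based on the package layout described here; the workspace guides are authoritative:
+
+```python
+from openhands.sdk import LLM, Agent, Conversation
+from openhands.sdk.workspace import LocalWorkspace   # ships with the SDK
+# from openhands.workspace import DockerWorkspace    # optional extra package
+
+agent = Agent(llm=LLM(model="anthropic/claude-sonnet-4.1"), tools=[])
+
+# Mode 1: local development, everything in one process.
+workspace = LocalWorkspace(working_dir="/tmp/project")  # parameter name assumed
+
+# Mode 2: production, swap in a sandboxed workspace; agent code is unchanged.
+# workspace = DockerWorkspace(base_image="python:3.12")  # parameters assumed
+
+conversation = Conversation(agent=agent, workspace=workspace)
+```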
+ + +### Agent Server Package (`openhands.agent_server`) + +**Purpose:** FastAPI-based HTTP/WebSocket server for remote agent execution. + +**Features:** +- REST API & WebSocket endpoints for conversations, bash, files, events, desktop, and VSCode +- Service management with isolated per-user sessions +- API key authentication and health checking + +**Deployment:** Runs inside containers (via `DockerWorkspace`) or as standalone process (connected via `RemoteWorkspace`). + +**Use Cases:** Multi-user web apps, SaaS products, distributed systems. + + +For implementation details, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server). + + +## How Components Work Together + +### Basic Execution Flow (Local) + +When you send a message to an agent, here's what happens: + +```mermaid +sequenceDiagram + participant You + participant Conversation + participant Agent + participant LLM + participant Tool + + You->>Conversation: "Create hello.txt" + Conversation->>Agent: Process message + Agent->>LLM: What should I do? + LLM-->>Agent: Use BashTool("touch hello.txt") + Agent->>Tool: Execute action + Note over Tool: Runs in same environment
as Agent (local/container/remote) + Tool-->>Agent: Observation + Agent->>LLM: Got result, continue? + LLM-->>Agent: Done + Agent-->>Conversation: Update state + Conversation-->>You: "File created!" +``` + +**Key takeaway:** The agent orchestrates the reasoning-action loop—calling the LLM for decisions and executing tools to perform actions. + +### Deployment Flexibility + +The same agent code runs in different environments by swapping workspace configuration: + +```mermaid +graph TB + subgraph "Your Code (Unchanged)" + Code["Agent + Tools + LLM"] + end + + subgraph "Deployment Options" + Local["Local
Direct execution"] + Docker["Docker
Containerized"] + Remote["Remote
Multi-user server"] + end + + Code -->|LocalWorkspace| Local + Code -->|DockerWorkspace| Docker + Code -->|RemoteAPIWorkspace| Remote + + style Code fill:#e1f5fe + style Local fill:#e8f5e8 + style Docker fill:#e8f5e8 + style Remote fill:#e8f5e8 +``` + +## Next Steps + +### Get Started +- [Getting Started](/sdk/getting-started) – Build your first agent +- [Hello World](/sdk/guides/hello-world) – Minimal example + +### Explore Components + +**SDK Package:** +- [Agent](/sdk/arch/agent) – Core reasoning-action loop +- [Conversation](/sdk/arch/conversation) – State management and lifecycle +- [LLM](/sdk/arch/llm) – Language model integration +- [Tool System](/sdk/arch/tool-system) – Action/Observation/Executor pattern +- [Events](/sdk/arch/events) – Typed event framework +- [Workspace](/sdk/arch/workspace) – Base workspace architecture + +**Tools Package:** +- See [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) source code for implementation details + +**Workspace Package:** +- See [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) source code for implementation details + +**Agent Server:** +- See [`openhands-agent-server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server) source code for implementation details + +### Deploy +- [Remote Server](/sdk/guides/agent-server/overview) – Deploy remotely +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) – Container setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) – Hosted runtime service +- [Local Agent Server](/sdk/guides/agent-server/local-server) – In-process server + +### Source Code +- [`openhands/sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) – Core framework +- [`openhands/tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) – Pre-built tools +- [`openhands/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace) – Workspaces +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) – HTTP server +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) – Working examples + +### SDK Package +Source: https://docs.openhands.dev/sdk/arch/sdk.md + +The SDK package (`openhands.sdk`) is the heart of the OpenHands Software Agent SDK. It provides the core framework for building agents locally or embedding them in applications. + +**Source**: [`sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) + +## Purpose + +The SDK package handles: +- **Agent reasoning loop**: How agents process messages and make decisions +- **State management**: Conversation lifecycle and persistence +- **LLM integration**: Provider-agnostic language model access +- **Tool system**: Typed actions and observations +- **Workspace abstraction**: Where code executes +- **Extensibility**: Skills, condensers, MCP, security + +## Core Components + +```mermaid +graph TB + Conv[Conversation
Lifecycle Manager] --> Agent[Agent
Reasoning Loop] + + Agent --> LLM[LLM
Language Model] + Agent --> Tools[Tool System
Capabilities] + Agent --> Micro[Skills
Behavior Modules] + Agent --> Cond[Condenser
Memory Manager] + + Tools --> Workspace[Workspace
Execution] + + Conv --> Events[Events
Communication] + Tools --> MCP[MCP
External Tools] + Workspace --> Security[Security
Validation] + + style Conv fill:#e1f5fe + style Agent fill:#f3e5f5 + style LLM fill:#e8f5e8 + style Tools fill:#fff3e0 + style Workspace fill:#fce4ec +``` + +### 1. Conversation - State & Lifecycle + +**What it does**: Manages the entire conversation lifecycle and state. + +**Key responsibilities**: +- Maintains conversation state (immutable) +- Handles message flow between user and agent +- Manages turn-taking and async execution +- Persists and restores conversation state +- Emits events for monitoring + +**Design decisions**: +- **Immutable state**: Each operation returns a new Conversation instance +- **Serializable**: Can be saved to disk or database and restored +- **Async-first**: Built for streaming and concurrent execution + +**When to use directly**: When you need fine-grained control over conversation state, want to implement custom persistence, or need to pause/resume conversations. + +**Example use cases**: +- Saving conversation to database after each turn +- Implementing undo/redo functionality +- Building multi-session chatbots +- Time-travel debugging + +**Learn more**: +- Guide: [Conversation Persistence](/sdk/guides/convo-persistence) +- Guide: [Pause and Resume](/sdk/guides/convo-pause-and-resume) +- Source: [`conversation/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation) + +--- + +### 2. Agent - The Reasoning Loop + +**What it does**: The core reasoning engine that processes messages and decides what to do. + +**Key responsibilities**: +- Receives messages and current state +- Consults LLM to reason about next action +- Validates and executes tool calls +- Processes observations and loops until completion +- Integrates with skills for specialized behavior + +**Design decisions**: +- **Stateless**: Agent doesn't hold state, operates on Conversation +- **Extensible**: Behavior can be modified via skills +- **Provider-agnostic**: Works with any LLM through unified interface + +**The reasoning loop**: +1. Receive message from Conversation +2. Add message to context +3. Consult LLM with full conversation history +4. If LLM returns tool call → validate and execute tool +5. If tool returns observation → add to context, go to step 3 +6. If LLM returns response → done, return to user + +**When to customize**: When you need specialized reasoning strategies, want to implement custom agent behaviors, or need to control the execution flow. + +**Example use cases**: +- Planning agents that break tasks into steps +- Code review agents with specific checks +- Agents with domain-specific reasoning patterns + +**Learn more**: +- Guide: [Custom Agents](/sdk/guides/agent-custom) +- Guide: [Agent Stuck Detector](/sdk/guides/agent-stuck-detector) +- Source: [`agent/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent) + +--- + +### 3. LLM - Language Model Integration + +**What it does**: Provides a provider-agnostic interface to language models. + +**Key responsibilities**: +- Abstracts different LLM providers (OpenAI, Anthropic, etc.) 
+- Handles message formatting and conversion +- Manages streaming responses +- Supports tool calling and reasoning modes +- Handles retries and error recovery + +**Design decisions**: +- **Provider-agnostic**: Same API works with any provider +- **Streaming-first**: Built for real-time responses +- **Type-safe**: Pydantic models for all messages +- **Extensible**: Easy to add new providers + +**Why provider-agnostic?** You can switch between OpenAI, Anthropic, local models, etc. without changing your agent code. This is crucial for: +- Cost optimization (switch to cheaper models) +- Testing with different models +- Avoiding vendor lock-in +- Supporting customer choice + +**When to customize**: When you need to add a new LLM provider, implement custom retries, or modify message formatting. + +**Example use cases**: +- Routing requests to different models based on complexity +- Implementing custom caching strategies +- Adding observability hooks + +**Learn more**: +- Guide: [LLM Registry](/sdk/guides/llm-registry) +- Guide: [LLM Routing](/sdk/guides/llm-routing) +- Guide: [Reasoning and Tool Use](/sdk/guides/llm-reasoning) +- Source: [`llm/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm) + +--- + +### 4. Tool System - Typed Capabilities + +**What it does**: Defines what agents can do through a typed action/observation pattern. + +**Key responsibilities**: +- Defines tool schemas (inputs and outputs) +- Validates actions before execution +- Executes tools and returns typed observations +- Generates JSON schemas for LLM tool calling +- Registers tools with the agent + +**Design decisions**: +- **Action/Observation pattern**: Tools are defined as type-safe input/output pairs +- **Schema generation**: Pydantic models auto-generate JSON schemas +- **Executor pattern**: Separation of tool definition and execution +- **Composable**: Tools can call other tools + +**The three components**: +1. **Action**: Input schema (what the tool accepts) +2. **Observation**: Output schema (what the tool returns) +3. **ToolExecutor**: Logic that transforms Action → Observation + +**Why this pattern?** +- Type safety catches errors early +- LLMs get accurate schemas for tool calling +- Tools are testable in isolation +- Easy to compose tools + +**When to customize**: When you need domain-specific capabilities not covered by built-in tools. + +**Example use cases**: +- Database query tools +- API integration tools +- Custom file format parsers +- Domain-specific calculators + +**Learn more**: +- Guide: [Custom Tools](/sdk/guides/custom-tools) +- Source: [`tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) + +--- + +### 5. Workspace - Execution Abstraction + +**What it does**: Abstracts *where* code executes (local, Docker, remote). + +**Key responsibilities**: +- Provides unified interface for code execution +- Handles file operations across environments +- Manages working directories +- Supports different isolation levels + +**Design decisions**: +- **Abstract interface**: LocalWorkspace in SDK, advanced types in workspace package +- **Environment-agnostic**: Code works the same locally or remotely +- **Lazy initialization**: Workspace setup happens on first use + +**Why abstract?** You can develop locally with LocalWorkspace, then deploy with DockerWorkspace or RemoteAPIWorkspace without changing agent code. + +**When to use directly**: Rarely - usually configured when creating an agent. 
Use advanced workspaces for production. + +**Learn more**: +- Architecture: [Workspace Architecture](/sdk/arch/workspace) +- Guides: [Remote Agent Server](/sdk/guides/agent-server/overview) +- Source: [`workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) + +--- + +### 6. Events - Component Communication + +**What it does**: Enables observability and debugging through event emissions. + +**Key responsibilities**: +- Defines event types (messages, actions, observations, errors) +- Emitted by Conversation, Agent, Tools +- Enables logging, debugging, and monitoring +- Supports custom event handlers + +**Design decisions**: +- **Immutable**: Events are snapshots, not mutable objects +- **Serializable**: Can be logged, stored, replayed +- **Type-safe**: Pydantic models for all events + +**Why events?** They provide a timeline of what happened during agent execution. Essential for: +- Debugging agent behavior +- Understanding decision-making +- Building observability dashboards +- Implementing custom logging + +**When to use**: When building monitoring systems, debugging tools, or need to track agent behavior. + +**Learn more**: +- Guide: [Metrics and Observability](/sdk/guides/metrics) +- Source: [`event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) + +--- + +### 7. Condenser - Memory Management + +**What it does**: Compresses conversation history when it gets too long. + +**Key responsibilities**: +- Monitors conversation length +- Summarizes older messages +- Preserves important context +- Keeps conversation within token limits + +**Design decisions**: +- **Pluggable**: Different condensing strategies +- **Automatic**: Triggered when context gets large +- **Preserves semantics**: Important information retained + +**Why needed?** LLMs have token limits. Long conversations would eventually exceed context windows. Condensers keep conversations running indefinitely while staying within limits. + +**When to customize**: When you need domain-specific summarization strategies or want to control what gets preserved. + +**Example strategies**: +- Summarize old messages +- Keep only last N turns +- Preserve task-related messages + +**Learn more**: +- Guide: [Context Condenser](/sdk/guides/context-condenser) +- Source: [`condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + +--- + +### 8. MCP - Model Context Protocol + +**What it does**: Integrates external tool servers via Model Context Protocol. + +**Key responsibilities**: +- Connects to MCP-compatible tool servers +- Translates MCP tools to SDK tool format +- Manages server lifecycle +- Handles server communication + +**Design decisions**: +- **Standard protocol**: Uses MCP specification +- **Transparent integration**: MCP tools look like regular tools to agents +- **Process management**: Handles server startup/shutdown + +**Why MCP?** It lets you use external tools without writing custom SDK integrations. Many tools (databases, APIs, services) provide MCP servers. + +**When to use**: When you need tools that: +- Already have MCP servers (fetch, filesystem, etc.) 
+- Are too complex to rewrite as SDK tools +- Need to run in separate processes +- Are provided by third parties + +**Learn more**: +- Guide: [MCP Integration](/sdk/guides/mcp) +- Spec: [Model Context Protocol](https://modelcontextprotocol.io/) +- Source: [`mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) + +--- + +### 9. Skills (formerly Microagents) - Behavior Modules + +**What it does**: Specialized modules that modify agent behavior for specific tasks. + +**Key responsibilities**: +- Provide domain-specific instructions +- Modify system prompts +- Guide agent decision-making +- Compose to create specialized agents + +**Design decisions**: +- **Composable**: Multiple skills can work together +- **Declarative**: Defined as configuration, not code +- **Reusable**: Share skills across agents + +**Why skills?** Instead of hard-coding behaviors, skills let you compose agent personalities and capabilities. Like "plugins" for agent behavior. + +**Example skills**: +- GitHub operations (issue creation, PRs) +- Code review guidelines +- Documentation style enforcement +- Project-specific conventions + +**When to use**: When you need agents with specialized knowledge or behavior patterns that apply to specific domains or tasks. + +**Learn more**: +- Guide: [Agent Skills & Context](/sdk/guides/skill) +- Source: [`skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +--- + +### 10. Security - Validation & Sandboxing + +**What it does**: Validates inputs and enforces security constraints. + +**Key responsibilities**: +- Input validation +- Command sanitization +- Path traversal prevention +- Resource limits + +**Design decisions**: +- **Defense in depth**: Multiple validation layers +- **Fail-safe**: Rejects suspicious inputs by default +- **Configurable**: Adjust security levels as needed + +**Why needed?** Agents execute arbitrary code and file operations. Security prevents: +- Malicious prompts escaping sandboxes +- Path traversal attacks +- Resource exhaustion +- Unintended system access + +**When to customize**: When you need domain-specific validation rules or want to adjust security policies. + +**Learn more**: +- Guide: [Security and Secrets](/sdk/guides/security) +- Source: [`security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security) + +--- + +## How Components Work Together + +### Example: User asks agent to create a file + +``` +1. User → Conversation: "Create a file called hello.txt with 'Hello World'" + +2. Conversation → Agent: New message event + +3. Agent → LLM: Full conversation history + available tools + +4. LLM → Agent: Tool call for FileEditorTool.create() + +5. Agent → Tool System: Validate FileEditorAction + +6. Tool System → Tool Executor: Execute action + +7. Tool Executor → Workspace: Create file (local/docker/remote) + +8. Workspace → Tool Executor: Success + +9. Tool Executor → Tool System: FileEditorObservation (success=true) + +10. Tool System → Agent: Observation + +11. Agent → LLM: Updated history with observation + +12. LLM → Agent: "File created successfully" + +13. Agent → Conversation: Done, final response + +14. 
## Design Patterns

### Immutability

All core objects are immutable. Operations return new instances:

```python
conversation = Conversation(...)
new_conversation = conversation.add_message(message)
# conversation is unchanged, new_conversation has the message
```

**Why?** It makes debugging easier, enables time-travel debugging, and ensures serializability.

### Composition Over Inheritance

Agents are composed from:
- LLM provider
- Tool list
- Skill list
- Condenser strategy
- Security policy

You don't subclass `Agent`; you configure it.

**Why?** It's more flexible, easier to test, and enables runtime configuration.

### Type Safety

Everything uses Pydantic models:
- Messages, actions, observations are typed
- Validation happens automatically
- Schemas are generated from types

**Why?** It catches errors early, provides IDE support, and makes the code self-documenting.

## Next Steps

### For Usage Examples

- [Getting Started](/sdk/getting-started) - Build your first agent
- [Custom Tools](/sdk/guides/custom-tools) - Extend capabilities
- [LLM Configuration](/sdk/guides/llm-registry) - Configure providers
- [Conversation Management](/sdk/guides/convo-persistence) - State handling

### For Related Architecture

- [Tool System](/sdk/arch/tool-system) - Built-in tool implementations
- [Workspace Architecture](/sdk/arch/workspace) - Execution environments
- [Agent Server Architecture](/sdk/arch/agent-server) - Remote execution

### For Implementation Details

- [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) - SDK source code
- [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) - Tools source code
- [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) - Workspace source code
- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples

### Security
Source: https://docs.openhands.dev/sdk/arch/security.md

The **Security** system evaluates agent actions for potential risks before execution. It provides pluggable security analyzers that assess action risk levels and enforce confirmation policies based on security characteristics.

**Source:** [`openhands-sdk/openhands/sdk/security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security)

## Core Responsibilities

The Security system has four primary responsibilities:

1. **Risk Assessment** - Capture and validate LLM-provided risk levels for actions
2. **Confirmation Policy** - Determine when user approval is required based on risk
3. **Action Validation** - Enforce security policies before execution
4. **Audit Trail** - Record security decisions in event history

## Architecture

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%%
flowchart TB
    subgraph Interface["Abstract Interface"]
        Base["SecurityAnalyzerBase
Abstract analyzer"] + end + + subgraph Implementations["Concrete Analyzers"] + LLM["LLMSecurityAnalyzer
Inline risk prediction"] + NoOp["NoOpSecurityAnalyzer
No analysis"] + end + + subgraph Risk["Risk Levels"] + Low["LOW
Safe operations"] + Medium["MEDIUM
Moderate risk"] + High["HIGH
Dangerous ops"] + Unknown["UNKNOWN
Unanalyzed"] + end + + subgraph Policy["Confirmation Policy"] + Check["should_require_confirmation()"] + Mode["Confirmation Mode"] + Decision["Require / Allow"] + end + + Base --> LLM + Base --> NoOp + + Implementations --> Low + Implementations --> Medium + Implementations --> High + Implementations --> Unknown + + Low --> Check + Medium --> Check + High --> Check + Unknown --> Check + + Check --> Mode + Mode --> Decision + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef danger fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + + class Base primary + class LLM secondary + class High danger + class Check tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`SecurityAnalyzerBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Abstract interface | Defines `security_risk()` contract | +| **[`LLMSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/llm_analyzer.py)** | Inline risk assessment | Returns LLM-provided risk from action arguments | +| **[`NoOpSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Passthrough analyzer | Always returns UNKNOWN | +| **[`SecurityRisk`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/risk.py)** | Risk enum | LOW, MEDIUM, HIGH, UNKNOWN | +| **[`ConfirmationPolicy`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py)** | Decision logic | Maps risk levels to confirmation requirements | + +## Risk Levels + +Security analyzers return one of four risk levels: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + Action["ActionEvent"] + Analyze["Security Analyzer"] + + subgraph Levels["Risk Levels"] + Low["LOW
Read-only, safe"] + Medium["MEDIUM
Modify files"] + High["HIGH
Delete, execute"] + Unknown["UNKNOWN
Not analyzed"] + end + + Action --> Analyze + Analyze --> Low + Analyze --> Medium + Analyze --> High + Analyze --> Unknown + + style Low fill:#d1fae5,stroke:#10b981,stroke-width:2px + style Medium fill:#fef3c7,stroke:#f59e0b,stroke-width:2px + style High fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Unknown fill:#f3f4f6,stroke:#6b7280,stroke-width:2px +``` + +### Risk Level Definitions + +| Level | Characteristics | Examples | +|-------|----------------|----------| +| **LOW** | Read-only, no state changes | File reading, directory listing, search | +| **MEDIUM** | Modifies user data | File editing, creating files, API calls | +| **HIGH** | Dangerous operations | File deletion, system commands, privilege escalation | +| **UNKNOWN** | Not analyzed or indeterminate | Complex commands, ambiguous operations | + +## Security Analyzers + +### LLMSecurityAnalyzer + +Leverages the LLM's inline risk assessment during action generation: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Schema["Tool Schema
+ security_risk param"] + LLM["LLM generates action
with security_risk"] + ToolCall["Tool Call Arguments
{command: 'rm -rf', security_risk: 'HIGH'}"] + Extract["Extract security_risk
from arguments"] + ActionEvent["ActionEvent
with security_risk set"] + Analyzer["LLMSecurityAnalyzer
returns security_risk"] + + Schema --> LLM + LLM --> ToolCall + ToolCall --> Extract + Extract --> ActionEvent + ActionEvent --> Analyzer + + style Schema fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Extract fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Analyzer fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Analysis Process:** + +1. **Schema Enhancement:** A required `security_risk` parameter is added to each tool's schema +2. **LLM Generation:** The LLM generates tool calls with `security_risk` as part of the arguments +3. **Risk Extraction:** The agent extracts the `security_risk` value from the tool call arguments +4. **ActionEvent Creation:** The security risk is stored on the `ActionEvent` +5. **Analyzer Query:** `LLMSecurityAnalyzer.security_risk()` returns the pre-assigned risk level +6. **No Additional LLM Calls:** Risk assessment happens inline—no separate analysis step + +**Example Tool Call:** +```json +{ + "name": "execute_bash", + "arguments": { + "command": "rm -rf /tmp/cache", + "security_risk": "HIGH" + } +} +``` + +The LLM reasons about risk in context when generating the action, eliminating the need for a separate security analysis call. + +**Configuration:** +- **Enabled When:** A `LLMSecurityAnalyzer` is configured for the agent +- **Schema Modification:** Automatically adds `security_risk` field to non-read-only tools +- **Zero Overhead:** No additional LLM calls or latency beyond normal action generation + +### NoOpSecurityAnalyzer + +Passthrough analyzer that skips analysis: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Action["ActionEvent"] + NoOp["NoOpSecurityAnalyzer"] + Unknown["SecurityRisk.UNKNOWN"] + + Action --> NoOp --> Unknown + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Use Case:** Development, trusted environments, or when confirmation mode handles all actions + +## Confirmation Policy + +The confirmation policy determines when user approval is required. There are three policy implementations: + +**Source:** [`confirmation_policy.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py) + +### Policy Types + +| Policy | Behavior | Use Case | +|--------|----------|----------| +| **[`AlwaysConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L27-L32)** | Requires confirmation for **all** actions | Maximum safety, interactive workflows | +| **[`NeverConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L35-L40)** | Never requires confirmation | Fully autonomous agents, trusted environments | +| **[`ConfirmRisky`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L43-L62)** | Configurable risk-based policy | Balanced approach, production use | + +### ConfirmRisky (Default Policy) + +The most flexible policy with configurable thresholds: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Risk["SecurityRisk"] + CheckUnknown{"Risk ==
UNKNOWN?"} + UseConfirmUnknown{"confirm_unknown
setting?"} + CheckThreshold{"risk.is_riskier
(threshold)?"} + + Confirm["Require Confirmation"] + Allow["Allow Execution"] + + Risk --> CheckUnknown + CheckUnknown -->|Yes| UseConfirmUnknown + CheckUnknown -->|No| CheckThreshold + + UseConfirmUnknown -->|True| Confirm + UseConfirmUnknown -->|False| Allow + + CheckThreshold -->|Yes| Confirm + CheckThreshold -->|No| Allow + + style CheckUnknown fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Confirm fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Allow fill:#d1fae5,stroke:#10b981,stroke-width:2px +``` + +**Configuration:** +- **`threshold`** (default: `HIGH`) - Risk level at or above which confirmation is required + - Cannot be set to `UNKNOWN` + - Uses reflexive comparison: `risk.is_riskier(threshold)` returns `True` if `risk >= threshold` +- **`confirm_unknown`** (default: `True`) - Whether `UNKNOWN` risk requires confirmation + +### Confirmation Rules by Policy + +#### ConfirmRisky with threshold=HIGH (Default) + +| Risk Level | `confirm_unknown=True` (default) | `confirm_unknown=False` | +|------------|----------------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | ✅ Allow | ✅ Allow | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +#### ConfirmRisky with threshold=MEDIUM + +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +#### ConfirmRisky with threshold=LOW + +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | 🔒 Require confirmation | 🔒 Require confirmation | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +**Key Rules:** +- **Risk comparison** is **reflexive**: `HIGH.is_riskier(HIGH)` returns `True` +- **UNKNOWN handling** is configurable via `confirm_unknown` flag +- **Threshold cannot be UNKNOWN** - validated at policy creation time + + +## Component Relationships + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Security["Security Analyzer"] + Agent["Agent"] + Conversation["Conversation"] + Tools["Tools"] + MCP["MCP Tools"] + + Agent -->|Validates actions| Security + Security -->|Checks| Tools + Security -->|Uses hints| MCP + Conversation -->|Pauses for confirmation| Agent + + style Security fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → Security**: Validates actions before execution +- **Security → Tools**: Examines tool characteristics (annotations) +- **Security → MCP**: Uses MCP hints for risk assessment +- **Conversation → Agent**: Pauses for user confirmation when required +- **Optional Component**: Security analyzer can be disabled for trusted environments + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use security analyzers +- **[Tool System](/sdk/arch/tool-system)** - Tool annotations and metadata; includes MCP tool hints +- **[Security Guide](/sdk/guides/security)** - 
Configuring security policies + +### Skill +Source: https://docs.openhands.dev/sdk/arch/skill.md + +The **Skill** system provides a mechanism for injecting reusable, specialized knowledge into agent context. Skills use trigger-based activation to determine when they should be included in the agent's prompt. + +**Source:** [`openhands/sdk/context/skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +## Core Responsibilities + +The Skill system has four primary responsibilities: + +1. **Context Injection** - Add specialized prompts to agent context based on triggers +2. **Trigger Evaluation** - Determine when skills should activate (always, keyword, task) +3. **MCP Integration** - Load MCP tools associated with repository skills +4. **Third-Party Support** - Parse `.cursorrules`, `agents.md`, and other skill formats + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Types["Skill Types"] + Repo["Repository Skill
trigger: None"] + Knowledge["Knowledge Skill
trigger: KeywordTrigger"] + Task["Task Skill
trigger: TaskTrigger"] + end + + subgraph Triggers["Trigger Evaluation"] + Always["Always Active
Repository guidelines"] + Keyword["Keyword Match
String matching on user messages"] + TaskMatch["Keyword Match + Inputs
Same as KeywordTrigger + user inputs"] + end + + subgraph Content["Skill Content"] + Markdown["Markdown with Frontmatter"] + MCPTools["MCP Tools Config
Repo skills only"] + Inputs["Input Metadata
Task skills only"] + end + + subgraph Integration["Agent Integration"] + Context["Agent Context"] + Prompt["System Prompt"] + end + + Repo --> Always + Knowledge --> Keyword + Task --> TaskMatch + + Always --> Markdown + Keyword --> Markdown + TaskMatch --> Markdown + + Repo -.->|Optional| MCPTools + Task -.->|Requires| Inputs + + Markdown --> Context + MCPTools --> Context + Context --> Prompt + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Repo,Knowledge,Task primary + class Always,Keyword,TaskMatch secondary + class Context tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Skill`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/skill.py)** | Core skill model | Pydantic model with name, content, trigger | +| **[`KeywordTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Keyword-based activation | String matching on user messages | +| **[`TaskTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Task-based activation | Special type of KeywordTrigger for skills with user inputs | +| **[`InputMetadata`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/types.py)** | Task input parameters | Defines user inputs for task skills | +| **Skill Loader** | File parsing | Reads markdown with frontmatter, validates schema | + +## Skill Types + +### Repository Skills + +Always-active, repository-specific guidelines. + +**Recommended:** put these permanent instructions in `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`) at the repo root. 
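For example, a minimal `AGENTS.md` might hold project-wide conventions like these (the contents shown are illustrative):

```markdown
# Agent Instructions

- Run the test suite with `uv run pytest` before finishing a task.
- Follow the existing docstring style in `src/`.
- Never commit directly to `main`; open a pull request instead.
```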
+ +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + File["AGENTS.md"] + Parse["Parse Frontmatter"] + Skill["Skill(trigger=None)"] + Context["Always in Context"] + + File --> Parse + Parse --> Skill + Skill --> Context + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `None` (always active) +- **Purpose:** Project conventions, coding standards, architecture rules +- **MCP Tools:** Can include MCP tool configuration +- **Location:** `AGENTS.md` (recommended) and/or `.agents/skills/*.md` (supported) + +**Example Files (permanent context):** +- `AGENTS.md` - General agent instructions +- `GEMINI.md` - Gemini-specific instructions +- `CLAUDE.md` - Claude-specific instructions + +**Other supported formats:** +- `.cursorrules` - Cursor IDE guidelines +- `agents.md` / `agent.md` - General agent instructions + +### Knowledge Skills + +Keyword-triggered skills for specialized domains: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Check["Check Keywords"] + Match{"Match?"} + Activate["Activate Skill"] + Skip["Skip Skill"] + Context["Add to Context"] + + User --> Check + Check --> Match + Match -->|Yes| Activate + Match -->|No| Skip + Activate --> Context + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Activate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `KeywordTrigger` with regex patterns +- **Purpose:** Domain-specific knowledge (e.g., "kubernetes", "machine learning") +- **Activation:** Keywords detected in user messages +- **Location:** System or user-defined knowledge base + +**Trigger Example:** +```yaml +--- +name: kubernetes +trigger: + type: keyword + keywords: ["kubernetes", "k8s", "kubectl"] +--- +``` + +### Task Skills + +Keyword-triggered skills with structured inputs for guided workflows: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Match{"Keyword
Match?"} + Inputs["Collect User Inputs"] + Template["Apply Template"] + Context["Add to Context"] + Skip["Skip Skill"] + + User --> Match + Match -->|Yes| Inputs + Match -->|No| Skip + Inputs --> Template + Template --> Context + + style Match fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Template fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `TaskTrigger` (a special type of KeywordTrigger for skills with user inputs) +- **Activation:** Keywords/triggers detected in user messages (same matching logic as KeywordTrigger) +- **Purpose:** Guided workflows (e.g., bug fixing, feature implementation) +- **Inputs:** User-provided parameters (e.g., bug description, acceptance criteria) +- **Location:** System-defined or custom task templates + +**Trigger Example:** +```yaml +--- +name: bug_fix +triggers: ["/bug_fix", "fix bug", "bug report"] +inputs: + - name: bug_description + description: "Describe the bug" + required: true +--- +``` + +**Note:** TaskTrigger uses the same keyword matching mechanism as KeywordTrigger. The distinction is semantic - TaskTrigger is used for skills that require structured user inputs, while KeywordTrigger is for knowledge-based skills. + +## Trigger Evaluation + +Skills are evaluated at different points in the agent lifecycle: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent Step Start"] + + Repo["Check Repository Skills
trigger: None"] + AddRepo["Always Add to Context"] + + Message["Check User Message"] + Keyword["Match Keyword Triggers"] + AddKeyword["Add Matched Skills"] + + TaskType["Check Task Type"] + TaskMatch["Match Task Triggers"] + AddTask["Add Task Skill"] + + Build["Build Agent Context"] + + Start --> Repo + Repo --> AddRepo + + Start --> Message + Message --> Keyword + Keyword --> AddKeyword + + Start --> TaskType + TaskType --> TaskMatch + TaskMatch --> AddTask + + AddRepo --> Build + AddKeyword --> Build + AddTask --> Build + + style Repo fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Keyword fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style TaskMatch fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Evaluation Rules:** + +| Trigger Type | Evaluation Point | Activation Condition | +|--------------|------------------|----------------------| +| **None** | Every step | Always active | +| **KeywordTrigger** | On user message | Keyword/string match in message | +| **TaskTrigger** | On user message | Keyword/string match in message (same as KeywordTrigger) | + +**Note:** Both KeywordTrigger and TaskTrigger use identical string matching logic. TaskTrigger is simply a semantic variant used for skills that include user input parameters. + +## MCP Tool Integration + +Repository skills can include MCP tool configurations: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skill["Repository Skill"] + MCPConfig["mcp_tools Config"] + Client["MCP Client"] + Tools["Tool Registry"] + + Skill -->|Contains| MCPConfig + MCPConfig -->|Spawns| Client + Client -->|Registers| Tools + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style MCPConfig fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Tools fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**MCP Configuration Format:** + +Skills can embed MCP server configuration following the [FastMCP format](https://gofastmcp.com/clients/client#configuration-format): + +```yaml +--- +name: repo_skill +mcp_tools: + mcpServers: + filesystem: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"] +--- +``` + +**Workflow:** +1. **Load Skill:** Parse markdown file with frontmatter +2. **Extract MCP Config:** Read `mcp_tools` field +3. **Spawn MCP Servers:** Create MCP clients for each server +4. **Register Tools:** Add MCP tools to agent's tool registry +5. **Inject Context:** Add skill content to agent prompt + +## Skill File Format + +Skills are defined in markdown files with YAML frontmatter: + +```markdown +--- +name: skill_name +trigger: + type: keyword + keywords: ["pattern1", "pattern2"] +--- + +# Skill Content + +This is the instruction text that will be added to the agent's context. 
+``` + +**Frontmatter Fields:** + +| Field | Required | Description | +|-------|----------|-------------| +| **name** | Yes | Unique skill identifier | +| **trigger** | Yes* | Activation trigger (`null` for always active) | +| **mcp_tools** | No | MCP server configuration (repo skills only) | +| **inputs** | No | User input metadata (task skills only) | + +*Repository skills use `trigger: null` (or omit trigger field) + +## Component Relationships + +### How Skills Integrate + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skills["Skill System"] + Context["Agent Context"] + Agent["Agent"] + MCP["MCP Client"] + + Skills -->|Injects content| Context + Skills -.->|Spawns tools| MCP + Context -->|System prompt| Agent + MCP -->|Tool| Agent + + style Skills fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Skills → Agent Context**: Active skills contribute their content to system prompt +- **Skills → MCP**: Repository skills can spawn MCP servers and register tools +- **Context → Agent**: Combined skill content becomes part of agent's instructions +- **Skills Lifecycle**: Loaded at conversation start, evaluated each step + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use skills for context +- **[Tool System](/sdk/arch/tool-system#mcp-integration)** - MCP tool spawning and client management +- **[Context Management Guide](/sdk/guides/skill)** - Using skills in applications + +### Tool System & MCP +Source: https://docs.openhands.dev/sdk/arch/tool-system.md + +The **Tool System** provides a type-safe, extensible framework for defining agent capabilities. It standardizes how agents interact with external systems through a structured Action-Observation pattern with automatic validation and schema generation. + +**Source:** [`openhands-sdk/openhands/sdk/tool/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/tool) + +## Core Responsibilities + +The Tool System has four primary responsibilities: + +1. **Type Safety** - Enforce action/observation schemas via Pydantic models +2. **Schema Generation** - Auto-generate LLM-compatible tool descriptions from Pydantic schemas +3. **Execution Lifecycle** - Validate inputs, execute logic, wrap outputs +4. **Tool Registry** - Discover and resolve tools by name or pattern + +## Tool System + +### Architecture Overview + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Definition["Tool Definition"] + Action["Action
Input schema"] + Observation["Observation
Output schema"] + Executor["Executor
Business logic"] + end + + subgraph Framework["Tool Framework"] + Base["ToolBase
Abstract base"] + Impl["Tool Implementation
Concrete tool"] + Registry["Tool Registry
Spec → Tool"] + end + + Agent["Agent"] + LLM["LLM"] + ToolSpec["Tool Spec
name + params"] + + Base -.->|Extends| Impl + + ToolSpec -->|resolve_tool| Registry + Registry -->|Create instances| Impl + Impl -->|Available in| Agent + Impl -->|Generate schema| LLM + LLM -->|Generate tool call| Agent + Agent -->|Parse & validate| Action + Agent -->|Execute via Tool.\_\_call\_\_| Executor + Executor -->|Return| Observation + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Action,Observation,Executor secondary + class Registry tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`ToolBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Abstract base class | Generic over Action and Observation types, defines abstract `create()` | +| **[`ToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Concrete tool class | Can be instantiated directly or subclassed for factory pattern | +| **[`Action`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Input model | Pydantic model with `visualize` property | +| **[`Observation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Output model | Pydantic model with `to_llm_content` property | +| **[`ToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Execution interface | ABC with `__call__()` method, optional `close()` | +| **[`ToolAnnotations`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Behavioral hints | MCP-spec hints (readOnly, destructive, idempotent, openWorld) | +| **[`Tool` (spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** | Tool specification | Configuration object with name and params | +| **[`ToolRegistry`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/registry.py)** | Tool discovery | Resolves Tool specs to ToolDefinition instances | + +### Action-Observation Pattern + +The tool system follows a **strict input-output contract**: `Action → Observation`. The Agent layer wraps these in events for conversation management. + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Agent["Agent Layer"] + ToolCall["MessageToolCall
from LLM"] + ParseJSON["Parse JSON
arguments"] + CreateAction["tool.action_from_arguments()
Pydantic validation"] + WrapAction["ActionEvent
wraps Action"] + WrapObs["ObservationEvent
wraps Observation"] + Error["AgentErrorEvent"] + end + + subgraph ToolSystem["Tool System"] + ActionType["Action
Pydantic model"] + ToolCall2["tool.\_\_call\_\_(action)
type-safe execution"] + Execute["ToolExecutor
business logic"] + ObsType["Observation
Pydantic model"] + end + + ToolCall --> ParseJSON + ParseJSON -->|Valid JSON| CreateAction + ParseJSON -->|Invalid JSON| Error + CreateAction -->|Valid| ActionType + CreateAction -->|Invalid| Error + ActionType --> WrapAction + ActionType --> ToolCall2 + ToolCall2 --> Execute + Execute --> ObsType + ObsType --> WrapObs + + style ToolSystem fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style ActionType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px + style ObsType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px +``` + +**Tool System Boundary:** +- **Input**: `dict[str, Any]` (JSON arguments) → validated `Action` instance +- **Output**: `Observation` instance with structured result +- **No knowledge of**: Events, LLM messages, conversation state + +### Tool Definition + +Tools are defined using two patterns depending on complexity: + +#### Pattern 1: Direct Instantiation (Simple Tools) + +For stateless tools that don't need runtime configuration (e.g., `finish`, `think`): + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"] + Obs["Define Observation
with to_llm_content"] + Exec["Define Executor
stateless logic"] + Tool["ToolDefinition(...,
executor=Executor())"] + + Action --> Tool + Obs --> Tool + Exec --> Tool + + style Tool fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Components:** +1. **Action** - Pydantic model with `visualize` property for display +2. **Observation** - Pydantic model with `to_llm_content` property for LLM +3. **ToolExecutor** - Stateless executor with `__call__(action) → observation` +4. **ToolDefinition** - Direct instantiation with executor instance + +#### Pattern 2: Subclass with Factory (Stateful Tools) + +For tools requiring runtime configuration or persistent state (e.g., `execute_bash`, `file_editor`, `glob`): + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"] + Obs["Define Observation
with to_llm_content"] + Exec["Define Executor
with \_\_init\_\_ and state"] + Subclass["class MyTool(ToolDefinition)
with create() method"] + Instance["Return [MyTool(...,
executor=instance)]"] + + Action --> Subclass + Obs --> Subclass + Exec --> Subclass + Subclass --> Instance + + style Instance fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Components:** +1. **Action/Observation** - Same as Pattern 1 +2. **ToolExecutor** - Stateful executor with `__init__()` for configuration and optional `close()` for cleanup +3. **MyTool(ToolDefinition)** - Subclass with `@classmethod create(conv_state, ...)` factory method +4. **Factory Method** - Returns sequence of configured tool instances + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Pattern1["Pattern 1: Direct Instantiation"] + P1A["Define Action/Observation
with visualize/to_llm_content"] + P1E["Define ToolExecutor
with \_\_call\_\_()"] + P1T["ToolDefinition(...,
executor=Executor())"] + end + + subgraph Pattern2["Pattern 2: Subclass with Factory"] + P2A["Define Action/Observation
with visualize/to_llm_content"] + P2E["Define Stateful ToolExecutor
with \_\_init\_\_() and \_\_call\_\_()"] + P2C["class MyTool(ToolDefinition)
@classmethod create()"] + P2I["Return [MyTool(...,
executor=instance)]"]

    P1A --> P1E
    P1E --> P1T

    P2A --> P2E
    P2E --> P2C
    P2C --> P2I

    style P1T fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style P2I fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Key Design Elements:**

| Component | Purpose | Requirements |
|-----------|---------|--------------|
| **Action** | Defines LLM-provided parameters | Extends `Action`, includes `visualize` property returning Rich Text |
| **Observation** | Defines structured output | Extends `Observation`, includes `to_llm_content` property returning content list |
| **ToolExecutor** | Implements business logic | Extends `ToolExecutor[ActionT, ObservationT]`, implements `__call__()` method |
| **ToolDefinition** | Ties everything together | Either instantiate directly (Pattern 1) or subclass with `create()` method (Pattern 2) |

**When to Use Each Pattern:**

| Pattern | Use Case | Examples |
|---------|----------|----------|
| **Direct Instantiation** | Stateless tools with no configuration needs | `finish`, `think`, simple utilities |
| **Subclass with Factory** | Tools requiring runtime state or configuration | `execute_bash`, `file_editor`, `glob`, `grep` |

### Tool Annotations

Tools include optional `ToolAnnotations` based on the [Model Context Protocol (MCP) spec](https://github.com/modelcontextprotocol/modelcontextprotocol) that provide behavioral hints to LLMs:

| Field | Meaning | Examples |
|-------|---------|----------|
| `readOnlyHint` | Tool doesn't modify state | `glob` (True), `execute_bash` (False) |
| `destructiveHint` | May delete/overwrite data | `file_editor` (True), `task_tracker` (False) |
| `idempotentHint` | Repeated calls are safe | `glob` (True), `execute_bash` (False) |
| `openWorldHint` | Interacts beyond closed domain | `execute_bash` (True), `task_tracker` (False) |

**Key Behaviors:**
- [LLM-based security risk prediction](/sdk/guides/security) is automatically added for tools with `readOnlyHint=False`
- Annotations help LLMs reason about tool safety and side effects

### Tool Registry

The registry enables **dynamic tool discovery** and instantiation from tool specifications:

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart LR
    ToolSpec["Tool Spec
name + params"] + + subgraph Registry["Tool Registry"] + Resolver["Resolver
name → factory"] + Factory["Factory
create(params)"] + end + + Instance["Tool Instance
with executor"] + Agent["Agent"] + + ToolSpec -->|"resolve_tool(spec)"| Resolver + Resolver -->|Lookup factory| Factory + Factory -->|"create(**params)"| Instance + Instance -->|Used by| Agent + + style Registry fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Factory fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Resolution Workflow:** + +1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. **Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution + +**Registration Types:** + +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | + +### File Organization + +Tools follow a consistent file structure for maintainability: + +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities +``` + +**File Responsibilities:** + +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | +| `__init__.py` | Tool exports | Package interface | + +**Benefits:** +- **Separation of Concerns** - Public API separate from implementation +- **Avoid Circular Imports** - Import `impl` only inside `create()` method +- **Consistency** - All tools follow same structure for discoverability + +**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation + + +## MCP Integration + +The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). MCP tools are **configured separately from the tool registry** via the `mcp_config` field in `Agent` class and are automatically discovered from MCP servers during agent initialization. + +**Source:** [`openhands-sdk/openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) + +### Architecture Overview + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph External["External MCP Server"] + Server["MCP Server
stdio/HTTP"] + ExtTools["External Tools"] + end + + subgraph Bridge["MCP Integration Layer"] + MCPClient["MCPClient
Sync/Async bridge"] + Convert["Schema Conversion
MCP → MCPToolDefinition"] + MCPExec["MCPToolExecutor
Bridges to MCP calls"] + end + + subgraph Agent["Agent System"] + ToolsMap["tools_map
str -> ToolDefinition"] + AgentLogic["Agent Execution"] + end + + Server -.->|Spawns| ExtTools + MCPClient --> Server + Server --> Convert + Convert -->|create_mcp_tools| MCPExec + MCPExec -->|Added during
agent.initialize| ToolsMap + ToolsMap --> AgentLogic + AgentLogic -->|Tool call| MCPExec + MCPExec --> MCPClient + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class MCPClient primary + class Convert,MCPExec secondary + class Server,ExtTools external +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | MCP server connection | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Tool wrapper | Wraps MCP tools as SDK `ToolDefinition` with dynamic validation | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP tool calls via MCPClient | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Generic action wrapper | Simple `dict[str, Any]` wrapper for MCP tool arguments | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results as observations with content blocks | +| **[`_create_mcp_action_type()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Dynamic schema | Runtime Pydantic model generated from MCP `inputSchema` for validation | + +### Sync/Async Bridge + +MCP protocol is asynchronous, but SDK tools execute synchronously. The bridge pattern in [client.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py) solves this: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Sync["Sync Tool Execution"] + Bridge["call_async_from_sync()"] + Loop["Background Event Loop"] + Async["Async MCP Call"] + Result["Return Result"] + + Sync --> Bridge + Bridge --> Loop + Loop --> Async + Async --> Result + Result --> Sync + + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Loop fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Bridge Features:** +- **Background Event Loop** - Executes async code from sync contexts +- **Timeout Support** - Configurable timeouts for MCP operations +- **Error Handling** - Wraps MCP errors in observations +- **Connection Pooling** - Reuses connections across tool calls + +### Tool Discovery Flow + +**Source:** [`create_mcp_tools()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/utils.py) | [`agent._initialize()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py) + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Config["MCP Server Config
command + args"] + Spawn["Spawn Server Process
MCPClient"] + List["List Available Tools
client.list_tools()"] + + subgraph Convert["For Each MCP Tool"] + Store["Store MCP metadata
name, description, inputSchema"] + CreateExec["Create MCPToolExecutor
bound to tool + client"] + Def["Create MCPToolDefinition
generic MCPToolAction type"] + end + + Register["Add to Agent's tools_map
bypasses ToolRegistry"] + Ready["Tools Available
Dynamic models created on-demand"] + + Config --> Spawn + Spawn --> List + List --> Store + Store --> CreateExec + CreateExec --> Def + Def --> Register + Register --> Ready + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Def fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Discovery Steps:** +1. **Spawn Server** - Launch MCP server via stdio protocol (using `MCPClient`) +2. **List Tools** - Call MCP `tools/list` endpoint to retrieve available tools +3. **Parse Schemas** - Extract tool names, descriptions, and `inputSchema` from MCP response +4. **Create Definitions** - For each tool, call `MCPToolDefinition.create()` which: + - Creates an `MCPToolExecutor` instance bound to the tool name and client + - Wraps the MCP tool metadata in `MCPToolDefinition` + - Uses generic `MCPToolAction` as the action type (NOT dynamic models yet) +5. **Add to Agent** - All `MCPToolDefinition` instances are added to agent's `tools_map` during `initialize()` (bypasses ToolRegistry) +6. **Lazy Validation** - Dynamic Pydantic models are generated lazily when: + - `action_from_arguments()` is called (argument validation) + - `to_openai_tool()` is called (schema export to LLM) + +**Schema Handling:** + +| MCP Schema | SDK Integration | When Used | +|------------|----------------|-----------| +| `name` | Tool name (stored in `MCPToolDefinition`) | Discovery, execution | +| `description` | Tool description for LLM | Discovery, LLM prompt | +| `inputSchema` | Stored in `mcp_tool.inputSchema` | Lazy model generation | +| `inputSchema` fields | Converted to Pydantic fields via `Schema.from_mcp_schema()` | Validation, schema export | +| `annotations` | Mapped to `ToolAnnotations` | Security analysis, LLM hints | + +### MCP Server Configuration + +MCP servers are configured via the `mcp_config` field on the `Agent` class. Configuration follows [FastMCP config format](https://gofastmcp.com/clients/client#configuration-format): + +```python +from openhands.sdk import Agent + +agent = Agent( + mcp_config={ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } + } +) +``` + +## Component Relationships + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Sources["Tool Sources"] + Native["Native Tools"] + MCP["MCP Tools"] + end + + Registry["Tool Registry
resolve_tool"] + ToolsMap["Agent.tools_map
Merged tool dict"] + + subgraph AgentSystem["Agent System"] + Agent["Agent Logic"] + LLM["LLM"] + end + + Security["Security Analyzer"] + Conversation["Conversation State"] + + Native -->|register_tool| Registry + Registry --> ToolsMap + MCP -->|create_mcp_tools| ToolsMap + ToolsMap -->|Provide schemas| LLM + Agent -->|Execute tools| ToolsMap + ToolsMap -.->|Action risk| Security + ToolsMap -.->|Read state| Conversation + + style ToolsMap fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Native → Registry → tools_map**: Native tools resolved via `ToolRegistry` +- **MCP → tools_map**: MCP tools bypass registry, added directly during `initialize()` +- **tools_map → LLM**: Generate schemas describing all available capabilities +- **Agent → tools_map**: Execute actions, receive observations +- **tools_map → Conversation**: Read state for context-aware execution +- **tools_map → Security**: Tool annotations inform risk assessment + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents select and execute tools +- **[Events](/sdk/arch/events)** - ActionEvent and ObservationEvent structures +- **[Security Analyzer](/sdk/arch/security)** - Action risk assessment +- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills +- **[Custom Tools Guide](/sdk/guides/custom-tools)** - Building your own tools +- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library + +### Workspace +Source: https://docs.openhands.dev/sdk/arch/workspace.md + +The **Workspace** component abstracts execution environments for agent operations. It provides a unified interface for command execution and file operations across local processes, containers, and remote servers. + +**Source:** [`openhands/sdk/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) + +## Core Responsibilities + +The Workspace system has four primary responsibilities: + +1. **Execution Abstraction** - Unified interface for command execution across environments +2. **File Operations** - Upload, download, and manipulate files in workspace +3. **Resource Management** - Context manager protocol for setup/teardown +4. **Environment Isolation** - Separate agent execution from host system + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 60}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["BaseWorkspace
Abstract base class"] + end + + subgraph Implementations["Concrete Implementations"] + Local["LocalWorkspace
Direct subprocess"] + Remote["RemoteWorkspace
HTTP API calls"] + end + + subgraph Operations["Core Operations"] + Command["execute_command()"] + Upload["file_upload()"] + Download["file_download()"] + Context["__enter__ / __exit__"] + end + + subgraph Targets["Execution Targets"] + Process["Local Process"] + Container["Docker Container"] + Server["Remote Server"] + end + + Base --> Local + Base --> Remote + + Base -.->|Defines| Operations + + Local --> Process + Remote --> Container + Remote --> Server + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Local,Remote secondary + class Command,Upload tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`BaseWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)** | Abstract interface | Defines execution and file operation contracts | +| **[`LocalWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/local.py)** | Local execution | Subprocess-based command execution | +| **[`RemoteWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/remote/base.py)** | Remote execution | HTTP API-based execution via agent-server | +| **[`CommandResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | Execution output | Structured result with stdout, stderr, exit_code | +| **[`FileOperationResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | File op outcome | Success status and metadata | + +## Workspace Types + +### Local vs Remote Execution + + +| Aspect | LocalWorkspace | RemoteWorkspace | +|--------|----------------|-----------------| +| **Execution** | Direct subprocess | HTTP → agent-server | +| **Isolation** | Process-level | Container/VM-level | +| **Performance** | Fast (no network) | Network overhead | +| **Security** | Host system access | Sandboxed | +| **Use Case** | Development, CLI | Production, web apps | + +## Core Operations + +### Command Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + Tool["Tool invokes
execute_command()"] + + Decision{"Workspace
type?"} + + LocalExec["subprocess.run()
Direct execution"] + RemoteExec["POST /command
HTTP API"] + + Result["CommandResult
stdout, stderr, exit_code"]

    Tool --> Decision
    Decision -->|Local| LocalExec
    Decision -->|Remote| RemoteExec

    LocalExec --> Result
    RemoteExec --> Result

    style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style LocalExec fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style RemoteExec fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

**Command Result Structure:**

| Field | Type | Description |
|-------|------|-------------|
| **stdout** | str | Standard output stream |
| **stderr** | str | Standard error stream |
| **exit_code** | int | Process exit code (0 = success) |
| **timeout** | bool | Whether the command timed out |
| **duration** | float | Execution time in seconds |

### File Operations

| Operation | Local Implementation | Remote Implementation |
|-----------|---------------------|----------------------|
| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart |
| **Download** | `shutil.copy()` | `GET /file/download` stream |
| **Result** | `FileOperationResult` | `FileOperationResult` |

## Resource Management

Workspaces use the **context manager protocol** for safe resource handling:

**Lifecycle Hooks:**

| Phase | LocalWorkspace | RemoteWorkspace |
|-------|----------------|-----------------|
| **Enter** | Create working directory | Connect to agent-server, verify |
| **Use** | Execute commands | Proxy commands via HTTP |
| **Exit** | No cleanup (persistent) | Disconnect, optionally stop container |

## Remote Workspace Extensions

The SDK provides remote workspace implementations in the `openhands-workspace` package:

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 50}} }%%
flowchart TB
    Base["RemoteWorkspace
SDK base class"] + + Docker["DockerWorkspace
Auto-spawn containers"] + API["RemoteAPIWorkspace
Connect to existing server"] + + Base -.->|Extended by| Docker + Base -.->|Extended by| API + + Docker -->|Creates| Container["Docker Container
with agent-server"]
    API -->|Connects| Server["Remote Agent Server"]

    style Base fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Docker fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style API fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

**Implementation Comparison:**

| Type | Setup | Isolation | Use Case |
|------|-------|-----------|----------|
| **LocalWorkspace** | Immediate | Process | Development, trusted code |
| **DockerWorkspace** | Spawn container | Container | Multi-user, untrusted code |
| **RemoteAPIWorkspace** | Connect to URL | Remote server | Distributed systems, cloud |

**Source:**
- **DockerWorkspace**: [`openhands-workspace/openhands/workspace/docker`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/docker)
- **RemoteAPIWorkspace**: [`openhands-workspace/openhands/workspace/remote_api`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/remote_api)

## Component Relationships

### How Workspace Integrates

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart LR
    Workspace["Workspace"]
    Conversation["Conversation"]
    AgentServer["Agent Server"]

    Conversation -->|Configures| Workspace
    Workspace -.->|Remote type| AgentServer

    style Workspace fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Conversation fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Relationship Characteristics:**
- **Conversation → Workspace**: The Conversation factory uses the workspace type to select LocalConversation or RemoteConversation
- **Workspace → Agent Server**: RemoteWorkspace delegates operations to the agent-server API
- **Tools Independence**: Tools run in the same environment as the workspace

## See Also

- **[Conversation Architecture](/sdk/arch/conversation)** - How the workspace type determines the conversation implementation
- **[Agent Server](/sdk/arch/agent-server)** - Remote execution API
- **[Tool System](/sdk/arch/tool-system)** - Tools that use the workspace for execution

### FAQ
Source: https://docs.openhands.dev/sdk/faq.md

## How do I use AWS Bedrock with the SDK?

**The OpenHands SDK supports AWS Bedrock through LiteLLM.**

Since LiteLLM requires `boto3` for Bedrock requests, you need to install it alongside the SDK.

### Step 1: Install boto3

Install the SDK with boto3:

```bash
# Using pip
pip install openhands-sdk boto3

# Using uv
uv pip install openhands-sdk boto3

# Or when installing as a CLI tool
uv tool install openhands --with boto3
```
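After installing, you can sanity-check that `boto3` is importable in the same environment:

```bash
python3 -c "import boto3; print(boto3.__version__)"
```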
### Step 2: Configure Authentication

You have two authentication options:

**Option A: API Key Authentication (Recommended)**

Use the `AWS_BEARER_TOKEN_BEDROCK` environment variable:

```bash
export AWS_BEARER_TOKEN_BEDROCK="your-bedrock-api-key"
```

**Option B: AWS Credentials**

Use traditional AWS credentials:

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-west-2"
```

### Step 3: Configure the Model

Use the `bedrock/` prefix for your model name:

```python
from openhands.sdk import LLM

llm = LLM(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    # api_key is read from AWS_BEARER_TOKEN_BEDROCK automatically
)
```

For cross-region inference profiles, include the region prefix:

```python
llm = LLM(
    model="bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0",  # US region
)
# or
llm = LLM(
    model="bedrock/apac.anthropic.claude-sonnet-4-20250514-v1:0",  # APAC region
)
```

For more details on Bedrock configuration options, see the [LiteLLM Bedrock documentation](https://docs.litellm.ai/docs/providers/bedrock).

## Does the agent SDK support parallel tool calling?

**Yes, the OpenHands SDK supports parallel tool calling by default.**

The SDK automatically handles parallel tool calls when the underlying LLM (like Claude or GPT-4) returns multiple tool calls in a single response. This allows agents to execute multiple independent actions before the next LLM call.

When the LLM generates multiple tool calls in parallel, the SDK groups them using a shared `llm_response_id`:

```python
ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1)
ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2)
# Combined into: Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2])
```

Multiple `ActionEvent`s with the same `llm_response_id` are grouped together and combined into a single LLM message with multiple `tool_calls`; only the first event's thought/reasoning is included, as sketched below.

Key pieces of the implementation:

- [Events Architecture](/sdk/arch/events#event-types) - detailed explanation of how parallel function calling works
- [`prepare_llm_messages` in utils.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/utils.py) - groups ActionEvents by `llm_response_id` when converting events to LLM messages
- [agent step method](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py#L200-L300) - where actions are created with a shared `llm_response_id`
- [`ActionEvent` class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py) - includes the `llm_response_id` field
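Conceptually, the grouping step works like this. This is a simplified sketch of the idea, not the SDK's actual `prepare_llm_messages` implementation (which also handles observations, user messages, and other event types):

```python
from itertools import groupby

def group_parallel_actions(action_events):
    """Collapse consecutive ActionEvents that share an llm_response_id
    into one assistant message with multiple tool_calls (sketch only)."""
    messages = []
    for _, batch in groupby(action_events, key=lambda e: e.llm_response_id):
        batch = list(batch)
        messages.append({
            "role": "assistant",
            "content": batch[0].thought,  # only the first event's thought is kept
            "tool_calls": [e.tool_call for e in batch],
        })
    return messages
```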
**Yes, the OpenHands SDK fully supports image content for vision-capable LLMs.**

The SDK supports both HTTP/HTTPS URLs and base64-encoded images through the `ImageContent` class.

### Check Vision Support

Before sending images, verify your LLM supports vision:

```python
from openhands.sdk import LLM
from pydantic import SecretStr

llm = LLM(
    model="anthropic/claude-sonnet-4-5-20250929",
    api_key=SecretStr("your-api-key"),
    usage_id="my-agent"
)

# Check if vision is active
assert llm.vision_is_active(), "Model does not support vision"
```

### Using HTTP URLs

```python
from openhands.sdk import ImageContent, Message, TextContent

message = Message(
    role="user",
    content=[
        TextContent(text="What do you see in this image?"),
        ImageContent(image_urls=["https://example.com/image.png"]),
    ],
)
```

### Using Base64 Images

Base64 images are supported using data URLs:

```python
import base64
from openhands.sdk import ImageContent, Message, TextContent

# Read and encode an image file
with open("my_image.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

# Create message with base64 image
message = Message(
    role="user",
    content=[
        TextContent(text="Describe this image"),
        ImageContent(image_urls=[f"data:image/png;base64,{image_base64}"]),
    ],
)
```

### Supported Image Formats

The data URL format is: `data:<mime-type>;base64,<base64-data>`

Supported MIME types:
- `image/png`
- `image/jpeg`
- `image/gif`
- `image/webp`
- `image/bmp`

### Built-in Image Support

Several SDK tools automatically handle images:

- **FileEditorTool**: When viewing image files (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`), they're automatically converted to base64 and sent to the LLM
- **BrowserUseTool**: Screenshots are captured and sent as base64 images
- **MCP Tools**: Image content from MCP tool results is automatically converted to base64 data URLs

### Disabling Vision

To disable vision for cost reduction (even on vision-capable models):

```python
llm = LLM(
    model="anthropic/claude-sonnet-4-5-20250929",
    api_key=SecretStr("your-api-key"),
    usage_id="my-agent",
    disable_vision=True,  # Images will be filtered out
)
```

For a complete example, see the [image input example](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) in the SDK repository.

## How do I handle MessageEvent in one-off tasks?

**The SDK provides utilities to automatically respond to agent messages when running tasks end-to-end.**

When running one-off tasks, some models may send a `MessageEvent` (proposing an action or asking for confirmation) instead of directly using tools. This causes `conversation.run()` to return, even though the agent hasn't finished the task.

When an agent sends a message (via `MessageEvent`) instead of using the `finish` tool, the conversation ends because it's waiting for user input. In automated pipelines, there's no human to respond, so the task appears incomplete.

**Key event types:**
- `ActionEvent`: Agent uses a tool (terminal, file editor, etc.)
- `MessageEvent`: Agent sends a text message (waiting for user response)
- `FinishAction`: Agent explicitly signals task completion

The solution is to automatically send a "fake user response" when the agent sends a message, prompting it to continue.
+ + + + + +The [`run_conversation_with_fake_user_response`](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) function wraps your conversation and automatically handles agent messages: + +```python +from openhands.sdk.conversation.state import ConversationExecutionStatus +from openhands.sdk.event import ActionEvent, MessageEvent +from openhands.sdk.tool.builtins.finish import FinishAction + +def run_conversation_with_fake_user_response(conversation, max_responses: int = 10): + """Run conversation, auto-responding to agent messages until finish or limit.""" + for _ in range(max_responses): + conversation.run() + if conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + break + events = list(conversation.state.events) + # Check if agent used finish tool + if any(isinstance(e, ActionEvent) and isinstance(e.action, FinishAction) for e in reversed(events)): + break + # Check if agent sent a message (needs response) + if not any(isinstance(e, MessageEvent) and e.source == "agent" for e in reversed(events)): + break + # Send continuation prompt + conversation.send_message( + "Please continue. Use the finish tool when done. DO NOT ask for human help." + ) +``` + + + + + +```python +from openhands.sdk import Agent, Conversation, LLM +from openhands.workspace import DockerWorkspace +from openhands.tools.preset.default import get_default_tools + +llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key="...") +agent = Agent(llm=llm, tools=get_default_tools()) +workspace = DockerWorkspace() +conversation = Conversation(agent=agent, workspace=workspace, max_iteration_per_run=100) + +conversation.send_message("Fix the bug in src/utils.py") +run_conversation_with_fake_user_response(conversation, max_responses=10) +# Results available in conversation.state.events +``` + + + + +**Pro tip:** Add a hint to your task prompt: +> "If you're 100% done with the task, use the finish action. Otherwise, keep going until you're finished." + +This encourages the agent to use the finish tool rather than asking for confirmation. + + +For the full implementation used in OpenHands benchmarks, see the [fake_user_response.py](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) module. + +## More questions? + +If you have additional questions: + +- **[Join our Slack Community](https://openhands.dev/joinslack)** - Ask questions and get help from the community +- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs, request features, or start a discussion + +### Getting Started +Source: https://docs.openhands.dev/sdk/getting-started.md + +The OpenHands SDK is a modular framework for building AI agents that interact with code, files, and system commands. Agents can execute bash commands, edit files, browse the web, and more. + +## Prerequisites + +Install the **[uv package manager](https://docs.astral.sh/uv/)** (version 0.8.13+): + +```bash +curl -LsSf https://astral.sh/uv/install.sh | sh +``` + +## Installation + +### Step 1: Acquire an LLM API Key + +The SDK requires an LLM API key from any [LiteLLM-supported provider](https://docs.litellm.ai/docs/providers). See our [recommended models](/openhands/usage/llms/llms) for best results. 
Bring your own API key from providers like:

- [Anthropic](https://console.anthropic.com/)
- [OpenAI](https://platform.openai.com/)
- [Other LiteLLM-supported providers](https://docs.litellm.ai/docs/providers)

Example:

```bash
export LLM_API_KEY="your-api-key"
uv run python examples/01_standalone_sdk/01_hello_world.py
```

Sign up for [OpenHands Cloud](https://app.all-hands.dev) and get an LLM API key from the [API keys page](https://app.all-hands.dev/settings/api-keys). This gives you access to models verified to work well with OpenHands, with no markup.

Example:

```bash
export LLM_MODEL="openhands/claude-sonnet-4-5-20250929"
uv run python examples/01_standalone_sdk/01_hello_world.py
```

[Learn more →](/openhands/usage/llms/openhands-llms)

If you have a ChatGPT Plus or Pro subscription, you can use `LLM.subscription_login()` to authenticate with your ChatGPT account and access Codex models without consuming API credits.

```python
from openhands.sdk import LLM

llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex")
```

[Learn more →](/sdk/guides/llm-subscriptions)

> Tip: Model name prefixes depend on your provider
>
> - If you bring your own provider key (Anthropic/OpenAI/etc.), use that provider's model name, e.g. `anthropic/claude-sonnet-4-5-20250929`. OpenHands supports [dozens of models](https://docs.openhands.dev/sdk/arch/llm#llm-providers); choose whichever model you want to try.
> - If you use OpenHands Cloud, use `openhands/`-prefixed models, e.g. `openhands/claude-sonnet-4-5-20250929`
>
> Many examples in the docs read the model from the `LLM_MODEL` environment variable. You can set it like:
>
> ```bash
> export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" # for OpenHands Provider
> ```

**Set Your API Key:**

```bash
export LLM_API_KEY=your-api-key-here
```

### Step 2: Install the SDK

```bash
pip install openhands-sdk # Core SDK (openhands.sdk)
pip install openhands-tools # Built-in tools (openhands.tools)
# Optional: required for sandboxed workspaces in Docker or remote servers
pip install openhands-workspace # Workspace backends (openhands.workspace)
pip install openhands-agent-server # Remote agent server (openhands.agent_server)
```

```bash
# Clone the repository
git clone https://github.com/OpenHands/software-agent-sdk.git
cd software-agent-sdk

# Install dependencies and setup development environment
make build
```

### Step 3: Run Your First Agent

Here's a complete example that creates an agent and asks it to perform a simple task:

```python icon="python" expandable examples/01_standalone_sdk/01_hello_world.py
import os

from openhands.sdk import LLM, Agent, Conversation, Tool
from openhands.tools.file_editor import FileEditorTool
from openhands.tools.task_tracker import TaskTrackerTool
from openhands.tools.terminal import TerminalTool


llm = LLM(
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    api_key=os.getenv("LLM_API_KEY"),
    base_url=os.getenv("LLM_BASE_URL", None),
)

agent = Agent(
    llm=llm,
    tools=[
        Tool(name=TerminalTool.name),
        Tool(name=FileEditorTool.name),
        Tool(name=TaskTrackerTool.name),
    ],
)

cwd = os.getcwd()
conversation = Conversation(agent=agent, workspace=cwd)

conversation.send_message("Write 3 facts about the current project into FACTS.txt.")
conversation.run()
print("All done!")
```

Run the example:

```bash
# Using a
direct provider key (Anthropic/OpenAI/etc.) +uv run python examples/01_standalone_sdk/01_hello_world.py +``` + +```bash +# Using OpenHands Cloud +export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" +uv run python examples/01_standalone_sdk/01_hello_world.py +``` + +You should see the agent understand your request, explore the project, and create a file with facts about it. + +## Core Concepts + +**Agent**: An AI-powered entity that can reason, plan, and execute actions using tools. + +**Tools**: Capabilities like executing bash commands, editing files, or browsing the web. + +**Workspace**: The execution environment where agents operate (local, Docker, or remote). + +**Conversation**: Manages the interaction lifecycle between you and the agent. + +## Basic Workflow + +1. **Configure LLM**: Choose model and provide API key +2. **Create Agent**: Use preset or custom configuration +3. **Add Tools**: Enable capabilities (bash, file editing, etc.) +4. **Start Conversation**: Create conversation context +5. **Send Message**: Provide task description +6. **Run Agent**: Agent executes until task completes or stops +7. **Get Result**: Review agent's output and actions + + +## Try More Examples + +The repository includes 24+ examples demonstrating various capabilities: + +```bash +# Simple hello world +uv run python examples/01_standalone_sdk/01_hello_world.py + +# Custom tools +uv run python examples/01_standalone_sdk/02_custom_tools.py + +# With skills +uv run python examples/01_standalone_sdk/03_activate_microagent.py + +# See all examples +ls examples/01_standalone_sdk/ +``` + + +## Next Steps + +### Explore Documentation + +- **[SDK Architecture](/sdk/arch/sdk)** - Deep dive into components +- **[Tool System](/sdk/arch/tool-system)** - Available tools +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environments +- **[LLM Configuration](/sdk/arch/llm)** - Deep dive into language model configuration + +### Build Custom Solutions + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools to expand agent capabilities +- **[MCP Integration](/sdk/guides/mcp)** - Connect to external tools via Model Context Protocol +- **[Docker Workspaces](/sdk/guides/agent-server/docker-sandbox)** - Sandbox agent execution in containers + +### Get Help + +- **[Slack Community](https://openhands.dev/joinslack)** - Ask questions and share projects +- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs or request features +- **[Example Directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples)** - Browse working code samples + +### ACP Agent +Source: https://docs.openhands.dev/sdk/guides/agent-acp.md + +> A ready-to-run example is available [here](#ready-to-run-example)! + +`ACPAgent` lets you use any [Agent Client Protocol](https://agentclientprotocol.com/protocol/overview) server as the backend for an OpenHands conversation. Instead of calling an LLM directly, the agent spawns an ACP server subprocess and communicates with it over JSON-RPC. The server manages its own LLM, tools, and execution — your code just sends messages and collects responses. 
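For intuition, here is roughly what a prompt turn looks like on the wire. The method names come from the ACP spec, but the payloads below are simplified and illustrative, not byte-exact:

```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": 1}}
{"jsonrpc": "2.0", "id": 2, "method": "session/new", "params": {"cwd": "/my-project", "mcpServers": []}}
{"jsonrpc": "2.0", "id": 3, "method": "session/prompt", "params": {"sessionId": "sess-1", "prompt": [{"type": "text", "text": "Explain the architecture of this project."}]}}
```

The SDK drives this handshake for you; you never construct these frames yourself.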
+ +## Basic Usage + +```python icon="python" highlight={5,7-9} +from openhands.sdk.agent import ACPAgent +from openhands.sdk.conversation import Conversation + +# Point at any ACP-compatible server +agent = ACPAgent(acp_command=["npx", "-y", "claude-code-acp"]) + +conversation = Conversation(agent=agent, workspace="./my-project") +conversation.send_message("Explain the architecture of this project.") +conversation.run() + +agent.close() +``` + +The `acp_command` is the shell command used to spawn the server process. The SDK communicates with it over stdin/stdout JSON-RPC. + + +**Key difference from standard agents:** With `ACPAgent`, you don't need an `LLM_API_KEY` in your code. The ACP server handles its own LLM authentication and API calls. This is *delegation* — your code sends messages to the ACP server, which manages all LLM interactions internally. + + +### What ACPAgent Does Not Support + +Because the ACP server manages its own tools and context, these `AgentBase` features are not available on `ACPAgent`: + +- `tools` / `include_default_tools` — the server has its own tools +- `mcp_config` — configure MCP on the server side +- `condenser` — the server manages its own context window +- `critic` — the server manages its own evaluation +- `agent_context` — configure the server directly + +Passing any of these raises `NotImplementedError` at initialization. + +## How It Works + +- **Subprocess delegation**: `ACPAgent` spawns the ACP server and communicates via JSON-RPC over stdin/stdout +- **Server-managed execution**: The ACP server handles its own LLM calls, tools, and context — your code just sends messages +- **Auto-approval**: Permission requests from the server are automatically granted, so ensure you trust the ACP server you're running +- **Metrics collection**: Token usage and costs from the server are captured into the agent's `LLM.metrics` + +## Configuration + +### Server Command and Arguments + +```python icon="python" +agent = ACPAgent( + acp_command=["npx", "-y", "claude-code-acp"], + acp_args=["--profile", "my-profile"], # extra CLI args + acp_env={"CLAUDE_API_KEY": "sk-..."}, # extra env vars +) +``` + +| Parameter | Description | +|-----------|-------------| +| `acp_command` | Command to start the ACP server (required) | +| `acp_args` | Additional arguments appended to the command | +| `acp_env` | Additional environment variables for the server process | + +## Metrics + +Token usage and cost data are automatically captured from the ACP server's responses. You can inspect them through the standard `LLM.metrics` interface: + +```python icon="python" +metrics = agent.llm.metrics +print(f"Total cost: ${metrics.accumulated_cost:.6f}") + +for usage in metrics.token_usages: + print(f" prompt={usage.prompt_tokens} completion={usage.completion_tokens}") +``` + +Usage data comes from two ACP protocol sources: +- **`PromptResponse.usage`** — per-turn token counts (input, output, cached, reasoning tokens) +- **`UsageUpdate` notifications** — cumulative session cost and context window size + +## Cleanup + +Always call `agent.close()` when you are done to terminate the ACP server subprocess. 
A `try/finally` block is recommended: + +```python icon="python" +agent = ACPAgent(acp_command=["npx", "-y", "claude-code-acp"]) +try: + conversation = Conversation(agent=agent, workspace=".") + conversation.send_message("Hello!") + conversation.run() +finally: + agent.close() +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/40_acp_agent_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/40_acp_agent_example.py) + + +```python icon="python" expandable examples/01_standalone_sdk/40_acp_agent_example.py +"""Example: Using ACPAgent with Claude Code ACP server. + +This example shows how to use an ACP-compatible server (claude-code-acp) +as the agent backend instead of direct LLM calls. It also demonstrates +``ask_agent()`` — a stateless side-question that forks the ACP session +and leaves the main conversation untouched. + +Prerequisites: + - Node.js / npx available + - Claude Code CLI authenticated (or CLAUDE_API_KEY set) + +Usage: + uv run python examples/01_standalone_sdk/40_acp_agent_example.py +""" + +import os + +from openhands.sdk.agent import ACPAgent +from openhands.sdk.conversation import Conversation + + +agent = ACPAgent(acp_command=["npx", "-y", "@zed-industries/claude-code-acp"]) + +try: + cwd = os.getcwd() + conversation = Conversation(agent=agent, workspace=cwd) + + # --- Main conversation turn --- + conversation.send_message( + "List the Python source files under openhands-sdk/openhands/sdk/agent/, " + "then read the __init__.py and summarize what agent classes are exported." + ) + conversation.run() + + # --- ask_agent: stateless side-question via fork_session --- + print("\n--- ask_agent ---") + response = conversation.ask_agent( + "Based on what you just saw, which agent class is the newest addition?" + ) + print(f"ask_agent response: {response}") +finally: + # Clean up the ACP server subprocess + agent.close() + +print("Done!") +``` + +This example does not use an LLM API key directly — the ACP server (Claude Code) handles authentication on its own. + +```bash Running the Example +# Ensure Claude Code CLI is authenticated first +# (or set CLAUDE_API_KEY in your environment) +cd software-agent-sdk +uv run python examples/01_standalone_sdk/40_acp_agent_example.py +``` + +## Next Steps + +- **[Creating Custom Agents](/sdk/guides/agent-custom)** — Build specialized agents with custom tool sets and system prompts +- **[Agent Delegation](/sdk/guides/agent-delegation)** — Compose multiple agents for complex workflows +- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs across models + +### Browser Use +Source: https://docs.openhands.dev/sdk/guides/agent-browser-use.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. Built +on top of [browser-use](https://github.com/browser-use/browser-use), it provides capabilities for navigating websites, clicking elements, filling forms, +and extracting content - all through natural language instructions. + +## How It Works + +The [ready-to-run example](#ready-to-run-example) demonstrates combining multiple tools to create a capable web research agent: + +1. **BrowserToolSet**: Provides automated browser control for web interaction +2. **FileEditorTool**: Allows the agent to read and write files if needed +3. 
**BashTool**: Enables command-line operations for additional functionality + +The agent uses these tools to: +- Navigate to specified URLs +- Interact with web page elements (clicking, scrolling, etc.) +- Extract and analyze content from web pages +- Summarize information from multiple sources + +In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points. + +## Customization + +For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually +register individual browser tools. Refer to the [BrowserToolSet definition](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/browser_use/definition.py) to see the available individual +tools and create a `BrowserToolExecutor` with customized tool configurations before constructing the Agent. +This gives you fine-grained control over which browser capabilities are exposed to the agent. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py) + + +```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=BrowserToolSet.name), +] + +# If you need fine-grained browser control, you can manually register individual browser +# tools by creating a BrowserToolExecutor and providing factories that return customized +# Tool instances before constructing the Agent. + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external services + +### Creating Custom Agent +Source: https://docs.openhands.dev/sdk/guides/agent-custom.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +This guide demonstrates how to create custom agents tailored for specific use cases. 
Using the planning agent as a concrete example, you'll learn how to design specialized agents with custom tool sets, system prompts, and configurations that optimize performance for particular workflows. + + +This example is available on GitHub: [examples/01_standalone_sdk/24_planning_agent_workflow.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) + + + +The example showcases a two-phase workflow where a custom planning agent (with read-only tools) analyzes tasks and creates structured plans, followed by an execution agent that implements those plans with full editing capabilities. + +```python icon="python" expandable examples/01_standalone_sdk/24_planning_agent_workflow.py +#!/usr/bin/env python3 +""" +Planning Agent Workflow Example + +This example demonstrates a two-stage workflow: +1. Planning Agent: Analyzes the task and creates a detailed implementation plan +2. Execution Agent: Implements the plan with full editing capabilities + +The task: Create a Python web scraper that extracts article titles and URLs +from a news website, handles rate limiting, and saves results to JSON. +""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.llm import content_to_str +from openhands.tools.preset.default import get_default_agent +from openhands.tools.preset.planning import get_planning_agent + + +def get_event_content(event): + """Extract content from an event.""" + if hasattr(event, "llm_message"): + return "".join(content_to_str(event.llm_message.content)) + return str(event) + + +"""Run the planning agent workflow example.""" + +# Create a temporary workspace +workspace_dir = Path(tempfile.mkdtemp()) +print(f"Working in: {workspace_dir}") + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="agent", +) + +# Task description +task = """ +Create a Python web scraper with the following requirements: +- Scrape article titles and URLs from a news website +- Handle HTTP errors gracefully with retry logic +- Save results to a JSON file with timestamp +- Use requests and BeautifulSoup for scraping + +Do NOT ask for any clarifying questions. Directly create your implementation plan. 
+""" + +print("=" * 80) +print("PHASE 1: PLANNING") +print("=" * 80) + +# Create Planning Agent with read-only tools +planning_agent = get_planning_agent(llm=llm) + +# Create conversation for planning +planning_conversation = Conversation( + agent=planning_agent, + workspace=str(workspace_dir), +) + +# Run planning phase +print("Planning Agent is analyzing the task and creating implementation plan...") +planning_conversation.send_message( + f"Please analyze this web scraping task and create a detailed " + f"implementation plan:\n\n{task}" +) +planning_conversation.run() + +print("\n" + "=" * 80) +print("PLANNING COMPLETE") +print("=" * 80) +print(f"Implementation plan saved to: {workspace_dir}/PLAN.md") + +print("\n" + "=" * 80) +print("PHASE 2: EXECUTION") +print("=" * 80) + +# Create Execution Agent with full editing capabilities +execution_agent = get_default_agent(llm=llm, cli_mode=True) + +# Create conversation for execution +execution_conversation = Conversation( + agent=execution_agent, + workspace=str(workspace_dir), +) + +# Prepare execution prompt with reference to the plan file +execution_prompt = f""" +Please implement the web scraping project according to the implementation plan. + +The detailed implementation plan has been created and saved at: {workspace_dir}/PLAN.md + +Please read the plan from PLAN.md and implement all components according to it. + +Create all necessary files, implement the functionality, and ensure everything +works together properly. +""" + +print("Execution Agent is implementing the plan...") +execution_conversation.send_message(execution_prompt) +execution_conversation.run() + +# Get the last message from the conversation +execution_result = execution_conversation.state.events[-1] + +print("\n" + "=" * 80) +print("EXECUTION RESULT:") +print("=" * 80) +print(get_event_content(execution_result)) + +print("\n" + "=" * 80) +print("WORKFLOW COMPLETE") +print("=" * 80) +print(f"Project files created in: {workspace_dir}") + +# List created files +print("\nCreated files:") +for file_path in workspace_dir.rglob("*"): + if file_path.is_file(): + print(f" - {file_path.relative_to(workspace_dir)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Anatomy of a Custom Agent + +The planning agent demonstrates the two key components for creating specialized agent: + +### 1. Custom Tool Selection + +Choose tools that match your agent's specific role. Here's how the planning agent defines its tools: + +```python icon="python" + +def register_planning_tools() -> None: + """Register the planning agent tools.""" + from openhands.tools.glob import GlobTool + from openhands.tools.grep import GrepTool + from openhands.tools.planning_file_editor import PlanningFileEditorTool + + register_tool("GlobTool", GlobTool) + logger.debug("Tool: GlobTool registered.") + register_tool("GrepTool", GrepTool) + logger.debug("Tool: GrepTool registered.") + register_tool("PlanningFileEditorTool", PlanningFileEditorTool) + logger.debug("Tool: PlanningFileEditorTool registered.") + + +def get_planning_tools() -> list[Tool]: + """Get the planning agent tool specifications. + + Returns: + List of tools optimized for planning and analysis tasks, including + file viewing and PLAN.md editing capabilities for advanced + code discovery and navigation. 
    """
    register_planning_tools()

    return [
        Tool(name="GlobTool"),
        Tool(name="GrepTool"),
        Tool(name="PlanningFileEditorTool"),
    ]
```

The planning agent uses:
- **GlobTool**: For discovering files and directories matching patterns
- **GrepTool**: For searching specific content across files
- **PlanningFileEditorTool**: For writing structured plans to `PLAN.md` only

This read-only approach (except for `PLAN.md`) keeps the agent focused on analysis without implementation distractions.

### 2. System Prompt Customization

Custom agents can use specialized system prompts to guide behavior. The planning agent uses `system_prompt_planning.j2` with an injected plan structure that enforces:
1. **Objective**: Clear goal statement
2. **Context Summary**: Relevant system components and constraints
3. **Approach Overview**: High-level strategy and rationale
4. **Implementation Steps**: Detailed step-by-step execution plan
5. **Testing and Validation**: Verification methods and success criteria

### Complete Implementation Reference

For a complete implementation example showing all these components working together, refer to the [planning agent preset source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/preset/planning.py).

## Next Steps

- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools for your use case
- **[Context Condenser](/sdk/guides/context-condenser)** - Optimize context management
- **[MCP Integration](/sdk/guides/mcp)** - Add MCP servers

### Sub-Agent Delegation
Source: https://docs.openhands.dev/sdk/guides/agent-delegation.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

> A ready-to-run example is available [here](#ready-to-run-example)!

## Overview

Agent delegation allows a main agent to spawn multiple sub-agents and delegate tasks to them for parallel processing. Each sub-agent runs independently with its own conversation context and returns results that the main agent can consolidate and process further.

This pattern is useful when:
- Breaking down complex problems into independent subtasks
- Processing multiple related tasks in parallel
- Separating concerns between different specialized sub-agents
- Improving throughput for parallelizable work

## How It Works

The delegation system consists of two main operations:

### 1. Spawning Sub-Agents

Before delegating work, the agent must first spawn sub-agents with meaningful identifiers:

```python icon="python" wrap
# Agent uses the delegate tool to spawn sub-agents
{
    "command": "spawn",
    "ids": ["lodging", "activities"]
}
```

Each spawned sub-agent:
- Gets a unique identifier that the agent specifies (e.g., "lodging", "activities")
- Inherits the same LLM configuration as the parent agent
- Operates in the same workspace as the main agent
- Maintains its own independent conversation context

### 2.
Delegating Tasks + +Once sub-agents are spawned, the agent can delegate tasks to them: + +```python icon="python" wrap +# Agent uses the delegate tool to assign tasks +{ + "command": "delegate", + "tasks": { + "lodging": "Find the best budget-friendly areas to stay in London", + "activities": "List top 5 must-see attractions and hidden gems in London" + } +} +``` + +The delegate operation: +- Runs all sub-agent tasks in parallel using threads +- Blocks until all sub-agents complete their work +- Returns a single consolidated observation with all results +- Handles errors gracefully and reports them per sub-agent + +## Setting Up the DelegateTool + + + + ### Register the Tool + + ```python icon="python" wrap + from openhands.sdk.tool import register_tool + from openhands.tools.delegate import DelegateTool + + register_tool("DelegateTool", DelegateTool) + ``` + + + ### Add to Agent Tools + + ```python icon="python" wrap + from openhands.sdk import Tool + from openhands.tools.preset.default import get_default_tools + + tools = get_default_tools(enable_browser=False) + tools.append(Tool(name="DelegateTool")) + + agent = Agent(llm=llm, tools=tools) + ``` + + + ### Configure Maximum Sub-Agents (Optional) + + The user can limit the maximum number of concurrent sub-agents: + + ```python icon="python" wrap + from openhands.tools.delegate import DelegateTool + + class CustomDelegateTool(DelegateTool): + @classmethod + def create(cls, conv_state, max_children: int = 3): + # Only allow up to 3 sub-agents + return super().create(conv_state, max_children=max_children) + + register_tool("DelegateTool", CustomDelegateTool) + ``` + + + + +## Tool Commands + +### spawn + +Initialize sub-agents with meaningful identifiers. + +**Parameters:** +- `command`: `"spawn"` +- `ids`: List of string identifiers (e.g., `["research", "implementation", "testing"]`) + +**Returns:** +A message indicating the sub-agents were successfully spawned. + +**Example:** +```python icon="python" wrap +{ + "command": "spawn", + "ids": ["research", "implementation", "testing"] +} +``` + +### delegate + +Send tasks to specific sub-agents and wait for results. + +**Parameters:** +- `command`: `"delegate"` +- `tasks`: Dictionary mapping sub-agent IDs to task descriptions + +**Returns:** +A consolidated message containing all results from the sub-agents. + +**Example:** +```python icon="python" wrap +{ + "command": "delegate", + "tasks": { + "research": "Find best practices for async code", + "implementation": "Refactor the MyClass class", + "testing": "Write unit tests for the refactored code" + } +} +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/25_agent_delegation.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/25_agent_delegation.py) + + +```python icon="python" expandable examples/01_standalone_sdk/25_agent_delegation.py +""" +Agent Delegation Example + +This example demonstrates the agent delegation feature where a main agent +delegates tasks to sub-agents for parallel processing. +Each sub-agent runs independently and returns its results to the main agent, +which then merges both analyses into a single consolidated report. 
+""" + +import os + +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Tool, + get_logger, +) +from openhands.sdk.context import Skill +from openhands.sdk.subagent import register_agent +from openhands.sdk.tool import register_tool +from openhands.tools.delegate import ( + DelegateTool, + DelegationVisualizer, +) +from openhands.tools.preset.default import get_default_tools, register_builtins_agents + + +ONLY_RUN_SIMPLE_DELEGATION = False + +logger = get_logger(__name__) + +# Configure LLM and agent +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.environ.get("LLM_BASE_URL", None), + usage_id="agent", +) + +cwd = os.getcwd() + +tools = get_default_tools(enable_browser=True) +tools.append(Tool(name=DelegateTool.name)) +register_builtins_agents() + +main_agent = Agent( + llm=llm, + tools=tools, +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) + +conversation.send_message( + "Forget about coding. Let's switch to travel planning. " + "Let's plan a trip to London. I have two issues I need to solve: " + "Lodging: what are the best areas to stay at while keeping budget in mind? " + "Activities: what are the top 5 must-see attractions and hidden gems? " + "Please use the delegation tools to handle these two tasks in parallel. " + "Make sure the sub-agents use their own knowledge " + "and dont rely on internet access. " + "They should keep it short. After getting the results, merge both analyses " + "into a single consolidated report.\n\n" +) +conversation.run() + +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." +) +conversation.run() + +# Report cost for simple delegation example +cost_simple = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (simple delegation): {cost_simple}") + +print("Simple delegation example done!", "\n" * 20) + +if ONLY_RUN_SIMPLE_DELEGATION: + # For CI: always emit the EXAMPLE_COST marker before exiting. + print(f"EXAMPLE_COST: {cost_simple}") + exit(0) + + +# -------- Agent Delegation Second Part: Built-in Agent Types (Explore + Bash) -------- + +main_agent = Agent( + llm=llm, + tools=[Tool(name=DelegateTool.name)], +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator (builtins)"), +) + +builtin_task_message = ( + "Demonstrate SDK built-in sub-agent types. " + "1) Spawn an 'explore' sub-agent and ask it to list the markdown files in " + "openhands-sdk/openhands/sdk/subagent/builtins/ and summarize what each " + "built-in agent type is for (based on the file contents). " + "2) Spawn a 'bash' sub-agent and ask it to run `python --version` in the " + "terminal and return the exact output. " + "3) Merge both results into a short report. " + "Do not use internet access." 
+) + +print("=" * 100) +print("Demonstrating built-in agent delegation (explore + bash)...") +print("=" * 100) + +conversation.send_message(builtin_task_message) +conversation.run() + +# Report cost for builtin agent types example +cost_builtin = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (builtin agents): {cost_builtin}") + +print("Built-in agent delegation example done!", "\n" * 20) + + +# -------- Agent Delegation Third Part: User-Defined Agent Types -------- + + +def create_lodging_planner(llm: LLM) -> Agent: + """Create a lodging planner focused on London stays.""" + skills = [ + Skill( + name="lodging_planning", + content=( + "You specialize in finding great places to stay in London. " + "Provide 3-4 hotel recommendations with neighborhoods, quick " + "pros/cons, " + "and notes on transit convenience. Keep options varied by budget." + ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Focus only on London lodging recommendations.", + ), + ) + + +def create_activities_planner(llm: LLM) -> Agent: + """Create an activities planner focused on London itineraries.""" + skills = [ + Skill( + name="activities_planning", + content=( + "You design concise London itineraries. Suggest 2-3 daily " + "highlights, grouped by proximity to minimize travel time. " + "Include food/coffee stops " + "and note required tickets/reservations." + ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Plan practical, time-efficient days in London.", + ), + ) + + +# Register user-defined agent types (default agent type is always available) +register_agent( + name="lodging_planner", + factory_func=create_lodging_planner, + description="Finds London lodging options with transit-friendly picks.", +) +register_agent( + name="activities_planner", + factory_func=create_activities_planner, + description="Creates time-efficient London activity itineraries.", +) + +# Make the delegation tool available to the main agent +register_tool("DelegateTool", DelegateTool) + +main_agent = Agent( + llm=llm, + tools=[Tool(name="DelegateTool")], +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) + +task_message = ( + "Plan a 3-day London trip. " + "1) Spawn two sub-agents: lodging_planner (hotel options) and " + "activities_planner (itinerary). " + "2) Ask lodging_planner for 3-4 central London hotel recommendations with " + "neighborhoods, quick pros/cons, and transit notes by budget. " + "3) Ask activities_planner for a concise 3-day itinerary with nearby stops, " + " food/coffee suggestions, and any ticket/reservation notes. " + "4) Share both sub-agent results and propose a combined plan." +) + +print("=" * 100) +print("Demonstrating London trip delegation (lodging + activities)...") +print("=" * 100) + +conversation.send_message(task_message) +conversation.run() + +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." 
)
conversation.run()

# Report cost for user-defined agent types example
cost_user_defined = (
    conversation.conversation_stats.get_combined_metrics().accumulated_cost
)
print(f"EXAMPLE_COST (user-defined agents): {cost_user_defined}")

print("All done!")

# Full example cost report for CI workflow
print(f"EXAMPLE_COST: {cost_simple + cost_builtin + cost_user_defined}")
```

### File-Based Agents
Source: https://docs.openhands.dev/sdk/guides/agent-file-based.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

> A ready-to-run example is available [here](#ready-to-run-example)!

File-based agents let you define specialized sub-agents using Markdown files. Each file declares the agent's name, description, tools, and system prompt — the same things you'd pass to `register_agent()` in code, but without writing any Python.

This is the fastest way to create reusable, domain-specific agents that can be invoked via [delegation](/sdk/guides/agent-delegation).

## Agent File Format

An agent is a single `.md` file with YAML frontmatter and a Markdown body:

```markdown icon="markdown"
---
name: code-reviewer
description: >
  Reviews code for quality, bugs, and best practices.
  <example>Review this pull request for issues</example>
  <example>Check this code for bugs</example>
tools:
  - file_editor
  - terminal
model: inherit
---

# Code Reviewer

You are a meticulous code reviewer. When reviewing code:

1. **Correctness** - Look for bugs, off-by-one errors, and race conditions.
2. **Style** - Check for consistent naming and idiomatic usage.
3. **Performance** - Identify unnecessary allocations or algorithmic issues.
4. **Security** - Flag injection vulnerabilities or hardcoded secrets.

Keep feedback concise and actionable. For each issue, suggest a fix.
```

The YAML frontmatter configures the agent. The Markdown body becomes the agent's system prompt.

### Frontmatter Fields

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `name` | Yes | - | Agent identifier (e.g., `code-reviewer`) |
| `description` | No | `""` | What this agent does. Shown to the orchestrator |
| `tools` | No | `[]` | List of tools the agent can use |
| `model` | No | `"inherit"` | LLM model profile to load and use for the subagent (`"inherit"` uses the parent agent's model) |
| `skills` | No | `[]` | List of skill names for this agent (see [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for resolution order). |
| `max_iteration_per_run` | No | `None` | Maximum iterations per run. Must be strictly positive, or `None` for the default value. |
| `color` | No | `None` | [Rich color name](https://rich.readthedocs.io/en/stable/appendix/colors.html) (e.g., `"blue"`, `"green"`) used by visualizers to style this agent's output in terminal panels |

### `<example>` Tags

Add `<example>` tags inside the description to help the orchestrating agent know **when** to delegate to this agent:

```markdown icon="markdown"
description: >
  Writes and improves technical documentation.
  <example>Write docs for this module</example>
  <example>Improve the README</example>
```

These examples are extracted and stored as `when_to_use_examples` on the `AgentDefinition` object. They can be used by routing logic (or prompt-building) to help decide when to delegate to the right sub-agent.
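Assuming `when_to_use_examples` is a list of strings (as the extraction described above suggests), a minimal sketch for inspecting what was extracted from your agent files:

```python
from pathlib import Path

from openhands.sdk import load_agents_from_dir

# Print each discovered agent and its extracted <example> hints
for agent_def in load_agents_from_dir(Path(".agents/agents")):
    print(agent_def.name)
    for example in agent_def.when_to_use_examples:
        print(f"  when to use: {example}")
```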
## Directory Conventions

Place agent files in these directories, scanned in **priority order** (first match wins):

| Priority | Location | Scope |
|----------|----------|-------|
| 1 | `{project}/.agents/agents/*.md` | Project-level (primary) |
| 2 | `{project}/.openhands/agents/*.md` | Project-level (secondary) |
| 3 | `~/.agents/agents/*.md` | User-level (primary) |
| 4 | `~/.openhands/agents/*.md` | User-level (secondary) |

**Rules:**
- Only top-level `.md` files are loaded (subdirectories are skipped)
- `README.md` files are automatically skipped
- Project-level agents take priority over user-level agents with the same name

Put agents shared across all your projects in `~/.agents/agents/`. Put project-specific agents in `{project}/.agents/agents/`.

## Built-in Agents

The `openhands-tools` package ships with built-in sub-agents as Markdown files in `openhands/tools/preset/subagents/`.
They can be registered via `register_builtins_agents()` and become available for delegation tasks.

By default, all agents include the `finish` and `think` tools.

### Available Built-in Sub-Agents

| Agent | Tools | Description |
|-------|-------|-------------|
| **default** | `terminal`, `file_editor`, `task_tracker`, `browser_tool_set` | General-purpose agent. Used as the fallback when no agent name is specified. |
| **default cli mode** | `terminal`, `file_editor`, `task_tracker` | Same as `default` but without browser tools (used in CLI mode). |
| **explore** | `terminal` | Read-only codebase exploration agent. Finds files, searches code, reads source — never creates or modifies anything. |
| **bash** | `terminal` | Command execution specialist. Runs shell commands, builds, tests, and git operations. |

In CLI mode, the `default` agent (with browser tools) is replaced by the `default cli mode` agent. In non-CLI mode, `default cli mode` is filtered out.

### Registering Built-in Sub-Agents

Call `register_builtins_agents()` to register all built-in sub-agents. This is typically done once before creating a conversation:

```python icon="python" focus={3-4, 6-7}
from openhands.tools.preset.default import register_builtins_agents

# Register built-in sub-agents (default, explore, bash)
register_builtins_agents()

# Or in CLI mode (swaps default for default cli mode — no browser)
register_builtins_agents(cli_mode=True)
```

Registration order is critical when programmatically registering agents that share a name with a built-in agent. The system is designed to skip registration if a name is already taken. Therefore, if you register your custom agents before the built-in agents are loaded, your custom versions will take precedence.

Conversely, if the built-in agents are loaded first, they will take precedence, and any subsequent registration of a custom agent with the same name will be ignored.
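A minimal sketch of this first-come-wins behavior (the `create_my_bash_agent` factory here is hypothetical):

```python
from openhands.sdk import LLM, Agent
from openhands.sdk.subagent import register_agent
from openhands.tools.preset.default import register_builtins_agents


def create_my_bash_agent(llm: LLM) -> Agent:
    # Hypothetical custom factory; returns a stripped-down agent
    return Agent(llm=llm, tools=[])


# Registered first, so it wins over the built-in "bash" agent
register_agent(
    name="bash",
    factory_func=create_my_bash_agent,
    description="Custom bash specialist.",
)
register_builtins_agents()  # built-in "bash" is skipped: the name is taken
```

Reversing the two calls would keep the built-in `bash` agent and silently ignore the custom registration.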
+ +| Priority | Source | Description | +|----------|--------|-------------| +| 1 (highest) | **Programmatic** `register_agent()` | Registered first, never overwritten | +| 2 | **Plugin agents** (`Plugin.agents`) | Loaded from plugin `agents/` directories | +| 3 | **Project-level** file-based agents | `.agents/agents/*.md` or `.openhands/agents/*.md` | +| 4 (lowest) | **User-level** file-based agents | `~/.agents/agents/*.md` or `~/.openhands/agents/*.md` | + +## Auto-Registration + +The simplest way to use file-based agents is auto-registration. Call `register_file_agents()` with your project directory, and all discovered agents are registered into the delegation system: + +```python icon="python" focus={3} +from openhands.sdk.subagent import register_file_agents + +agent_names = register_file_agents("/path/to/project") +print(f"Registered {len(agent_names)} agents: {agent_names}") +``` + +This scans both project-level and user-level directories, deduplicates by name, and registers each agent as a delegate that can be spawned by the orchestrator. + +## Manual Loading + +For more control, load and register agents explicitly: + +```python icon="python" focus={3-6, 8-14} +from pathlib import Path + +from openhands.sdk import load_agents_from_dir, register_agent, agent_definition_to_factory + +# Load from a specific directory +agents_dir = Path("agents") +agent_definitions = load_agents_from_dir(agents_dir) + +# Register each agent +for agent_def in agent_definitions: + register_agent( + name=agent_def.name, + factory_func=agent_definition_to_factory(agent_def), + description=agent_def.description, + ) +``` + +### Key Functions + +#### `load_agents_from_dir()` + +Scans a directory for `.md` files and returns a list of `AgentDefinition` objects: + +```python icon="python" focus={3-4} +from pathlib import Path + +from openhands.sdk import load_agents_from_dir + +definitions = load_agents_from_dir(Path(".agents/agents")) +for d in definitions: + print(f"{d.name}: {d.tools}, model={d.model}") +``` + +#### `agent_definition_to_factory()` + +Converts an `AgentDefinition` into a factory function `(LLM) -> Agent`: + +```python icon="python" +from openhands.sdk import agent_definition_to_factory + +factory = agent_definition_to_factory(agent_def) +# The factory is called by the delegation system with the parent's LLM +``` + +The factory: +- Maps tool names from the frontmatter to `Tool` objects +- Appends the Markdown body to the parent system message via `AgentContext(system_message_suffix=...)` +- Respects the `model` field (`"inherit"` keeps the parent LLM; an explicit model name creates a copy) + +#### `load_project_agents()` / `load_user_agents()` + +Load agents from project-level or user-level directories respectively: + +```python icon="python" focus={3, 4} +from openhands.sdk.subagent import load_project_agents, load_user_agents + +project_agents = load_project_agents("/path/to/project") +user_agents = load_user_agents() # scans ~/.agents/agents/ and ~/.openhands/agents/ +``` + +## Using with Delegation + +File-based agents are designed to work with the [DelegateTool](/sdk/guides/agent-delegation). 
Once registered, the orchestrating agent can spawn and delegate tasks to them by name:

```python icon="python" focus={6, 9-12, 15-19}
from openhands.sdk import Agent, Conversation, Tool
from openhands.sdk.subagent import register_file_agents
from openhands.sdk.tool import register_tool
from openhands.tools.delegate import DelegateTool, DelegationVisualizer

register_file_agents("/path/to/project")  # Register .agents/agents/*.md

# Set up the orchestrator with DelegateTool
register_tool("DelegateTool", DelegateTool)
main_agent = Agent(
    llm=llm,
    tools=[Tool(name="DelegateTool")],
)

conversation = Conversation(
    agent=main_agent,
    workspace="/path/to/project",
    visualizer=DelegationVisualizer(name="Orchestrator"),
)
```

To learn more about agent delegation, follow our [comprehensive guide](/sdk/guides/agent-delegation).

## Example Agent Files

### Code Reviewer

```markdown icon="markdown"
---
name: code-reviewer
description: >
  Reviews code for quality, bugs, and best practices.
  <example>Review this pull request for issues</example>
  <example>Check this code for bugs</example>
tools:
  - file_editor
  - terminal
---

# Code Reviewer

You are a meticulous code reviewer. When reviewing code:

1. **Correctness** - Look for bugs, off-by-one errors, null pointer issues, and race conditions.
2. **Style** - Check for consistent naming, formatting, and idiomatic usage.
3. **Performance** - Identify unnecessary allocations, N+1 queries, or algorithmic inefficiencies.
4. **Security** - Flag potential injection vulnerabilities, hardcoded secrets, or unsafe deserialization.

Keep feedback concise and actionable. For each issue found, suggest a concrete fix.
```

### Technical Writer

```markdown icon="markdown"
---
name: tech-writer
description: >
  Writes and improves technical documentation.
  <example>Write docs for this module</example>
  <example>Improve the README</example>
tools:
  - file_editor
---

# Technical Writer

You are a skilled technical writer. When creating or improving documentation:

1. **Audience** - Write for developers who are new to the project.
2. **Structure** - Use clear headings, code examples, and step-by-step instructions.
3. **Accuracy** - Read the source code before documenting behavior. Never guess.
4. **Brevity** - Prefer short, concrete sentences over long explanations.

Always include a usage example with expected output when documenting functions or APIs.
```

## Agents in Plugins

> Plugins bundle agents, tools, skills, and MCP servers into reusable packages. Learn more about plugins [here](/sdk/guides/plugins).

File-based agents can also be bundled inside plugins. Place them in the `agents/` directory of your plugin.

Plugin agents use the same `.md` format and are registered automatically when the plugin is loaded. They have higher priority than file-based agents but lower than programmatic `register_agent()` calls.

## Ready-to-run Example

This example is available on GitHub: [examples/01_standalone_sdk/42_file_based_subagents.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/42_file_based_subagents.py)

This example uses `AgentDefinition` directly. File-based agents are loaded into the same `AgentDefinition` objects (from Markdown) and registered the same way.

```python icon="python" expandable examples/01_standalone_sdk/42_file_based_subagents.py
"""Example: Defining a sub-agent inline with AgentDefinition.
+ +Defines a grammar-checker sub-agent using AgentDefinition, registers it, +and delegates work to it from an orchestrator agent. The orchestrator then +asks the builtin default agent to judge the results. +""" + +import os +from pathlib import Path + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Tool, + agent_definition_to_factory, + register_agent, +) +from openhands.sdk.subagent import AgentDefinition +from openhands.sdk.tool import register_tool +from openhands.tools.delegate import DelegateTool, DelegationVisualizer + + +# 1. Define a sub-agent using AgentDefinition +grammar_checker = AgentDefinition( + name="grammar-checker", + description="Checks documents for grammatical errors.", + tools=["file_editor"], + system_prompt="You are a grammar expert. Find and list grammatical errors.", +) + +# 2. Register it in the delegate registry +register_agent( + name=grammar_checker.name, + factory_func=agent_definition_to_factory(grammar_checker), + description=grammar_checker.description, +) + +# 3. Set up the orchestrator agent with the DelegateTool +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL"), + usage_id="file-agents-demo", +) + +register_tool("DelegateTool", DelegateTool) +main_agent = Agent( + llm=llm, + tools=[Tool(name="DelegateTool")], +) +conversation = Conversation( + agent=main_agent, + workspace=Path.cwd(), + visualizer=DelegationVisualizer(name="Orchestrator"), +) + +# 4. Ask the orchestrator to delegate to our agent +task = ( + "Please delegate to the grammar-checker agent and ask it to review " + "the README.md file in search of grammatical errors.\n" + "Then ask the default agent to judge the errors." +) +conversation.send_message(task) +conversation.run() + +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"\nTotal cost: ${cost:.4f}") +print(f"EXAMPLE_COST: {cost:.4f}") +``` + + + +## Next Steps + +- **[Sub-Agent Delegation](/sdk/guides/agent-delegation)** - Learn about the DelegateTool and delegation patterns +- **[Skills](/sdk/guides/skill)** - Add specialized knowledge and triggers to agents +- **[Plugins](/sdk/guides/plugins)** - Bundle agents, skills, hooks, and MCP servers together +- **[Custom Agent](/sdk/guides/agent-custom)** - Create agents programmatically for more control + +### Interactive Terminal +Source: https://docs.openhands.dev/sdk/guides/agent-interactive-terminal.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `BashTool` provides agents with the ability to interact with terminal applications that require back-and-forth communication, such as Python's interactive mode, ipython, database CLIs, and other REPL environments. This enables agents to execute commands within these interactive sessions, receive output, and send follow-up commands based on the results. + + +## How It Works + +```python icon="python" focus={4-7} +cwd = os.getcwd() +register_tool("BashTool", BashTool) +tools = [ + Tool( + name="BashTool", + params={"no_change_timeout_seconds": 3}, + ) +] +``` + + +The `BashTool` is configured with a `no_change_timeout_seconds` parameter that determines how long to wait for terminal updates before sending the output back to the agent. + +In the example above, the agent should: +1. Enters Python's interactive mode by running `python3` +2. 
Executes Python code to get the current time
3. Exits the Python interpreter

The `BashTool` maintains the session state throughout these interactions, allowing the agent to send multiple commands within the same terminal session. Review the [BashTool](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/definition.py) and [terminal source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/terminal/terminal_session.py) to better understand how the interactive session is configured and managed.

## Ready-to-run Example


This example is available on GitHub: [examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py)


```python icon="python" expandable examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py
import os

from pydantic import SecretStr

from openhands.sdk import (
    LLM,
    Agent,
    Conversation,
    Event,
    LLMConvertibleEvent,
    get_logger,
)
from openhands.sdk.tool import Tool
from openhands.tools.terminal import TerminalTool


logger = get_logger(__name__)

# Configure LLM
api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."
model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    usage_id="agent",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)

# Tools
cwd = os.getcwd()
tools = [
    Tool(
        name=TerminalTool.name,
        params={"no_change_timeout_seconds": 3},
    )
]

# Agent
agent = Agent(llm=llm, tools=tools)

llm_messages = []  # collect raw LLM messages


def conversation_callback(event: Event):
    if isinstance(event, LLMConvertibleEvent):
        llm_messages.append(event.to_llm_message())


conversation = Conversation(
    agent=agent, callbacks=[conversation_callback], workspace=cwd
)

conversation.send_message(
    "Enter python interactive mode by directly running `python3`, then tell me "
    "the current time, and exit python interactive mode."
)
conversation.run()

print("=" * 100)
print("Conversation finished. Got the following LLM messages:")
for i, message in enumerate(llm_messages):
    print(f"Message {i}: {str(message)[:200]}")
```


## Next Steps

- **[Custom Tools](/sdk/guides/custom-tools)** - Create your own tools for specific use cases

### API-based Sandbox
Source: https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox.md

> A ready-to-run example is available [here](#ready-to-run-example)!


The API-sandboxed agent server demonstrates how to use `APIRemoteWorkspace` to connect to an [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. 
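
Condensed, the whole flow looks like this (a minimal sketch assuming the same environment variables and defaults as the full example below; see the ready-to-run example for the complete version):

```python icon="python"
import os

from pydantic import SecretStr

from openhands.sdk import LLM, Conversation
from openhands.tools.preset.default import get_default_agent
from openhands.workspace import APIRemoteWorkspace

# Assumes LLM_API_KEY and RUNTIME_API_KEY are set, as in the full example below.
llm = LLM(
    usage_id="agent",
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    api_key=SecretStr(os.environ["LLM_API_KEY"]),
)

with APIRemoteWorkspace(
    runtime_api_url="https://runtime.eval.all-hands.dev",
    runtime_api_key=os.environ["RUNTIME_API_KEY"],
    server_image="ghcr.io/openhands/agent-server:main-python",
) as workspace:
    # The runtime API provisions the sandbox; the conversation runs remotely.
    conversation = Conversation(
        agent=get_default_agent(llm=llm, cli_mode=True),
        workspace=workspace,
    )
    conversation.send_message("List the files in the workspace.")
    conversation.run()
    conversation.close()
```
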
+ +## Key Concepts + +### APIRemoteWorkspace + +The `APIRemoteWorkspace` connects to a hosted runtime API service: + +```python icon="python" +with APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) as workspace: +``` + +This workspace type: +- Connects to a remote runtime API service +- Automatically provisions sandboxed environments +- Manages container lifecycle through the API +- Handles all infrastructure concerns + +### Runtime API Authentication + +The example requires a runtime API key for authentication: + +```python icon="python" +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) +``` + +This key authenticates your requests to the hosted runtime service. + +### Pre-built Image Selection + +You can specify which pre-built agent server image to use: + +```python icon="python" focus={4} +APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) +``` + +The runtime API will pull and run the specified image in a sandboxed environment. + +### Workspace Testing + +Just like with `DockerWorkspace`, you can test the workspace before running the agent: + +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` + +This verifies connectivity to the remote runtime and ensures the environment is ready. + +### Automatic RemoteConversation + +The conversation uses WebSocket communication with the remote server: + +```python icon="python" focus={1, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True +) +assert isinstance(conversation, RemoteConversation) +``` + +All agent execution happens on the remote runtime infrastructure. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) + + +This example shows how to connect to a hosted runtime API for fully managed agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +"""Example: APIRemoteWorkspace with Dynamic Build. + +This example demonstrates building an agent-server image on-the-fly from the SDK +codebase and launching it in a remote sandboxed environment via Runtime API. 
+ +Usage: + uv run examples/24_remote_convo_with_api_sandboxed_server.py + +Requirements: + - LLM_API_KEY: API key for LLM access + - RUNTIME_API_KEY: API key for runtime API access +""" + +import os +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import APIRemoteWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) + + +# If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency +# Otherwise, use the latest image from main +server_image_sha = os.getenv("GITHUB_SHA") or "main" +server_image = f"ghcr.io/openhands/agent-server:{server_image_sha[:7]}-python-amd64" +logger.info(f"Using server image: {server_image}") + +with APIRemoteWorkspace( + runtime_api_url=os.getenv("RUNTIME_API_URL", "https://runtime.eval.all-hands.dev"), + runtime_api_key=runtime_api_key, + server_image=server_image, + image_pull_policy="Always", +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() + + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) + + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() + + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + + conversation.send_message("Great! Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() +``` + +You can run the example code as-is. + +```bash Running the Example +export LLM_API_KEY="your-api-key" +# If using the OpenHands LLM proxy, set its base URL: +export LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" +export RUNTIME_API_KEY="your-runtime-api-key" +# Set the runtime API URL for the remote sandbox +export RUNTIME_API_URL="https://runtime.eval.all-hands.dev" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +``` + +## Next Steps + +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + +### Apptainer Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#basic-apptainer-sandbox-example)! 
+ +The Apptainer sandboxed agent server demonstrates how to run agents in isolated Apptainer containers using ApptainerWorkspace. + +Apptainer (formerly Singularity) is a container runtime designed for HPC environments that doesn't require root access, making it ideal for shared computing environments, university clusters, and systems where Docker is not available. + +## When to Use Apptainer + +Use Apptainer instead of Docker when: +- Running on HPC clusters or shared computing environments +- Root access is not available +- Docker daemon cannot be installed +- Working in academic or research computing environments +- Security policies restrict Docker usage + +## Prerequisites + +Before running this example, ensure you have: +- Apptainer installed ([Installation Guide](https://apptainer.org/docs/user/main/quick_start.html)) +- LLM API key set in environment + +## Basic Apptainer Sandbox Example + + +This example is available on GitHub: [examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py) + + +This example shows how to create an `ApptainerWorkspace` that automatically manages Apptainer containers for agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import ApptainerWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# 2) Create an Apptainer-based remote workspace that will set up and manage +# the Apptainer container automatically. Use `ApptainerWorkspace` with a +# pre-built agent server image. +# Apptainer (formerly Singularity) doesn't require root access, making it +# ideal for HPC and shared computing environments. 
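# Note: Apptainer caches built/pulled images as SIF files (under
# ~/.cache/apptainer/), so the first run is slower than subsequent ones.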
+server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with ApptainerWorkspace( + # use pre-built image for faster startup + server_image=server_image, + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + # Report cost (must be before conversation.close()) + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + +## Configuration Options + +The `ApptainerWorkspace` supports several configuration options: + +### Option 1: Pre-built Image (Recommended) + +Use a pre-built agent server image for fastest startup: + +```python icon="python" focus={2} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, +) as workspace: + # Your code here +``` + +### Option 2: Build from Base Image + +Build from a base image when you need custom dependencies: + +```python icon="python" focus={2} +with ApptainerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, +) as workspace: + # Your code here +``` + + +Building from a base image requires internet access and may take several minutes on first run. The built image is cached for subsequent runs. 
+ + +### Option 3: Use Existing SIF File + +If you have a pre-built Apptainer SIF file: + +```python icon="python" focus={2} +with ApptainerWorkspace( + sif_file="/path/to/your/agent-server.sif", + host_port=8010, +) as workspace: + # Your code here +``` + +## Key Features + +### Rootless Container Execution + +Apptainer runs completely without root privileges: +- No daemon process required +- User namespace isolation +- Compatible with most HPC security policies + +### Image Caching + +Apptainer automatically caches container images: +- First run builds/pulls the image +- Subsequent runs reuse cached SIF files +- Cache location: `~/.cache/apptainer/` + +### Port Mapping + +The workspace exposes ports for agent services: +```python icon="python" focus={1, 3} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, # Maps to container port 8010 +) as workspace: + # Access agent server at http://localhost:8010 +``` + +## Differences from Docker + +While the API is similar to DockerWorkspace, there are some differences: + +| Feature | Docker | Apptainer | +|---------|--------|-----------| +| Root access required | Yes (daemon) | No | +| Installation | Requires Docker Engine | Single binary | +| Image format | OCI/Docker | SIF | +| Build speed | Fast (layers) | Slower (monolithic) | +| HPC compatibility | Limited | Excellent | +| Networking | Bridge/overlay | Host networking | + +## Troubleshooting + +### Apptainer Not Found + +If you see `apptainer: command not found`: +1. Install Apptainer following the [official guide](https://apptainer.org/docs/user/main/quick_start.html) +2. Ensure it's in your PATH: `which apptainer` + +### Permission Errors + +Apptainer should work without root. If you see permission errors: +- Check that your user has access to `/tmp` +- Verify Apptainer is properly installed: `apptainer version` +- Ensure the cache directory is writable: `ls -la ~/.cache/apptainer/` + +## Next Steps + +- **[Docker Sandbox](/sdk/guides/agent-server/docker-sandbox)** - Alternative container runtime +- **[API Sandbox](/sdk/guides/agent-server/api-sandbox)** - Remote API-based sandboxing +- **[Local Server](/sdk/guides/agent-server/local-server)** - Non-sandboxed local execution + +### OpenHands Cloud Workspace +Source: https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace.md + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `OpenHandsCloudWorkspace` demonstrates how to use the [OpenHands Cloud](https://app.all-hands.dev) to provision and manage sandboxed environments for agent execution. This provides a seamless experience with automatic sandbox provisioning, monitoring, and secure execution without managing your own infrastructure. + +## Key Concepts + +### OpenHandsCloudWorkspace + +The `OpenHandsCloudWorkspace` connects to OpenHands Cloud to provision sandboxes: + +```python icon="python" focus={1-2} +with OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, +) as workspace: +``` + +This workspace type: +- Connects to OpenHands Cloud API +- Automatically provisions sandboxed environments +- Manages sandbox lifecycle (create, poll status, delete) +- Handles all infrastructure concerns + +### Getting Your API Key + +To use OpenHands Cloud, you need an API key: + +1. Go to [app.all-hands.dev](https://app.all-hands.dev) +2. Sign in to your account +3. Navigate to Settings → API Keys +4. 
Create a new API key + +Store this key securely and use it as the `OPENHANDS_CLOUD_API_KEY` environment variable. + + +### Configuration Options + +The `OpenHandsCloudWorkspace` supports several configuration options: + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `cloud_api_url` | `str` | Required | OpenHands Cloud API URL | +| `cloud_api_key` | `str` | Required | API key for authentication | +| `sandbox_spec_id` | `str \| None` | `None` | Custom sandbox specification ID | +| `init_timeout` | `float` | `300.0` | Timeout for sandbox initialization (seconds) | +| `api_timeout` | `float` | `60.0` | Timeout for API requests (seconds) | +| `keep_alive` | `bool` | `False` | Keep sandbox running after cleanup | + +### Keep Alive Mode + +By default, the sandbox is deleted when the workspace is closed. To keep it running: + +```python icon="python" focus={4} +workspace = OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, + keep_alive=True, +) +``` + +This is useful for debugging or when you want to inspect the sandbox state after execution. + +### Workspace Testing + +You can test the workspace before running the agent: + +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' && pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` + +This verifies connectivity to the cloud sandbox and ensures the environment is ready. + +## Comparison with Other Workspace Types + +| Feature | OpenHandsCloudWorkspace | APIRemoteWorkspace | DockerWorkspace | +|---------|------------------------|-------------------|-----------------| +| Infrastructure | OpenHands Cloud | Runtime API | Local Docker | +| Authentication | API Key | API Key | None | +| Setup Required | None | Runtime API access | Docker installed | +| Custom Images | Via sandbox specs | Direct image specification | Direct image specification | +| Best For | Production use | Custom runtime environments | Local development | + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/07_convo_with_cloud_workspace.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/07_convo_with_cloud_workspace.py) + + +This example shows how to connect to OpenHands Cloud for fully managed agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +"""Example: OpenHandsCloudWorkspace for OpenHands Cloud API. + +This example demonstrates using OpenHandsCloudWorkspace to provision a sandbox +via OpenHands Cloud (app.all-hands.dev) and run an agent conversation. + +Usage: + uv run examples/02_remote_agent_server/06_convo_with_cloud_workspace.py + +Requirements: + - LLM_API_KEY: API key for direct LLM provider access (e.g., Anthropic API key) + - OPENHANDS_CLOUD_API_KEY: API key for OpenHands Cloud access + +Note: + The LLM configuration is sent to the cloud sandbox, so you need an API key + that works directly with the LLM provider (not a local proxy). If using + Anthropic, set LLM_API_KEY to your Anthropic API key. 
+""" + +import os +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import OpenHandsCloudWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" + +# Note: Don't use a local proxy URL here - the cloud sandbox needs direct access +# to the LLM provider. Use None for base_url to let LiteLLM use the default +# provider endpoint, or specify the provider's direct URL. +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL") or None, + api_key=SecretStr(api_key), +) + +cloud_api_key = os.getenv("OPENHANDS_CLOUD_API_KEY") +if not cloud_api_key: + logger.error("OPENHANDS_CLOUD_API_KEY required") + exit(1) + +cloud_api_url = os.getenv("OPENHANDS_CLOUD_API_URL", "https://app.all-hands.dev") +logger.info(f"Using OpenHands Cloud API: {cloud_api_url}") + +with OpenHandsCloudWorkspace( + cloud_api_url=cloud_api_url, + cloud_api_key=cloud_api_key, +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() + + result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) + + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() + + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + + conversation.send_message("Great! Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() + + logger.info("✅ Conversation completed successfully.") + logger.info(f"Total {len(received_events)} events received during conversation.") +``` + + +```bash Running the Example +export LLM_API_KEY="your-llm-api-key" +export OPENHANDS_CLOUD_API_KEY="your-cloud-api-key" +# Optional: specify a custom sandbox spec +# export OPENHANDS_SANDBOX_SPEC_ID="your-sandbox-spec-id" +cd agent-sdk +uv run python examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +``` + +## Next Steps + +- **[API-based Sandbox](/sdk/guides/agent-server/api-sandbox)** - Connect to Runtime API service +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run locally with Docker +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Development without containers +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details + +### Custom Tools with Remote Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/custom-tools.md + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +When using a [remote agent server](/sdk/guides/agent-server/overview), custom tools must be available in the server's Python environment. 
This guide shows how to build a custom base image with your tools and use `DockerDevWorkspace` to automatically build the agent server on top of it. + + +For standalone custom tools (without remote agent server), see the [Custom Tools guide](/sdk/guides/custom-tools). + + +## How It Works + +1. **Define custom tool** with `register_tool()` at module level +2. **Create Dockerfile** that copies tools and sets `PYTHONPATH` +3. **Build custom base image** with your tools +4. **Use `DockerDevWorkspace`** with `base_image` parameter - it builds the agent server on top +5. **Import tool module** in client before creating conversation +6. **Server imports modules** dynamically, triggering registration + +## Key Files + +### Custom Tool (`custom_tools/log_data.py`) + +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tools/log_data.py +"""Log Data Tool - Example custom tool for logging structured data to JSON. + +This tool demonstrates how to create a custom tool that logs structured data +to a local JSON file during agent execution. The data can be retrieved and +verified after the agent completes. +""" + +import json +from collections.abc import Sequence +from datetime import UTC, datetime +from enum import Enum +from pathlib import Path +from typing import Any + +from pydantic import Field + +from openhands.sdk import ( + Action, + ImageContent, + Observation, + TextContent, + ToolDefinition, +) +from openhands.sdk.tool import ToolExecutor, register_tool + + +# --- Enums and Models --- + + +class LogLevel(str, Enum): + """Log level for entries.""" + + DEBUG = "debug" + INFO = "info" + WARNING = "warning" + ERROR = "error" + + +class LogDataAction(Action): + """Action to log structured data to a JSON file.""" + + message: str = Field(description="The log message") + level: LogLevel = Field( + default=LogLevel.INFO, + description="Log level (debug, info, warning, error)", + ) + data: dict[str, Any] = Field( + default_factory=dict, + description="Additional structured data to include in the log entry", + ) + + +class LogDataObservation(Observation): + """Observation returned after logging data.""" + + success: bool = Field(description="Whether the data was successfully logged") + log_file: str = Field(description="Path to the log file") + entry_count: int = Field(description="Total number of entries in the log file") + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + """Convert observation to LLM content.""" + if self.success: + return [ + TextContent( + text=( + f"✅ Data logged successfully to {self.log_file}\n" + f"Total entries: {self.entry_count}" + ) + ) + ] + return [TextContent(text="❌ Failed to log data")] + + +# --- Executor --- + +# Default log file path +DEFAULT_LOG_FILE = "/tmp/agent_data.json" + + +class LogDataExecutor(ToolExecutor[LogDataAction, LogDataObservation]): + """Executor that logs structured data to a JSON file.""" + + def __init__(self, log_file: str = DEFAULT_LOG_FILE): + """Initialize the log data executor. + + Args: + log_file: Path to the JSON log file + """ + self.log_file = Path(log_file) + + def __call__( + self, + action: LogDataAction, + conversation=None, # noqa: ARG002 + ) -> LogDataObservation: + """Execute the log data action. 
+ + Args: + action: The log data action + conversation: Optional conversation context (not used) + + Returns: + LogDataObservation with the result + """ + # Load existing entries or start fresh + entries: list[dict[str, Any]] = [] + if self.log_file.exists(): + try: + with open(self.log_file) as f: + entries = json.load(f) + except (json.JSONDecodeError, OSError): + entries = [] + + # Create new entry with timestamp + entry = { + "timestamp": datetime.now(UTC).isoformat(), + "level": action.level.value, + "message": action.message, + "data": action.data, + } + entries.append(entry) + + # Write back to file + self.log_file.parent.mkdir(parents=True, exist_ok=True) + with open(self.log_file, "w") as f: + json.dump(entries, f, indent=2) + + return LogDataObservation( + success=True, + log_file=str(self.log_file), + entry_count=len(entries), + ) + + +# --- Tool Definition --- + +_LOG_DATA_DESCRIPTION = """Log structured data to a JSON file. + +Use this tool to record information, findings, or events during your work. +Each log entry includes a timestamp and can contain arbitrary structured data. + +Parameters: +* message: A descriptive message for the log entry +* level: Log level - one of 'debug', 'info', 'warning', 'error' (default: info) +* data: Optional dictionary of additional structured data to include + +Example usage: +- Log a finding: message="Found potential issue", level="warning", data={"file": "app.py", "line": 42} +- Log progress: message="Completed analysis", level="info", data={"files_checked": 10} +""" # noqa: E501 + + +class LogDataTool(ToolDefinition[LogDataAction, LogDataObservation]): + """Tool for logging structured data to a JSON file.""" + + @classmethod + def create(cls, conv_state, **params) -> Sequence[ToolDefinition]: # noqa: ARG003 + """Create LogDataTool instance. + + Args: + conv_state: Conversation state (not used in this example) + **params: Additional parameters: + - log_file: Path to the JSON log file (default: /tmp/agent_data.json) + + Returns: + A sequence containing a single LogDataTool instance + """ + log_file = params.get("log_file", DEFAULT_LOG_FILE) + executor = LogDataExecutor(log_file=log_file) + + return [ + cls( + description=_LOG_DATA_DESCRIPTION, + action_type=LogDataAction, + observation_type=LogDataObservation, + executor=executor, + ) + ] + + +# Auto-register the tool when this module is imported +# This is what enables dynamic tool registration in the remote agent server +register_tool("LogDataTool", LogDataTool) +``` + +### Dockerfile + +```dockerfile icon="docker" +FROM nikolaik/python-nodejs:python3.12-nodejs22 + +COPY custom_tools /app/custom_tools +ENV PYTHONPATH="/app:${PYTHONPATH}" +``` + +## Troubleshooting + +| Issue | Solution | +|-------|----------| +| Tool not found | Ensure `register_tool()` is called at module level, import tool before creating conversation | +| Import errors on server | Check `PYTHONPATH` in Dockerfile, verify all dependencies installed | +| Build failures | Verify file paths in `COPY` commands, ensure Python 3.12+ | + + +**Binary Mode Limitation**: Custom tools only work with **source mode** deployments. When using `DockerDevWorkspace`, set `target="source"` (the default). See [GitHub issue #1531](https://github.com/OpenHands/software-agent-sdk/issues/1531) for details. 
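
On the client side, the wiring boils down to importing the tool module (which runs its module-level `register_tool()` call as a side effect) and then referencing the tool by name. Here is a condensed sketch of the client side of steps 5 and 6, assuming the `custom_tools.log_data` module from this guide is on the Python path and `llm` is configured as in the full example below:

```python icon="python"
from openhands.sdk import Agent, Tool
from openhands.tools.preset.default import get_default_tools

# Importing the module runs its module-level register_tool("LogDataTool", ...),
# so the client can reference the tool by name below.
import custom_tools.log_data  # noqa: F401

tools = get_default_tools(enable_browser=False)
tools.append(Tool(name="LogDataTool"))

agent = Agent(llm=llm, tools=tools)  # `llm` configured as in the full example
```
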
+ + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/06_custom_tool/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/02_remote_agent_server/06_custom_tool) + + +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tool_example.py +"""Example: Using custom tools with remote agent server. + +This example demonstrates how to use custom tools with a remote agent server +by building a custom base image that includes the tool implementation. + +Prerequisites: + 1. Build the custom base image first: + cd examples/02_remote_agent_server/05_custom_tool + ./build_custom_image.sh + + 2. Set LLM_API_KEY environment variable + +The workflow is: +1. Define a custom tool (LogDataTool for logging structured data to JSON) +2. Create a simple Dockerfile that copies the tool into the base image +3. Build the custom base image +4. Use DockerDevWorkspace with base_image pointing to the custom image +5. DockerDevWorkspace builds the agent server on top of the custom base image +6. The server dynamically registers tools when the client creates a conversation +7. The agent can use the custom tool during execution +8. Verify the logged data by reading the JSON file from the workspace + +This pattern is useful for: +- Collecting structured data during agent runs (logs, metrics, events) +- Implementing custom integrations with external systems +- Adding domain-specific operations to the agent +""" + +import os +import platform +import subprocess +import sys +import time +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + Tool, + get_logger, +) +from openhands.workspace import DockerDevWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +# Get the directory containing this script +example_dir = Path(__file__).parent.absolute() + +# Custom base image tag (contains custom tools, agent server built on top) +CUSTOM_BASE_IMAGE_TAG = "custom-base-image:latest" + +# 2) Check if custom base image exists, build if not +logger.info(f"🔍 Checking for custom base image: {CUSTOM_BASE_IMAGE_TAG}") +result = subprocess.run( + ["docker", "images", "-q", CUSTOM_BASE_IMAGE_TAG], + capture_output=True, + text=True, + check=False, +) + +if not result.stdout.strip(): + logger.info("⚠️ Custom base image not found. 
Building...") + logger.info("📦 Building custom base image with custom tools...") + build_script = example_dir / "build_custom_image.sh" + try: + subprocess.run( + [str(build_script), CUSTOM_BASE_IMAGE_TAG], + cwd=str(example_dir), + check=True, + ) + logger.info("✅ Custom base image built successfully!") + except subprocess.CalledProcessError as e: + logger.error(f"❌ Failed to build custom base image: {e}") + logger.error("Please run ./build_custom_image.sh manually and fix any errors.") + sys.exit(1) +else: + logger.info(f"✅ Custom base image found: {CUSTOM_BASE_IMAGE_TAG}") + +# 3) Create a DockerDevWorkspace with the custom base image +# DockerDevWorkspace will build the agent server on top of this base image +logger.info("🚀 Building and starting agent server with custom tools...") +logger.info("📦 This may take a few minutes on first run...") + +with DockerDevWorkspace( + base_image=CUSTOM_BASE_IMAGE_TAG, + host_port=8011, + platform=detect_platform(), + target="source", # NOTE: "binary" target does not work with custom tools +) as workspace: + logger.info("✅ Custom agent server started!") + + # 4) Import custom tools to register them in the client's registry + # This allows the client to send the module qualname to the server + # The server will then import the same module and execute the tool + import custom_tools.log_data # noqa: F401 + + # 5) Create agent with custom tools + # Note: We specify the tool here, but it's actually executed on the server + # Get default tools and add our custom tool + from openhands.sdk import Agent + from openhands.tools.preset.default import get_default_condenser, get_default_tools + + tools = get_default_tools(enable_browser=False) + # Add our custom tool! + tools.append(Tool(name="LogDataTool")) + + agent = Agent( + llm=llm, + tools=tools, + system_prompt_kwargs={"cli_mode": True}, + condenser=get_default_condenser( + llm=llm.model_copy(update={"usage_id": "condenser"}) + ), + ) + + # 6) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 7) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Custom agent server ready!' && python --version" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + + # 8) Create conversation with the custom agent + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending task to analyze files and log findings...") + conversation.send_message( + "Please analyze the Python files in the current directory. " + "Use the LogDataTool to log your findings as you work. " + "For example:\n" + "- Log when you start analyzing a file (level: info)\n" + "- Log any interesting patterns you find (level: info)\n" + "- Log any potential issues (level: warning)\n" + "- Include relevant data like file names, line numbers, etc.\n\n" + "Make at least 3 log entries using the LogDataTool." 
+ ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ Task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + # 9) Read the logged data from the JSON file using file_download API + logger.info("\n📊 Logged Data Summary:") + logger.info("=" * 80) + + # Download the log file from the workspace using the file download API + import json + import tempfile + + with tempfile.NamedTemporaryFile( + mode="w", suffix=".json", delete=False + ) as tmp_file: + local_path = tmp_file.name + + download_result = workspace.file_download( + source_path="/tmp/agent_data.json", + destination_path=local_path, + ) + + if download_result.success: + try: + with open(local_path) as f: + log_entries = json.load(f) + logger.info(f"Found {len(log_entries)} log entries:\n") + for i, entry in enumerate(log_entries, 1): + logger.info(f"Entry {i}:") + logger.info(f" Timestamp: {entry.get('timestamp', 'N/A')}") + logger.info(f" Level: {entry.get('level', 'N/A')}") + logger.info(f" Message: {entry.get('message', 'N/A')}") + if entry.get("data"): + logger.info(f" Data: {json.dumps(entry['data'], indent=4)}") + logger.info("") + except json.JSONDecodeError: + logger.info("Log file exists but couldn't parse JSON") + with open(local_path) as f: + logger.info(f"Raw content: {f.read()}") + finally: + # Clean up the temporary file + Path(local_path).unlink(missing_ok=True) + else: + logger.info("No log file found (agent may not have used the tool)") + if download_result.error: + logger.debug(f"Download error: {download_result.error}") + + logger.info("=" * 80) + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") + + finally: + logger.info("\n🧹 Cleaning up conversation...") + conversation.close() + +logger.info("\n✅ Example completed successfully!") +logger.info("\nThis example demonstrated how to:") +logger.info("1. Create a custom tool that logs structured data to JSON") +logger.info("2. Build a simple base image with the custom tool") +logger.info("3. Use DockerDevWorkspace with base_image to build agent server on top") +logger.info("4. Enable dynamic tool registration on the server") +logger.info("5. Use the custom tool during agent execution") +logger.info("6. Read the logged data back from the workspace") +``` + +```bash Running the Example +# Build the custom base image first +cd examples/02_remote_agent_server/06_custom_tool +./build_custom_image.sh + +# Run the example +export LLM_API_KEY="your-api-key" +uv run python custom_tool_example.py +``` + + +## Next Steps + +- **[Custom Tools (Standalone)](/sdk/guides/custom-tools)** - For local execution without remote server +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Understanding remote agent servers + +### Docker Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +The docker sandboxed agent server demonstrates how to run agents in isolated Docker containers using `DockerWorkspace`. + +This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely. 
Use `DockerWorkspace` with a pre-built agent server image for the fastest startup. When you need to build your own image from a base image, switch to `DockerDevWorkspace`.

The Docker sandbox image ships with features configured in the [Dockerfile](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-agent-server/openhands/agent_server/docker/Dockerfile) (e.g., secure defaults and services like VSCode and VNC exposed behind well-defined ports), which are not available in the local (non-Docker) agent server.

## 1) Basic Docker Sandbox

> A ready-to-run example is available [here](#ready-to-run-example-docker-sandbox)!

### Key Concepts

#### DockerWorkspace Context Manager

The `DockerWorkspace` uses a context manager to automatically handle container lifecycle:

```python icon="python"
with DockerWorkspace(
    # use pre-built image for faster startup (recommended)
    server_image="ghcr.io/openhands/agent-server:latest-python",
    host_port=8010,
    platform=detect_platform(),
) as workspace:
    # Container is running here
    # Work with the workspace
    pass
# Container is automatically stopped and cleaned up here
```

The workspace automatically:
- Pulls or builds the Docker image
- Starts the container with an agent server
- Waits for the server to be ready
- Cleans up the container when done

#### Platform Detection

The example includes platform detection to ensure the correct Docker image is built and used:

```python icon="python"
def detect_platform():
    """Detects the correct Docker platform string."""
    machine = platform.machine().lower()
    if "arm" in machine or "aarch64" in machine:
        return "linux/arm64"
    return "linux/amd64"
```

This ensures compatibility across different CPU architectures (Intel/AMD vs ARM/Apple Silicon).


#### Testing the Workspace

Before creating a conversation, the example tests the workspace connection:

```python icon="python"
result = workspace.execute_command(
    "echo 'Hello from sandboxed environment!' && pwd"
)
logger.info(
    f"Command '{result.command}' completed "
    f"with exit code {result.exit_code}"
)
logger.info(f"Output: {result.stdout}")
```

This verifies the workspace is properly initialized and can execute commands.

#### Automatic RemoteConversation

When you use a `DockerWorkspace`, the `Conversation` automatically becomes a `RemoteConversation`:

```python icon="python" focus={1, 3, 7}
conversation = Conversation(
    agent=agent,
    workspace=workspace,
    callbacks=[event_callback],
    visualize=True,
)
assert isinstance(conversation, RemoteConversation)
```

The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming.


#### DockerWorkspace vs DockerDevWorkspace

Use `DockerWorkspace` when you can rely on the official pre-built images for the agent server. Switch to `DockerDevWorkspace` when you need to build or customize the image on-demand (slower startup, requires the SDK source tree and Docker build support). 
```python icon="python"
# ✅ Fast: Use pre-built image (recommended)
DockerWorkspace(
    server_image="ghcr.io/openhands/agent-server:latest-python",
    host_port=8010,
)

# 🛠️ Custom: Build on the fly (requires SDK tooling)
DockerDevWorkspace(
    base_image="nikolaik/python-nodejs:python3.12-nodejs22",
    host_port=8010,
    target="source",
)
```

### Ready-to-run Example Docker Sandbox

This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py)


This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution:

```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py
import os
import platform
import time

from pydantic import SecretStr

from openhands.sdk import (
    LLM,
    Conversation,
    RemoteConversation,
    get_logger,
)
from openhands.tools.preset.default import get_default_agent
from openhands.workspace import DockerWorkspace


logger = get_logger(__name__)

# 1) Ensure we have LLM API key
api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."

llm = LLM(
    usage_id="agent",
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    base_url=os.getenv("LLM_BASE_URL"),
    api_key=SecretStr(api_key),
)


def detect_platform():
    """Detects the correct Docker platform string."""
    machine = platform.machine().lower()
    if "arm" in machine or "aarch64" in machine:
        return "linux/arm64"
    return "linux/amd64"


def get_server_image():
    """Get the server image tag, using PR-specific image in CI."""
    platform_str = detect_platform()
    arch = "arm64" if "arm64" in platform_str else "amd64"
    # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency
    # Otherwise, use the latest image from main
    github_sha = os.getenv("GITHUB_SHA")
    if github_sha:
        return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}"
    return "ghcr.io/openhands/agent-server:latest-python"


# 2) Create a Docker-based remote workspace that will set up and manage
# the Docker container automatically. Use `DockerWorkspace` with a pre-built
# image or `DockerDevWorkspace` to automatically build the image on-demand.
# with DockerDevWorkspace(
#     # dynamically build agent-server image
#     base_image="nikolaik/python-nodejs:python3.13-nodejs22",
#     host_port=8010,
#     platform=detect_platform(),
# ) as workspace:
server_image = get_server_image()
logger.info(f"Using server image: {server_image}")
with DockerWorkspace(
    # use pre-built image for faster startup
    server_image=server_image,
    host_port=8010,
    platform=detect_platform(),
) as workspace:
    # 3) Create agent
    agent = get_default_agent(
        llm=llm,
        cli_mode=True,
    )

    # 4) Set up callback collection
    received_events: list = []
    last_event_time = {"ts": time.time()}

    def event_callback(event) -> None:
        event_type = type(event).__name__
        logger.info(f"🔔 Callback received event: {event_type}\n{event}")
        received_events.append(event)
        last_event_time["ts"] = time.time()

    # 5) Test the workspace with a simple command
    result = workspace.execute_command(
        "echo 'Hello from sandboxed environment!' 
&& pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + + +--- + +## 2) VS Code in Docker Sandbox + +> A ready-to-run example is available [here](#ready-to-run-example-vs-code)! + +VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with. + +### Key Concepts + +#### VS Code-Enabled DockerWorkspace + +The workspace is configured with extra ports for VS Code access: + +```python icon="python" focus={1, 5} +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=18010, + platform="linux/arm64", # or "linux/amd64" depending on your architecture + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:18011""" +``` + +The `extra_ports=True` setting exposes: +- Port `host_port+1`: VS Code Web interface (host_port + 1) +- Port `host_port+2`: VNC viewer for visual access + +If you need to customize the agent-server image, swap in `DockerDevWorkspace` with the same parameters and provide `base_image`/`target` to build on demand. + +#### VS Code URL Generation + +The example retrieves the VS Code URL with authentication token: + +```python icon="python" +# Get VSCode URL with token +vscode_port = (workspace.host_port or 8010) + 1 +try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) +except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" +``` + +This generates a properly authenticated URL with the workspace directory pre-opened. 
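
If you are running the example locally, you can then open the generated URL straight from the script with the standard library (a small convenience that is not part of the original example):

```python icon="python"
import webbrowser

# `vscode_url` comes from the snippet above.
webbrowser.open(vscode_url)
```
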
+ +#### VS Code URL Format + +```text +http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir} +``` +where: +- `vscode_port`: Usually host_port + 1 (e.g., 8011) +- `token`: Authentication token for security +- `workspace_dir`: Workspace directory to open + +### Ready-to-run Example VS Code + + +This example is available on GitHub: [examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py) + + + +```python icon="python" expandable examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py +import os +import platform +import time + +import httpx +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +# Create a Docker-based remote workspace with extra ports for VSCode access +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=18010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:18011""" + + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending first message...") + conversation.send_message("Create a simple Python script that prints Hello World") + conversation.run() + + # Get VSCode URL with token + vscode_port = (workspace.host_port or 8010) + 1 + try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) + except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" + + # Wait for user to explore VSCode + y = None + while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + + + +--- + +## 3) Browser in Docker Sandbox +> A ready-to-run example is available [here](#ready-to-run-example-browser)! + +Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. 
### Key Concepts

#### Browser-Enabled DockerWorkspace

The workspace is configured with extra ports for browser access:

```python icon="python" focus={1-5}
with DockerWorkspace(
    server_image="ghcr.io/openhands/agent-server:latest-python",
    host_port=8010,
    platform=detect_platform(),
    extra_ports=True,  # Expose extra ports for VSCode and VNC
) as workspace:
    """Extra ports allows you to check localhost:8012 for VNC"""
```

The `extra_ports=True` setting exposes additional ports for:
- Port `host_port+1`: VS Code Web interface
- Port `host_port+2`: VNC viewer for browser visualization

If you need to pre-build a custom browser image, replace `DockerWorkspace` with `DockerDevWorkspace` and provide `base_image`/`target` to build before launch.


#### Enabling Browser Tools

Browser tools are enabled by setting `cli_mode=False`:

```python icon="python" focus={2, 4}
# Create agent with browser tools enabled
agent = get_default_agent(
    llm=llm,
    cli_mode=False,  # CLI mode = False will enable browser tools
)
```

When `cli_mode=False`, the agent gains access to browser automation tools for web interaction.

When VNC is available and `extra_ports=True`, the browser is opened in the VNC desktop to visualize the agent's work, so you can watch the browser in real time via VNC.


#### VNC Access

The VNC interface provides real-time visual access to the browser:

```text
http://localhost:8012/vnc.html?autoconnect=1&resize=remote
```

- `autoconnect=1`: Automatically connect to VNC server
- `resize=remote`: Automatically adjust resolution

---

### Ready-to-run Example Browser


This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py)


This example shows how to configure `DockerWorkspace` with browser capabilities and VNC access:

```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py
import os
import platform
import time

from pydantic import SecretStr

from openhands.sdk import LLM, Conversation, get_logger
from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation
from openhands.tools.preset.default import get_default_agent
from openhands.workspace import DockerWorkspace


logger = get_logger(__name__)

api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."

llm = LLM(
    usage_id="agent",
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    base_url=os.getenv("LLM_BASE_URL"),
    api_key=SecretStr(api_key),
)


def detect_platform():
    """Detects the correct Docker platform string."""
    machine = platform.machine().lower()
    if "arm" in machine or "aarch64" in machine:
        return "linux/arm64"
    return "linux/amd64"


def get_server_image():
    """Get the server image tag, using PR-specific image in CI."""
    platform_str = detect_platform()
    arch = "arm64" if "arm64" in platform_str else "amd64"
    # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# Create a Docker-based remote workspace with extra ports for browser access. +# Use `DockerWorkspace` with a pre-built image or `DockerDevWorkspace` to +# automatically build the image on-demand. +# with DockerDevWorkspace( +# # dynamically build agent-server image +# base_image="nikolaik/python-nodejs:python3.13-nodejs22", +# host_port=8010, +# platform=detect_platform(), +# ) as workspace: +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=8011, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" + + # Create agent with browser tools enabled + agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending first message...") + conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" + ) + conversation.run() + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + if os.getenv("CI"): + logger.info( + "CI environment detected; skipping interactive prompt and closing workspace." # noqa: E501 + ) + else: + # Wait for user confirm to exit when running locally + y = None + while y != "y": + y = input( + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open a browser tab to see the *actual* browser OpenHands " + "is interacting with via VNC.\n\n" + "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + + + +## Next Steps + +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + +### Local Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/local-server.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using `RemoteConversation`. 
This pattern is useful for local development, testing, and scenarios where you want to separate the client code from the agent execution environment. + +## Key Concepts + +### Managed API Server + +The ready-to-run example includes a `ManagedAPIServer` context manager that handles starting and stopping the server subprocess: + +```python icon="python" focus={1, 2, 4, 5} +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __enter__(self): + """Start the API server subprocess.""" + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + env={"LOG_JSON": "true", **os.environ}, + ) +``` + +The server starts with `python -m openhands.agent_server` and automatically handles health checks to ensure it's ready before proceeding. + +### Remote Workspace + +When connecting to a remote server, you need to provide a `Workspace` that connects to that server: + +```python icon="python" +workspace = Workspace(host=server.base_url) +result = workspace.execute_command("pwd") +``` + +When `host` is provided, the `Workspace` returns an instance of `RemoteWorkspace` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/workspace.py)). +The `Workspace` object communicates with the remote server's API to execute commands and manage files. + +### RemoteConversation + +When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)): + +```python icon="python" focus={1, 3, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` + +`RemoteConversation` handles communication with the remote agent server over WebSocket for real-time event streaming. + +### Event Callbacks + +Callbacks receive events in real-time as they happen on the remote server: + +```python icon="python" +def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() +``` + +This enables monitoring agent activity, tracking progress, and implementing custom event handling logic. + +### Conversation State + +The conversation state provides access to all events and status: + +```python icon="python" +# Count total events using state.events +total_events = len(conversation.state.events) +logger.info(f"📈 Total events in conversation: {total_events}") + +# Get recent events (last 5) using state.events +all_events = conversation.state.events +recent_events = all_events[-5:] if len(all_events) >= 5 else all_events +``` + +This allows you to inspect the conversation history, analyze agent behavior, and build custom monitoring tools. 
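+
+For example, a minimal monitoring sketch (assuming the `conversation` object from above) counts events by type for a quick picture of agent activity:
+
+```python icon="python"
+from collections import Counter
+
+# Tally conversation events by their type name.
+event_counts = Counter(type(event).__name__ for event in conversation.state.events)
+for event_type, count in event_counts.most_common():
+    print(f"{event_type}: {count}")
+```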
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) + + +This example shows how to programmatically start a local agent server and interact with it through a `RemoteConversation`: + +```python icon="python" expandable examples/02_remote_agent_server/01_convo_with_local_agent_server.py +import os +import subprocess +import sys +import tempfile +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, RemoteConversation, Workspace, get_logger +from openhands.sdk.event import ConversationStateUpdateEvent +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +def _stream_output(stream, prefix, target_stream): + """Stream output from subprocess to target stream with prefix.""" + try: + for line in iter(stream.readline, ""): + if line: + target_stream.write(f"[{prefix}] {line}") + target_stream.flush() + except Exception as e: + print(f"Error streaming {prefix}: {e}", file=sys.stderr) + finally: + stream.close() + + +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __init__(self, port: int = 8000, host: str = "127.0.0.1"): + self.port: int = port + self.host: str = host + self.process: subprocess.Popen[str] | None = None + self.base_url: str = f"http://{host}:{port}" + self.stdout_thread: threading.Thread | None = None + self.stderr_thread: threading.Thread | None = None + + def __enter__(self): + """Start the API server subprocess.""" + print(f"Starting OpenHands API server on {self.base_url}...") + + # Start the server process + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + env={"LOG_JSON": "true", **os.environ}, + ) + + # Start threads to stream stdout and stderr + assert self.process is not None + assert self.process.stdout is not None + assert self.process.stderr is not None + self.stdout_thread = threading.Thread( + target=_stream_output, + args=(self.process.stdout, "SERVER", sys.stdout), + daemon=True, + ) + self.stderr_thread = threading.Thread( + target=_stream_output, + args=(self.process.stderr, "SERVER", sys.stderr), + daemon=True, + ) + + self.stdout_thread.start() + self.stderr_thread.start() + + # Wait for server to be ready + max_retries = 30 + for i in range(max_retries): + try: + import httpx + + response = httpx.get(f"{self.base_url}/health", timeout=1.0) + if response.status_code == 200: + print(f"API server is ready at {self.base_url}") + return self + except Exception: + pass + + assert self.process is not None + if self.process.poll() is not None: + # Process has terminated + raise RuntimeError( + "Server process terminated unexpectedly. " + "Check the server logs above for details." 
+ ) + + time.sleep(1) + + raise RuntimeError(f"Server failed to start after {max_retries} seconds") + + def __exit__(self, exc_type, exc_val, exc_tb): + """Stop the API server subprocess.""" + if self.process: + print("Stopping API server...") + self.process.terminate() + try: + self.process.wait(timeout=5) + except subprocess.TimeoutExpired: + print("Force killing API server...") + self.process.kill() + self.process.wait() + + # Wait for streaming threads to finish (they're daemon threads, + # so they'll stop automatically) + # But give them a moment to flush any remaining output + time.sleep(0.5) + print("API server stopped.") + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +title_gen_llm = LLM( + usage_id="title-gen-llm", + model=os.getenv("LLM_MODEL", "openhands/gpt-5-mini-2025-08-07"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +# Use managed API server +with ManagedAPIServer(port=8001) as server: + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, # Disable browser tools for simplicity + ) + + # Define callbacks to test the WebSocket functionality + received_events = [] + event_tracker = {"last_event_time": time.time()} + + def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() + + # Create RemoteConversation with callbacks + # NOTE: Workspace is required for RemoteConversation + # Use a temp directory that exists and is accessible in CI environments + temp_workspace_dir = tempfile.mkdtemp(prefix="agent_server_demo_") + workspace = Workspace(host=server.base_url, working_dir=temp_workspace_dir) + result = workspace.execute_command("pwd") + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + # Send first message and run + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + + # Generate title using a specific LLM + title = conversation.generate_title(max_length=60, llm=title_gen_llm) + logger.info(f"Generated conversation title: {title}") + + logger.info("🚀 Running conversation...") + conversation.run() + + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to stop coming (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - event_tracker["last_event_time"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + # Demonstrate state.events functionality + logger.info("\n" + "=" * 50) + logger.info("📊 Demonstrating State Events API") + logger.info("=" * 50) + + # Count total events using state.events + total_events = len(conversation.state.events) + logger.info(f"📈 Total events in conversation: {total_events}") + + # Get recent events (last 5) using state.events + logger.info("\n🔍 Getting last 5 events using state.events...") + all_events = conversation.state.events + recent_events = all_events[-5:] if len(all_events) >= 5 else all_events + + for i, event in enumerate(recent_events, 1): + event_type = type(event).__name__ + timestamp = getattr(event, "timestamp", "Unknown") + logger.info(f" {i}. {event_type} at {timestamp}") + + # Let's see what the actual event types are + logger.info("\n🔍 Event types found:") + event_types = set() + for event in recent_events: + event_type = type(event).__name__ + event_types.add(event_type) + for event_type in sorted(event_types): + logger.info(f" - {event_type}") + + # Print all ConversationStateUpdateEvent + logger.info("\n🗂️ ConversationStateUpdateEvent events:") + for event in conversation.state.events: + if isinstance(event, ConversationStateUpdateEvent): + logger.info(f" - {event}") + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + finally: + # Clean up + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + +## Next Steps + +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run server in Docker for isolation +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + +### Overview +Source: https://docs.openhands.dev/sdk/guides/agent-server/overview.md + +Remote Agent Servers package the Software Agent SDK into containers you can deploy anywhere (Kubernetes, VMs, on‑prem, any cloud) with strong isolation. The remote path uses the exact same SDK API as local—switching is just changing the workspace argument; your Conversation code stays the same. + + +For example, switching from a local workspace to a Docker‑based remote agent server: + +```python icon="python" lines +# Local → Docker +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import DockerWorkspace # [!code ++] +with DockerWorkspace( # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` + +Use `DockerWorkspace` with the pre-built agent server image for the fastest startup. When you need to build from a custom base image, switch to [`DockerDevWorkspace`](/sdk/guides/agent-server/docker-sandbox). 
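+
+For example, a minimal sketch of building from a custom base image (the base image below is illustrative, mirroring the commented-out variant in the browser example, and `DockerDevWorkspace` is assumed importable from `openhands.workspace` like `DockerWorkspace`):
+
+```python icon="python"
+from openhands.workspace import DockerDevWorkspace
+
+# Build the agent-server image on demand from a custom base image,
+# then hand the workspace to the same Conversation API as before.
+with DockerDevWorkspace(
+    base_image="nikolaik/python-nodejs:python3.13-nodejs22",
+    host_port=8010,
+) as workspace:
+    conversation = Conversation(agent=agent, workspace=workspace)
+```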
+
+Or switching to an API‑based remote workspace (via the [OpenHands Runtime API](https://runtime.all-hands.dev/)):
+
+```python icon="python" lines
+# Local → Remote API
+conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --]
+from openhands.workspace import APIRemoteWorkspace # [!code ++]
+with APIRemoteWorkspace( # [!code ++]
+    runtime_api_url="https://runtime.eval.all-hands.dev", # [!code ++]
+    runtime_api_key="YOUR_API_KEY", # [!code ++]
+    server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++]
+) as workspace: # [!code ++]
+    conversation = Conversation(agent=agent, workspace=workspace) # [!code ++]
+```
+
+## What is a Remote Agent Server?
+
+A Remote Agent Server is an HTTP/WebSocket server that:
+- **Packages the Software Agent SDK into containers** you can deploy on your own infrastructure (Kubernetes, VMs, on-prem, or cloud)
+- **Runs agents** on dedicated infrastructure
+- **Manages workspaces** (Docker containers or remote sandboxes)
+- **Streams events** to clients via WebSocket
+- **Handles command and file operations** (execute command, upload, download); see the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py) for more details
+- **Provides isolation** between different agent executions
+
+Think of it as the "backend" for your agent, while your Python code acts as the "frontend" client.
+
+## Architecture Overview
+
+Remote Agent Servers follow a simple three-part architecture:
+
+```mermaid
+graph TD
+    Client[Client Code] -->|HTTP / WebSocket| Server[Agent Server]
+    Server --> Workspace[Workspace]
+
+    subgraph Workspace Types
+        Workspace --> Local[Local Folder]
+        Workspace --> Docker[Docker Container]
+        Workspace --> API[Remote Sandbox via API]
+    end
+
+    Local --> Files[File System]
+    Docker --> Container[Isolated Runtime]
+    API --> Cloud[Cloud Infrastructure]
+
+    style Client fill:#e1f5fe
+    style Server fill:#fff3e0
+    style Workspace fill:#e8f5e8
+```
+
+1. **Client (Python SDK)** — Your application creates and controls conversations using the SDK.
+2. **Agent Server** — A lightweight HTTP/WebSocket service that runs the agent and manages workspace execution.
+3. **Workspace** — An isolated environment (local, Docker, or remote VM) where the agent code runs.
+
+The same SDK API works across all three workspace types—you just switch which workspace the conversation connects to.
+
+## How Remote Conversations Work
+
+Each step in the diagram maps directly to how the SDK and server interact:
+
+### 1.
Workspace Connection → *(Client → Server)* + +When you create a conversation with a remote workspace (e.g., `DockerWorkspace` or `APIRemoteWorkspace`), the SDK automatically starts or connects to an agent server inside that workspace: + +```python icon="python" +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) +``` + +This turns the local `Conversation` into a **[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** that speaks to the agent server over HTTP/WebSocket. + + +### 2. Server Initialization → *(Server → Workspace)* + +Once the workspace starts: +- It launches the agent server process. +- Waits for it to be ready. +- Shares the server URL with the SDK client. + +You don’t need to manage this manually—the workspace context handles startup and teardown automatically. + +### 3. Event Streaming → *(Bidirectional WebSocket)* + +The client and agent server maintain a live WebSocket connection for streaming events: + +```python icon="python" +def on_event(event): + print(f"Received: {type(event).__name__}") + +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[on_event], +) +``` + +This allows you to see real-time updates from the running agent as it executes tasks inside the workspace. + +### 4. Workspace Supports File and Command Operations → *(Server ↔ Workspace)* + +Workspace supports file and command operations via the agent server API ([base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)), ensuring isolation and consistent behavior: + +```python icon="python" +workspace.file_upload(local_path, remote_path) +workspace.file_download(remote_path, local_path) +result = workspace.execute_command("ls -la") +print(result.stdout) +``` + +These commands are proxied through the agent server, whether it’s a Docker container or a remote VM, keeping your client code environment-agnostic. + +### Summary + +The architecture makes remote execution seamless: +- Your **client code** stays the same. +- The **agent server** manages execution and streaming. +- The **workspace** provides secure, isolated runtime environments. + +Switching from local to remote is just a matter of swapping the workspace class—no code rewrites needed. + +## Next Steps + +Explore different deployment options: + +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Run agent server in the same process +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run agent server in isolated Docker containers +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted agent server via API + +For architectural details: +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture and deployment + +### Stuck Detector +Source: https://docs.openhands.dev/sdk/guides/agent-stuck-detector.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Stuck Detector automatically identifies when an agent enters unproductive patterns such as repeating the same actions, encountering repeated errors, or engaging in monologues. By analyzing the conversation history after the last user message, it detects five types of stuck patterns: + +1. 
**Repeating Action-Observation Cycles**: The same action produces the same observation repeatedly (4+ times) +2. **Repeating Action-Error Cycles**: The same action repeatedly results in errors (3+ times) +3. **Agent Monologue**: The agent sends multiple consecutive messages without user input or meaningful progress (3+ messages) +4. **Alternating Patterns**: Two different action-observation pairs alternate in a ping-pong pattern (6+ cycles) +5. **Context Window Errors**: Repeated context window errors that indicate memory management issues + +When enabled (which is the default), the stuck detector monitors the conversation in real-time and can automatically halt execution when stuck patterns are detected, preventing infinite loops and wasted resources. + + + For more information about the detection algorithms and how pattern matching works, refer to the [StuckDetector source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py). + + + +## How It Works + +In the [ready-to-run example](#ready-to-run-example), the agent is deliberately given a task designed to trigger stuck detection - executing the same `ls` +command 5 times in a row. The stuck detector analyzes the event history and identifies the repetitive pattern: + +1. The conversation proceeds normally until the agent starts repeating actions +2. After detecting the pattern (4 identical action-observation pairs), the stuck detector flags the conversation as stuck +3. The conversation can then handle this gracefully, either by stopping execution or taking corrective action + +The example demonstrates that stuck detection is enabled by default (`stuck_detection=True`), and you can check the +stuck status at any point using `conversation.stuck_detector.is_stuck()`. + +## Pattern Detection + +The stuck detector compares events based on their semantic content rather than object identity. For example: +- **Actions** are compared by their tool name, action content, and thought (ignoring IDs and metrics) +- **Observations** are compared by their observation content and tool name +- **Errors** are compared by their error messages +- **Messages** are compared by their content and source + +This allows the detector to identify truly repetitive behavior while ignoring superficial differences like timestamps or event IDs. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/20_stuck_detector.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +agent = get_default_agent(llm=llm) + +llm_messages = [] + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with built-in stuck detection +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), + # This is by default True, shown here for clarity of the example + stuck_detection=True, +) + +# Send a task that will be caught by stuck detection +conversation.send_message( + "Please execute 'ls' command 5 times, each in its own " + "action without any thought and then exit at the 6th step." +) + +# Run the conversation - stuck detection happens automatically +conversation.run() + +assert conversation.stuck_detector is not None +final_stuck_check = conversation.stuck_detector.is_stuck() +print(f"Final stuck status: {final_stuck_check}") + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[Conversation Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Manual execution control +- **[Hello World](/sdk/guides/hello-world)** - Learn the basics of the SDK + +### Theory of Mind (TOM) Agent +Source: https://docs.openhands.dev/sdk/guides/agent-tom-agent.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +## Overview + +Tom (Theory of Mind) Agent provides advanced user understanding capabilities that help your agent interpret vague instructions and adapt to user preferences over time. Built on research in user mental modeling, Tom agents can: + +- Understand unclear or ambiguous user requests +- Provide personalized guidance based on user modeling +- Build long-term user preference profiles +- Adapt responses based on conversation history + +This is particularly useful when: +- User instructions are vague or incomplete +- You need to infer user intent from minimal context +- Building personalized experiences across multiple conversations +- Understanding user preferences and working patterns + +## Research Foundation + +Tom agent is based on the TOM-SWE research paper on user mental modeling for software engineering agents: + +```bibtex Citation +@misc{zhou2025tomsweusermentalmodeling, + title={TOM-SWE: User Mental Modeling For Software Engineering Agents}, + author={Xuhui Zhou and Valerie Chen and Zora Zhiruo Wang and Graham Neubig and Maarten Sap and Xingyao Wang}, + year={2025}, + eprint={2510.21903}, + archivePrefix={arXiv}, + primaryClass={cs.SE}, + url={https://arxiv.org/abs/2510.21903}, +} +``` + + +Paper: [TOM-SWE on arXiv](https://arxiv.org/abs/2510.21903) + + +## Quick Start + + +This example is available on GitHub: [examples/01_standalone_sdk/30_tom_agent.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/30_tom_agent.py) + + +```python icon="python" expandable examples/01_standalone_sdk/30_tom_agent.py +"""Example demonstrating Tom agent with Theory of Mind capabilities. + +This example shows how to set up an agent with Tom tools for getting +personalized guidance based on user modeling. 
Tom tools include: +- TomConsultTool: Get guidance for vague or unclear tasks +- SleeptimeComputeTool: Index conversations for user modeling +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.tool import Tool +from openhands.tools.preset.default import get_default_tools +from openhands.tools.tom_consult import ( + SleeptimeComputeAction, + SleeptimeComputeObservation, + SleeptimeComputeTool, + TomConsultTool, +) + + +# Configure LLM +api_key: str | None = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm: LLM = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), + usage_id="agent", + drop_params=True, +) + +# Build tools list with Tom tools +# Note: Tom tools are automatically registered on import (PR #862) +tools = get_default_tools(enable_browser=False) + +# Configure Tom tools with parameters +tom_params: dict[str, bool | str] = { + "enable_rag": True, # Enable RAG in Tom agent +} + +# Add LLM configuration for Tom tools (uses same LLM as main agent) +tom_params["llm_model"] = llm.model +if llm.api_key: + if isinstance(llm.api_key, SecretStr): + tom_params["api_key"] = llm.api_key.get_secret_value() + else: + tom_params["api_key"] = llm.api_key +if llm.base_url: + tom_params["api_base"] = llm.base_url + +# Add both Tom tools to the agent +tools.append(Tool(name=TomConsultTool.name, params=tom_params)) +tools.append(Tool(name=SleeptimeComputeTool.name, params=tom_params)) + +# Create agent with Tom capabilities +# This agent can consult Tom for personalized guidance +# Note: Tom's user modeling data will be stored in ~/.openhands/ +agent: Agent = Agent(llm=llm, tools=tools) + +# Start conversation +cwd: str = os.getcwd() +PERSISTENCE_DIR = os.path.expanduser("~/.openhands") +CONVERSATIONS_DIR = os.path.join(PERSISTENCE_DIR, "conversations") +conversation = Conversation( + agent=agent, workspace=cwd, persistence_dir=CONVERSATIONS_DIR +) + +# Optionally run sleeptime compute to index existing conversations +# This builds user preferences and patterns from conversation history +# Using execute_tool allows running tools before conversation.run() +print("\nRunning sleeptime compute to index conversations...") +try: + sleeptime_result = conversation.execute_tool( + "sleeptime_compute", SleeptimeComputeAction() + ) + # Cast to the expected observation type for type-safe access + if isinstance(sleeptime_result, SleeptimeComputeObservation): + print(f"Result: {sleeptime_result.message}") + print(f"Sessions processed: {sleeptime_result.sessions_processed}") + else: + print(f"Result: {sleeptime_result.text}") +except KeyError as e: + print(f"Tool not available: {e}") + +# Send a potentially vague message where Tom consultation might help +conversation.send_message( + "I need to debug some code but I'm not sure where to start. " + + "Can you help me figure out the best approach?" 
+) +conversation.run() + +print("\n" + "=" * 80) +print("Tom agent consultation example completed!") +print("=" * 80) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") + + +# Optional: Index this conversation for Tom's user modeling +# This builds user preferences and patterns from conversation history +# Uncomment the lines below to index the conversation: +# +# conversation.send_message("Please index this conversation using sleeptime_compute") +# conversation.run() +# print("\nConversation indexed for user modeling!") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Tom Tools + +### TomConsultTool + +The consultation tool provides personalized guidance when the agent encounters vague or unclear user requests: + +```python icon="python" +# The agent can automatically call this tool when needed +# Example: User says "I need to debug something" +# Tom analyzes the vague request and provides specific guidance +``` + +Key features: +- Analyzes conversation history for context +- Provides personalized suggestions based on user modeling +- Helps disambiguate vague instructions +- Adapts to user communication patterns + +### SleeptimeComputeTool + +The indexing tool processes conversation history to build user preference profiles: + +```python icon="python" +# Index conversations for future personalization +sleeptime_compute_tool = conversation.agent.tools_map.get("sleeptime_compute") +if sleeptime_compute_tool: + result = sleeptime_compute_tool.executor( + SleeptimeComputeAction(), conversation + ) +``` + +Key features: +- Processes conversation history into user models +- Stores preferences in `~/.openhands/` directory +- Builds understanding of user patterns over time +- Enables long-term personalization across sessions + +## Configuration + +### RAG Support + +Enable retrieval-augmented generation for enhanced context awareness: + +```python icon="python" +tom_params = { + "enable_rag": True, # Enable RAG for better context retrieval +} +``` + +### Custom LLM for Tom + +You can optionally use a different LLM for Tom's internal reasoning: + +```python icon="python" +# Use the same LLM as main agent +tom_params["llm_model"] = llm.model +tom_params["api_key"] = llm.api_key.get_secret_value() + +# Or configure a separate LLM for Tom +tom_llm = LLM(model="gpt-4", api_key=SecretStr("different-key")) +tom_params["llm_model"] = tom_llm.model +tom_params["api_key"] = tom_llm.api_key.get_secret_value() +``` + +## Data Storage + +Tom stores user modeling data persistently in `~/.openhands/`: + + + + + + + + + + + + + + + + + +where +- `user_models/` stores user preference profiles, with each user having their own subdirectory containing `user_model.json` (the current user model). +- `conversations/` contains indexed conversation data + +This persistent storage enables Tom to: +- Remember user preferences across sessions +- Track which conversations have been indexed +- Build long-term understanding of user patterns + +## Use Cases + +### 1. Handling Vague Requests + +When a user provides minimal information: + +```python icon="python" +conversation.send_message("Help me with that bug") +# Tom analyzes history to determine which bug and suggest approach +``` + +### 2. 
Personalized Recommendations + +Tom adapts suggestions based on past interactions: + +```python icon="python" +# After multiple conversations, Tom learns: +# - User prefers minimal explanations +# - User typically works with Python +# - User values efficiency over verbosity +``` + +### 3. Intent Inference + +Understanding what the user really wants: + +```python icon="python" +conversation.send_message("Make it better") +# Tom infers from context what "it" is and how to improve it +``` + +## Best Practices + +1. **Enable RAG**: For better context awareness, always enable RAG: + ```python icon="python" + tom_params = {"enable_rag": True} + ``` + +2. **Index Regularly**: Run sleeptime compute after important conversations to build better user models + +3. **Provide Context**: Even with Tom, providing more context leads to better results + +4. **Monitor Data**: Check `~/.openhands/` periodically to understand what's being learned + +5. **Privacy Considerations**: Be aware that conversation data is stored locally for user modeling + +## Next Steps + +- **[Agent Delegation](/sdk/guides/agent-delegation)** - Combine Tom with sub-agents for complex workflows +- **[Context Condenser](/sdk/guides/context-condenser)** - Manage long conversation histories effectively +- **[Custom Tools](/sdk/guides/custom-tools)** - Create tools that work with Tom's insights + +### Browser Session Recording +Source: https://docs.openhands.dev/sdk/guides/browser-session-recording.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The browser session recording feature allows you to capture your agent's browser interactions and replay them later using [rrweb](https://github.com/rrweb-io/rrweb). This is useful for debugging, auditing, and understanding how your agent interacts with web pages. + +## How It Works + +The recording feature uses rrweb to capture DOM mutations, mouse movements, scrolling, and other browser events. The recordings are saved as JSON files that can be replayed using rrweb-player or the online viewer. + +The [ready-to-run example](#ready-to-run-example) demonstrates: + +1. **Starting a recording**: Use `browser_start_recording` to begin capturing browser events +2. **Browsing and interacting**: Navigate to websites and perform actions while recording +3. **Stopping the recording**: Use `browser_stop_recording` to stop and save the recording + +The recording files are automatically saved to the persistence directory when the recording is stopped. + +## Replaying Recordings + +After recording a session, you can replay it using: + +- **rrweb-player**: A standalone player component - [GitHub](https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player) +- **Online viewer**: Upload your recording at [rrweb.io/demo](https://www.rrweb.io/) + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/38_browser_session_recording.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/38_browser_session_recording.py) + + +```python icon="python" expandable examples/01_standalone_sdk/38_browser_session_recording.py +"""Browser Session Recording Example + +This example demonstrates how to use the browser session recording feature +to capture and save a recording of the agent's browser interactions using rrweb. + +The recording can be replayed later using rrweb-player to visualize the agent's +browsing session. 
+ +The recording will be automatically saved to the persistence directory when +browser_stop_recording is called. You can replay it with: + - rrweb-player: https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player + - Online viewer: https://www.rrweb.io/demo/ +""" + +import json +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.browser_use.definition import BROWSER_RECORDING_OUTPUT_DIR + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools - including browser tools with recording capability +cwd = os.getcwd() +tools = [ + Tool(name=BrowserToolSet.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with persistence_dir set to save browser recordings +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir="./.conversations", +) + +# The prompt instructs the agent to: +# 1. Start recording the browser session +# 2. Browse to a website and perform some actions +# 3. Stop recording (auto-saves to file) +PROMPT = """ +Please complete the following task to demonstrate browser session recording: + +1. First, use `browser_start_recording` to begin recording the browser session. + +2. Then navigate to https://docs.openhands.dev/ and: + - Get the page content + - Scroll down the page + - Get the browser state to see interactive elements + +3. Next, navigate to https://docs.openhands.dev/openhands/usage/cli/installation and: + - Get the page content + - Scroll down to see more content + +4. Finally, use `browser_stop_recording` to stop the recording. + Events are automatically saved. 
+""" + +print("=" * 80) +print("Browser Session Recording Example") +print("=" * 80) +print("\nTask: Record an agent's browser session and save it for replay") +print("\nStarting conversation with agent...\n") + +conversation.send_message(PROMPT) +conversation.run() + +print("\n" + "=" * 80) +print("Conversation finished!") +print("=" * 80) + +# Check if the recording files were created +# Recordings are saved in BROWSER_RECORDING_OUTPUT_DIR/recording-{timestamp}/ +if os.path.exists(BROWSER_RECORDING_OUTPUT_DIR): + # Find recording subdirectories (they start with "recording-") + recording_dirs = sorted( + [ + d + for d in os.listdir(BROWSER_RECORDING_OUTPUT_DIR) + if d.startswith("recording-") + and os.path.isdir(os.path.join(BROWSER_RECORDING_OUTPUT_DIR, d)) + ] + ) + + if recording_dirs: + # Process the most recent recording directory + latest_recording = recording_dirs[-1] + recording_path = os.path.join(BROWSER_RECORDING_OUTPUT_DIR, latest_recording) + json_files = sorted( + [f for f in os.listdir(recording_path) if f.endswith(".json")] + ) + + print(f"\n✓ Recording saved to: {recording_path}") + print(f"✓ Number of files: {len(json_files)}") + + # Count total events across all files + total_events = 0 + all_event_types: dict[int | str, int] = {} + total_size = 0 + + for json_file in json_files: + filepath = os.path.join(recording_path, json_file) + file_size = os.path.getsize(filepath) + total_size += file_size + + with open(filepath) as f: + events = json.load(f) + + # Events are stored as a list in each file + if isinstance(events, list): + total_events += len(events) + for event in events: + event_type = event.get("type", "unknown") + all_event_types[event_type] = all_event_types.get(event_type, 0) + 1 + + print(f" - {json_file}: {len(events)} events, {file_size} bytes") + + print(f"✓ Total events: {total_events}") + print(f"✓ Total size: {total_size} bytes") + if all_event_types: + print(f"✓ Event types: {all_event_types}") + + print("\nTo replay this recording, you can use:") + print( + " - rrweb-player: " + "https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player" + ) + else: + print(f"\n✗ No recording directories found in: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") +else: + print(f"\n✗ Observations directory not found: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") + +print("\n" + "=" * 100) +print("Conversation finished.") +print(f"Total LLM messages: {len(llm_messages)}") +print("=" * 100) + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"Conversation ID: {conversation.id}") +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Context Condenser +Source: https://docs.openhands.dev/sdk/guides/context-condenser.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## What is a Context Condenser? + +A **context condenser** is a crucial component that addresses one of the most persistent challenges in AI agent development: managing growing conversation context efficiently. 
As conversations with AI agents grow longer, the cumulative history leads to:
+
+- **💰 Increased API Costs**: More tokens in the context mean higher costs per API call
+- **⏱️ Slower Response Times**: Larger contexts take longer to process
+- **📉 Reduced Effectiveness**: LLMs become less effective when dealing with excessive irrelevant information
+
+The context condenser solves this by intelligently summarizing older parts of the conversation while preserving the essential information the agent needs to continue working effectively.
+
+## Default Implementation: `LLMSummarizingCondenser`
+
+The OpenHands SDK provides `LLMSummarizingCondenser` as the default condenser implementation. This condenser uses an LLM to generate summaries of conversation history when it exceeds the configured size limit.
+
+### How It Works
+
+When conversation history exceeds a defined threshold, the LLM-based condenser:
+
+1. **Keeps recent messages intact** - The most recent exchanges remain unchanged for immediate context
+2. **Preserves key information** - Important details like user goals, technical specifications, and critical files are retained
+3. **Summarizes older content** - Earlier parts of the conversation are condensed into concise LLM-generated summaries
+4. **Maintains continuity** - The agent retains awareness of past progress without processing every historical interaction
+
+This approach achieves remarkable efficiency gains:
+- Up to **2x reduction** in per-turn API costs
+- **Consistent response times** even in long sessions
+- **Equivalent or better performance** on software engineering tasks
+
+Learn more about the implementation and benchmarks in our [blog post on context condensation](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents).
+
+### Extensibility
+
+The `LLMSummarizingCondenser` extends the `RollingCondenser` base class, which provides a framework for condensers that work with rolling conversation history. You can create custom condensers by extending the base classes ([source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)):
+
+- **`RollingCondenser`** - For condensers that apply condensation to rolling history
+- **`CondenserBase`** - For more specialized condensation strategies
+
+This architecture allows you to implement custom condensation logic tailored to your specific needs while leveraging the SDK's conversation management infrastructure.
+
+### Setting Up Condensing
+
+Create an `LLMSummarizingCondenser` to manage the context.
+The condenser automatically truncates conversation history when it exceeds `max_size`, replacing the dropped events with an LLM-generated summary.
+
+This condenser triggers when there are more than `max_size` events in
+the conversation history, and it always keeps the first `keep_first` events (system prompts,
+initial user messages) to preserve important context.
+ +```python focus={3-4} icon="python" +from openhands.sdk.context import LLMSummarizingCondenser + +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) + +# Agent with condenser +agent = Agent(llm=llm, tools=tools, condenser=condenser) +``` + +### Ready-to-run example + + +This example is available on GitHub: [examples/01_standalone_sdk/14_context_condenser.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py) + + + +Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information: + +```python icon="python" expandable examples/01_standalone_sdk/14_context_condenser.py +""" +To manage context in long-running conversations, the agent can use a context condenser +that keeps the conversation history within a specified size limit. This example +demonstrates using the `LLMSummarizingCondenser`, which automatically summarizes +older parts of the conversation when the history exceeds a defined threshold. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context.condenser import LLMSummarizingCondenser +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + +# Create a condenser to manage the context. The condenser will automatically truncate +# conversation history when it exceeds max_size, and replaces the dropped events with an +# LLM-generated summary. This condenser triggers when there are more than ten events in +# the conversation history, and always keeps the first two events (system prompts, +# initial user messages) to preserve important context. +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) + +# Agent with condenser +agent = Agent(llm=llm, tools=tools, condenser=condenser) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) + +# Send multiple messages to demonstrate condensation +print("Sending multiple messages to demonstrate LLM Summarizing Condenser...") + +conversation.send_message( + "Hello! Can you create a Python file named math_utils.py with functions for " + "basic arithmetic operations (add, subtract, multiply, divide)?" +) +conversation.run() + +conversation.send_message( + "Great! Now add a function to calculate the factorial of a number." 
+) +conversation.run() + +conversation.send_message("Add a function to check if a number is prime.") +conversation.run() + +conversation.send_message( + "Add a function to calculate the greatest common divisor (GCD) of two numbers." +) +conversation.run() + +conversation.send_message( + "Now create a test file to verify all these functions work correctly." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Conversation persistence +print("Serializing conversation...") + +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) + +print("Sending message to deserialized conversation...") +conversation.send_message("Finally, clean up by deleting both files.") +conversation.run() + +print("=" * 100) +print("Conversation finished with LLM Summarizing Condenser.") +print(f"Total LLM messages collected: {len(llm_messages)}") +print("\nThe condenser automatically summarized older conversation history") +print("when the conversation exceeded the configured max_size threshold.") +print("This helps manage context length while preserving important information.") + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage reduction and analyze cost savings + +### Ask Agent Questions +Source: https://docs.openhands.dev/sdk/guides/convo-ask-agent.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use `ask_agent()` to get quick responses from the agent about the current conversation state without +interrupting the main execution flow. + +## Key Features + +The `ask_agent()` method provides several important capabilities: + +#### Context-Aware Responses + +The agent has access to the full conversation history when answering questions: + +```python focus={2-3} icon="python" wrap +# Agent can reference what it has done so far +response = conversation.ask_agent( + "Summarize the activity so far in 1 sentence." 
+) +print(f"Response: {response}") +``` + +#### Non-Intrusive Operation + +Questions don't interrupt the main conversation flow - they're processed separately: + +```python focus={4-6} icon="python" wrap +# Start main conversation +thread = threading.Thread(target=conversation.run) +thread.start() + +# Ask questions without affecting main execution +response = conversation.ask_agent("How's the progress?") +``` + +#### Works During and After Execution + +You can ask questions while the agent is running or after it has completed: + +```python focus={3,7} icon="python" wrap +# During execution +time.sleep(2) # Let agent start working +response1 = conversation.ask_agent("Have you finished running?") + +# After completion +thread.join() +response2 = conversation.ask_agent("What did you accomplish?") +``` + +### Use Cases + +- **Progress Monitoring**: Check on long-running tasks +- **Status Updates**: Get real-time information about agent activities +- **User Interfaces**: Provide sidebar information in chat applications + +## Ready-to-run Example + + + This example is available on GitHub: + [examples/01_standalone_sdk/28_ask_agent_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/28_ask_agent_example.py) + + +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. + +This example shows how to use `ask_agent()` to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. + +```python icon="python" expandable examples/01_standalone_sdk/28_ask_agent_example.py +""" +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. + +This example shows how to use ask_agent() to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. +""" + +import os +import threading +import time +from datetime import datetime + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.conversation import ConversationVisualizerBase +from openhands.sdk.event import Event +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + + +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that print the raw events as they occur.""" + + count = 0 + + def on_event(self, event: Event) -> None: + """Handle events for minimal progress visualization.""" + print(f"\n\n[EVENT {self.count}] {type(event).__name__}") + self.count += 1 + + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation( + agent=agent, workspace=cwd, visualizer=MinimalVisualizer, max_iteration_per_run=5 +) + + +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") + + +print("=== Ask Agent Example ===") +print("This example demonstrates asking questions during conversation execution") + +# Step 1: Build conversation context +print(f"\n[{timestamp()}] Building conversation context...") +conversation.send_message("Explore the current directory and describe the architecture") + +# Step 2: Start conversation in background thread +print(f"[{timestamp()}] Starting conversation in background thread...") +thread = threading.Thread(target=conversation.run) +thread.start() + +# Give the agent time to start processing +time.sleep(2) + +# Step 3: Use ask_agent while conversation is running +print(f"\n[{timestamp()}] Using ask_agent while conversation is processing...") + +# Ask context-aware questions +questions_and_responses = [] + +question_1 = "Summarize the activity so far in 1 sentence." +print(f"\n[{timestamp()}] Asking: {question_1}") +response1 = conversation.ask_agent(question_1) +questions_and_responses.append((question_1, response1)) +print(f"Response: {response1}") + +time.sleep(1) + +question_2 = "How's the progress?" +print(f"\n[{timestamp()}] Asking: {question_2}") +response2 = conversation.ask_agent(question_2) +questions_and_responses.append((question_2, response2)) +print(f"Response: {response2}") + +time.sleep(1) + +question_3 = "Have you finished running?" +print(f"\n[{timestamp()}] {question_3}") +response3 = conversation.ask_agent(question_3) +questions_and_responses.append((question_3, response3)) +print(f"Response: {response3}") + +# Step 4: Wait for conversation to complete +print(f"\n[{timestamp()}] Waiting for conversation to complete...") +thread.join() + +# Step 5: Verify conversation state wasn't affected +final_event_count = len(conversation.state.events) +# Step 6: Ask a final question after conversation completion +print(f"\n[{timestamp()}] Asking final question after completion...") +final_response = conversation.ask_agent( + "Can you summarize what you accomplished in this conversation?" +) +print(f"Final response: {final_response}") + +# Step 7: Summary +print("\n" + "=" * 60) +print("SUMMARY OF ASK_AGENT DEMONSTRATION") +print("=" * 60) + +print("\nQuestions and Responses:") +for i, (question, response) in enumerate(questions_and_responses, 1): + print(f"\n{i}. Q: {question}") + print(f" A: {response[:100]}{'...' if len(response) > 100 else ''}") + +final_truncated = final_response[:100] + ("..." 
if len(final_response) > 100 else "") +print(f"\nFinal Question Response: {final_truncated}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost:.4f}") +``` + + + + +## Next Steps + +- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interrupt and redirect agent execution +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Custom Visualizers](/sdk/guides/convo-custom-visualizer)** - Monitor conversation progress + +### Conversation with Async +Source: https://docs.openhands.dev/sdk/guides/convo-async.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +### Concurrent Agents + +Run multiple agent tasks in parallel using `asyncio.gather()`: + +```python icon="python" wrap +async def main(): + loop = asyncio.get_running_loop() + callback = AsyncCallbackWrapper(callback_coro, loop) + + # Create multiple conversation tasks running in parallel + tasks = [ + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback) + ] + results = await asyncio.gather(*tasks) +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/11_async.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) + + +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). The conversation is run in a background +thread and a callback with results is executed in the main runloop + +```python icon="python" expandable examples/01_standalone_sdk/11_async.py +""" +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). The conversation is run in a background +thread and a callback with results is executed in the main runloop +""" + +import asyncio +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.conversation.types import ConversationCallbackType +from openhands.sdk.tool import Tool +from openhands.sdk.utils.async_utils import AsyncCallbackWrapper +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +# Callback coroutine +async def callback_coro(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Synchronous run conversation +def run_conversation(callback: ConversationCallbackType): + conversation = Conversation(agent=agent, callbacks=[callback]) + + conversation.send_message( + "Hello! 
Can you create a new Python file named hello.py that prints " + "'Hello, World!'? Use task tracker to plan your steps." + ) + conversation.run() + + conversation.send_message("Great! Now delete that file.") + conversation.run() + + +async def main(): + loop = asyncio.get_running_loop() + + # Create the callback + callback = AsyncCallbackWrapper(callback_coro, loop) + + # Run the conversation in a background thread and wait for it to finish... + await loop.run_in_executor(None, run_conversation, callback) + + print("=" * 100) + print("Conversation finished. Got the following LLM messages:") + for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + + # Report cost + cost = llm.metrics.accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + +if __name__ == "__main__": + asyncio.run(main()) +``` + + + +## Next Steps + +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents + +### Custom Visualizer +Source: https://docs.openhands.dev/sdk/guides/convo-custom-visualizer.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The SDK provides flexible visualization options. You can use the default rich-formatted visualizer, customize it with highlighting patterns, or build completely custom visualizers by subclassing `ConversationVisualizerBase`. + +## Visualizer Configuration Options + +The `visualizer` parameter in `Conversation` controls how events are displayed: + +```python icon="python" focus={4-5, 7-8, 10-11, 13, 18, 20, 25} +from openhands.sdk import Conversation +from openhands.sdk.conversation import DefaultConversationVisualizer, ConversationVisualizerBase + +# Option 1: Use default visualizer (enabled by default) +conversation = Conversation(agent=agent, workspace=workspace) + +# Option 2: Disable visualization +conversation = Conversation(agent=agent, workspace=workspace, visualizer=None) + +# Option 3: Pass a visualizer class (will be instantiated automatically) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=DefaultConversationVisualizer) + +# Option 4: Pass a configured visualizer instance +custom_viz = DefaultConversationVisualizer( + name="MyAgent", + highlight_regex={r"^Reasoning:": "bold cyan"} +) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=custom_viz) + +# Option 5: Use custom visualizer class +class MyVisualizer(ConversationVisualizerBase): + def on_event(self, event): + print(f"Event: {event}") + +conversation = Conversation(agent=agent, workspace=workspace, visualizer=MyVisualizer()) +``` + +## Customizing the Default Visualizer + +`DefaultConversationVisualizer` uses Rich panels and supports customization through configuration: + +```python icon="python" focus={3-14, 19} +from openhands.sdk.conversation import DefaultConversationVisualizer + +# Configure highlighting patterns using regex +custom_visualizer = DefaultConversationVisualizer( + name="MyAgent", # Prefix panel titles with agent name + highlight_regex={ + r"^Reasoning:": "bold cyan", # Lines starting with "Reasoning:" + r"^Thought:": "bold green", # Lines starting with "Thought:" + r"^Action:": "bold yellow", # Lines starting with "Action:" + r"\[ERROR\]": "bold red", # Error markers anywhere + r"\*\*(.*?)\*\*": "bold", # Markdown bold **text** + }, + skip_user_messages=False, # Show 
user messages +) + +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=custom_visualizer +) +``` + +**When to use**: Perfect for customizing colors and highlighting without changing the panel-based layout. + +## Creating Custom Visualizers + +For complete control over visualization, subclass `ConversationVisualizerBase`: + +```python icon="python" focus={4, 11, 28} +from openhands.sdk.conversation import ConversationVisualizerBase +from openhands.sdk.event import ActionEvent, ObservationEvent, AgentErrorEvent, Event + +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that prints raw event information.""" + + def __init__(self, name: str | None = None): + super().__init__(name=name) + self.step_count = 0 + + def on_event(self, event: Event) -> None: + """Handle each event.""" + if isinstance(event, ActionEvent): + self.step_count += 1 + tool_name = event.tool_name or "unknown" + print(f"Step {self.step_count}: {tool_name}") + + elif isinstance(event, ObservationEvent): + print(f" → Result received") + + elif isinstance(event, AgentErrorEvent): + print(f"❌ Error: {event.error}") + +# Use your custom visualizer +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=MinimalVisualizer(name="Agent") +) +``` + +### Key Methods + +**`__init__(self, name: str | None = None)`** +- Initialize your visualizer with optional configuration +- `name` parameter is available from the base class for agent identification +- Call `super().__init__(name=name)` to initialize the base class + +**`initialize(self, state: ConversationStateProtocol)`** +- Called automatically by `Conversation` after state is created +- Provides access to conversation state and statistics via `self._state` +- Override if you need custom initialization, but call `super().initialize(state)` + +**`on_event(self, event: Event)`** *(required)* +- Called for each conversation event +- Implement your visualization logic here +- Access conversation stats via `self.conversation_stats` property + +**When to use**: When you need a completely different output format, custom state tracking, or integration with external systems. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/26_custom_visualizer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/26_custom_visualizer.py) + + +```python icon="python" expandable examples/01_standalone_sdk/26_custom_visualizer.py +"""Custom Visualizer Example + +This example demonstrates how to create and use a custom visualizer by subclassing +ConversationVisualizer. This approach provides: +- Clean, testable code with class-based state management +- Direct configuration (just pass the visualizer instance to visualizer parameter) +- Reusable visualizer that can be shared across conversations + +This demonstrates how you can pass a ConversationVisualizer instance directly +to the visualizer parameter for clean, reusable visualization logic. 
+""" + +import logging +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.conversation.visualizer import ConversationVisualizerBase +from openhands.sdk.event import ( + Event, +) +from openhands.tools.preset.default import get_default_agent + + +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that print the raw events as they occur.""" + + def on_event(self, event: Event) -> None: + """Handle events for minimal progress visualization.""" + print(f"\n\n[EVENT] {type(event).__name__}: {event.model_dump_json()[:200]}...") + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="agent", +) +agent = get_default_agent(llm=llm, cli_mode=True) + +# ============================================================================ +# Configure Visualization +# ============================================================================ +# Set logging level to reduce verbosity +logging.getLogger().setLevel(logging.WARNING) + +# Start a conversation with custom visualizer +cwd = os.getcwd() +conversation = Conversation( + agent=agent, + workspace=cwd, + visualizer=MinimalVisualizer(), +) + +# Send a message and let the agent run +print("Sending task to agent...") +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("Task completed!") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost:.4f}") +``` + + + +## Next Steps + +Now that you understand custom visualizers, explore these related topics: + +- **[Events](/sdk/arch/events)** - Learn more about different event types +- **[Conversation Metrics](/sdk/guides/metrics)** - Track LLM usage, costs, and performance data +- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interactive conversations with real-time updates +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control agent execution flow with custom logic + +### Pause and Resume +Source: https://docs.openhands.dev/sdk/guides/convo-pause-and-resume.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +### Pausing Execution + +Pause the agent from another thread or after a delay using `conversation.pause()`, and +Resume the paused conversation after performing operations by calling `conversation.run()` again. 
+ +```python icon="python" focus={9, 15} wrap +import time +thread = threading.Thread(target=conversation.run) +thread.start() + +print("Letting agent work for 5 seconds...") +time.sleep(5) + +print("Pausing the agent...") +conversation.pause() + +print("Waiting for 5 seconds...") +time.sleep(5) + +print("Resuming the execution...") +conversation.run() +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) + + +Pause agent execution mid-task by calling `conversation.pause()`: + +```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py +import os +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent, workspace=os.getcwd()) + +print("=" * 60) +print("Pause and Continue Example") +print("=" * 60) +print() + +# Phase 1: Start a long-running task +print("Phase 1: Starting agent with a task...") +conversation.send_message( + "Create a file called countdown.txt and write numbers from 100 down to 1, " + "one number per line. After you finish, summarize what you did." +) + +print(f"Initial status: {conversation.state.execution_status}") +print() + +# Start the agent in a background thread +thread = threading.Thread(target=conversation.run) +thread.start() + +# Let the agent work for a few seconds +print("Letting agent work for 2 seconds...") +time.sleep(2) + +# Phase 2: Pause the agent +print() +print("Phase 2: Pausing the agent...") +conversation.pause() + +# Wait for the thread to finish (it will stop when paused) +thread.join() + +print(f"Agent status after pause: {conversation.state.execution_status}") +print() + +# Phase 3: Send a new message while paused +print("Phase 3: Sending a new message while agent is paused...") +conversation.send_message( + "Actually, stop working on countdown.txt. Instead, create a file called " + "hello.txt with just the text 'Hello, World!' in it." 
+) +print() + +# Phase 4: Resume the agent with .run() +print("Phase 4: Resuming agent with .run()...") +print(f"Status before resume: {conversation.state.execution_status}") + +# Resume execution +conversation.run() + +print(f"Final status: {conversation.state.execution_status}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + + +## Next Steps + +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents + +### Persistence +Source: https://docs.openhands.dev/sdk/guides/convo-persistence.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## How to use Persistence + +Save conversation state to disk and restore it later for long-running or multi-session workflows. + +### Saving State + +Create a conversation with a unique ID to enable persistence: + +```python focus={3-4,10-11} icon="python" wrap +import uuid + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message("Start long task") +conversation.run() # State automatically saved +``` + +### Restoring State + +Restore a conversation using the same ID and persistence directory: + +```python focus={9-10} icon="python" +# Later, in a different session +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) + +conversation.send_message("Continue task") +conversation.run() # Continues from saved state +``` + +## What Gets Persisted + +The conversation state includes information that allows seamless restoration: + +- **Message History**: Complete event log including user messages, agent responses, and system events +- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters +- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings +- **Tool Outputs**: Results from bash commands, file operations, and other tool executions +- **Statistics**: LLM usage metrics like token counts and API calls +- **Workspace Context**: Working directory and file system state +- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation +- **Secrets**: Managed credentials and API keys +- **Agent State**: Custom runtime state stored by agents (see [Agent State](#agent-state) below) + + + For the complete implementation details, see the [ConversationState class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code. + + +## Persistence Directory Structure + +When you set a `persistence_dir`, your conversation will be persisted to a directory structure where each +conversation has its own subdirectory. By default, the persistence directory is `workspace/conversations/` +(unless you specify a custom path). 
+
+**Directory structure:**
+
+- `.conversations/` (your `persistence_dir`)
+  - `<conversation_id>/`
+    - `base_state.json`
+    - `events/`
+      - `event-00000-<event_id>.json`
+      - `event-00001-<event_id>.json`
+      - ...
+
+Each conversation directory contains:
+- **`base_state.json`**: The core conversation state including agent configuration, execution status, statistics, and metadata
+- **`events/`**: A subdirectory containing individual event files, each named with a sequential index and event ID (e.g., `event-00000-abc123.json`)
+
+The collection of event files in the `events/` directory represents the same trajectory data you would find in the `trajectory.json` file from OpenHands V0, but split into individual files for better performance and granular access.
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py
+import os
+import uuid
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    get_logger,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.file_editor import FileEditorTool
+from openhands.tools.terminal import TerminalTool
+
+
+logger = get_logger(__name__)
+
+# Configure LLM
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+llm = LLM(
+    usage_id="agent",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+
+# Tools
+cwd = os.getcwd()
+tools = [
+    Tool(name=TerminalTool.name),
+    Tool(name=FileEditorTool.name),
+]
+
+# Add MCP Tools
+mcp_config = {
+    "mcpServers": {
+        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
+    }
+}
+# Agent
+agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config)
+
+llm_messages = []  # collect raw LLM messages
+
+
+def conversation_callback(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        llm_messages.append(event.to_llm_message())
+
+
+conversation_id = uuid.uuid4()
+persistence_dir = "./.conversations"
+
+conversation = Conversation(
+    agent=agent,
+    callbacks=[conversation_callback],
+    workspace=cwd,
+    persistence_dir=persistence_dir,
+    conversation_id=conversation_id,
+)
+conversation.send_message(
+    "Read https://github.com/OpenHands/OpenHands. Then write 3 facts "
+    "about the project into FACTS.txt."
+)
+conversation.run()
+
+conversation.send_message("Great! Now delete that file.")
+conversation.run()
+
+print("=" * 100)
+print("Conversation finished. Got the following LLM messages:")
+for i, message in enumerate(llm_messages):
+    print(f"Message {i}: {str(message)[:200]}")
+
+# Conversation persistence
+print("Serializing conversation...")
+
+del conversation
+
+# Deserialize the conversation
+print("Deserializing conversation...")
+conversation = Conversation(
+    agent=agent,
+    callbacks=[conversation_callback],
+    workspace=cwd,
+    persistence_dir=persistence_dir,
+    conversation_id=conversation_id,
+)
+
+print("Sending message to deserialized conversation...")
+conversation.send_message("Hey what did you create? Return an agent finish action")
+conversation.run()
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost}")
+```
+
+
+
+## Reading serialized events
+
+Convert persisted events into LLM-ready messages for reuse or analysis.
+ + +This example is available on GitHub: [examples/01_standalone_sdk/36_event_json_to_openai_messages.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/36_event_json_to_openai_messages.py) + + +```python icon="python" expandable examples/01_standalone_sdk/36_event_json_to_openai_messages.py +"""Load persisted events and convert them into LLM-ready messages.""" + +import json +import os +import uuid +from pathlib import Path + +from pydantic import SecretStr + + +conversation_id = uuid.uuid4() +persistence_root = Path(".conversations") +log_dir = ( + persistence_root / "logs" / "event-json-to-openai-messages" / conversation_id.hex +) + +os.environ.setdefault("LOG_JSON", "true") +os.environ.setdefault("LOG_TO_FILE", "true") +os.environ.setdefault("LOG_DIR", str(log_dir)) +os.environ.setdefault("LOG_LEVEL", "INFO") + +from openhands.sdk import ( # noqa: E402 + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + Tool, +) +from openhands.sdk.logger import get_logger, setup_logging # noqa: E402 +from openhands.tools.terminal import TerminalTool # noqa: E402 + + +setup_logging(log_to_file=True, log_dir=str(log_dir)) +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") +if not api_key: + raise RuntimeError("LLM_API_KEY environment variable is not set.") + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +###### +# Create a conversation that persists its events +###### + +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + persistence_dir=str(persistence_root), + conversation_id=conversation_id, +) + +conversation.send_message( + "Use the terminal tool to run `pwd` and write the output to tool_output.txt. " + "Reply with a short confirmation once done." +) +conversation.run() + +conversation.send_message( + "Without using any tools, summarize in one sentence what you did." +) +conversation.run() + +assert conversation.state.persistence_dir is not None +persistence_dir = Path(conversation.state.persistence_dir) +event_dir = persistence_dir / "events" + +event_paths = sorted(event_dir.glob("event-*.json")) + +if not event_paths: + raise RuntimeError("No event files found. Was persistence enabled?") + +###### +# Read from serialized events +###### + + +events = [Event.model_validate_json(path.read_text()) for path in event_paths] + +convertible_events = [ + event for event in events if isinstance(event, LLMConvertibleEvent) +] +llm_messages = LLMConvertibleEvent.events_to_messages(convertible_events) + +if llm.uses_responses_api(): + logger.info("Formatting messages for the OpenAI Responses API.") + instructions, input_items = llm.format_messages_for_responses(llm_messages) + logger.info("Responses instructions:\n%s", instructions) + logger.info("Responses input:\n%s", json.dumps(input_items, indent=2)) +else: + logger.info("Formatting messages for the OpenAI Chat Completions API.") + chat_messages = llm.format_messages_for_llm(llm_messages) + logger.info("Chat Completions messages:\n%s", json.dumps(chat_messages, indent=2)) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## How State Persistence Works + +The SDK uses an **automatic persistence** system that saves state changes immediately when they occur. 
This ensures that conversation state is always recoverable, even if the process crashes unexpectedly. + +### Auto-Save Mechanism + +When you modify any public field on `ConversationState`, the SDK automatically: + +1. Detects the field change via a custom `__setattr__` implementation +2. Serializes the entire base state to `base_state.json` +3. Triggers any registered state change callbacks + +This happens transparently—you don't need to call any save methods manually. + +```python +# These changes are automatically persisted: +conversation.state.execution_status = ConversationExecutionStatus.RUNNING +conversation.state.max_iterations = 100 +``` + +### Events vs Base State + +The persistence system separates data into two categories: + +| Category | Storage | Contents | +|----------|---------|----------| +| **Base State** | `base_state.json` | Agent configuration, execution status, statistics, secrets, agent_state | +| **Events** | `events/event-*.json` | Message history, tool calls, observations, all conversation events | + +Events are appended incrementally (one file per event), while base state is overwritten on each change. This design optimizes for: +- **Fast event appends**: No need to rewrite the entire history +- **Atomic state updates**: Base state is always consistent +- **Efficient restoration**: Events can be loaded lazily + + + +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + +### Send Message While Running +Source: https://docs.openhands.dev/sdk/guides/convo-send-message-while-running.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + + +This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) + + +Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: + +```python icon="python" expandable examples/01_standalone_sdk/18_send_message_while_processing.py +""" +Example demonstrating that user messages can be sent and processed while +an agent is busy. + +This example demonstrates a key capability of the OpenHands agent system: the ability +to receive and process new user messages even while the agent is actively working on +a previous task. This is made possible by the agent's event-driven architecture. + +Demonstration Flow: +1. Send initial message asking agent to: + - Write "Message 1 sent at [time], written at [CURRENT_TIME]" + - Wait 3 seconds + - Write "Message 2 sent at [time], written at [CURRENT_TIME]" + [time] is the time the message was sent to the agent + [CURRENT_TIME] is the time the agent writes the line +2. Start agent processing in a background thread +3. While agent is busy (during the 3-second delay), send a second message asking to add: + - "Message 3 sent at [time], written at [CURRENT_TIME]" +4. 
Verify that all three lines are processed and included in the final document + +Expected Evidence: +The final document will contain three lines with dual timestamps: +- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) +- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) +- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) + +The timestamps will show that Message 3 was sent while the agent was running, +but was still successfully processed and written to the document. + +This proves that: +- The second user message was sent while the agent was processing the first task +- The agent successfully received and processed the second message +- The agent's event system allows for real-time message integration during processing + +Key Components Demonstrated: +- Conversation.send_message(): Adds messages to events list immediately +- Agent.step(): Processes all events including newly added messages +- Threading: Allows message sending while agent is actively processing +""" # noqa + +import os +import threading +import time +from datetime import datetime + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent) + + +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") + + +print("=== Send Message While Processing Example ===") + +# Step 1: Send initial message +start_time = timestamp() +conversation.send_message( + f"Create a file called document.txt and write this first sentence: " + f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write the line. " + f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa +) + +# Step 2: Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Step 3: Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() + +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." 
+) + +# Wait for completion +thread.join() + +# Verification +document_path = os.path.join(cwd, "document.txt") +if os.path.exists(document_path): + with open(document_path) as f: + content = f.read() + + print("\nDocument contents:") + print("─────────────────────") + print(content) + print("─────────────────────") + + # Check if both messages were processed + if "Message 1" in content and "Message 2" in content: + print("\nSUCCESS: Agent processed both messages!") + print( + "This proves the agent received the second message while processing the first task." # noqa + ) + else: + print("\nWARNING: Agent may not have processed the second message") + + # Clean up + os.remove(document_path) +else: + print("WARNING: Document.txt was not created") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Sending Messages During Execution + +As shown in the example above, use threading to send messages while the agent is running: + +```python icon="python" +# Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() + +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) + +# Wait for completion +thread.join() +``` + +The key steps are: +1. Start `conversation.run()` in a background thread +2. Send additional messages using `conversation.send_message()` while the agent is processing +3. Use `thread.join()` to wait for completion + +The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. + +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + +### Critic (Experimental) +Source: https://docs.openhands.dev/sdk/guides/critic.md + + +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +## What is a Critic? + +A **critic** is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. The critic runs alongside the agent and provides: + +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion +- **Iterative refinement**: Automatic retry with follow-up prompts when scores are below threshold + +You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance. + + +This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). For detailed evaluation metrics and methodology, see our technical report: [A Rubric-Supervised Critic from Sparse Real-World Outcomes](https://arxiv.org/abs/2603.03800). 
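+
+Attaching a critic is a single `critic=` argument on the agent. Here is a minimal sketch, assuming a self-hosted critic endpoint (the server URL is a placeholder; the parameter names follow the ready-to-run example at the end of this guide):
+
+```python icon="python" wrap
+from openhands.sdk import Agent
+from openhands.sdk.critic import APIBasedCritic
+
+# Placeholder endpoint -- see the ready-to-run example below for real configuration
+critic = APIBasedCritic(
+    server_url="https://your-critic-server.example.com",
+    api_key=api_key,
+    model_name="critic",
+)
+
+# The critic runs alongside the agent and scores its decisions
+agent = Agent(llm=llm, tools=tools, critic=critic)
+```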
+ + +## Quick Start + +When using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`), the critic is **automatically configured** - no additional setup required. + +## Understanding Critic Results + +Critic evaluations produce scores and feedback: + +- **`score`**: Float between 0.0 and 1.0 representing predicted success probability +- **`message`**: Optional feedback with detailed probabilities +- **`success`**: Boolean property (True if score >= 0.5) + +Results are automatically displayed in the conversation visualizer: + +![Critic results in SDK visualizer](./assets/critic-sdk-visualizer.png) + +### Accessing Results Programmatically + +```python icon="python" focus={4-7} +from openhands.sdk import Event, ActionEvent, MessageEvent + +def callback(event: Event): + if isinstance(event, (ActionEvent, MessageEvent)): + if event.critic_result is not None: + print(f"Critic score: {event.critic_result.score:.3f}") + print(f"Success: {event.critic_result.success}") + +conversation = Conversation(agent=agent, callbacks=[callback]) +``` + +## Iterative Refinement with a Critic + +The critic supports **automatic iterative refinement** - when the agent finishes a task but the critic score is below a threshold, the conversation automatically continues with a follow-up prompt asking the agent to improve its work. + +### How It Works + +1. Agent completes a task and calls `FinishAction` +2. Critic evaluates the result and produces a score +3. If score < `success_threshold`, a follow-up prompt is sent automatically +4. Agent continues working to address issues +5. Process repeats until score meets threshold or `max_iterations` is reached + +### Configuration + +Use `IterativeRefinementConfig` to enable automatic retries: + +```python icon="python" focus={1,4-7,12} +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig + +# Configure iterative refinement +iterative_config = IterativeRefinementConfig( + success_threshold=0.7, # Retry if score < 70% + max_iterations=3, # Maximum retry attempts +) + +# Attach to critic +critic = APIBasedCritic( + server_url="https://llm-proxy.eval.all-hands.dev/vllm", + api_key=api_key, + model_name="critic", + iterative_refinement=iterative_config, +) +``` + +### Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `success_threshold` | `float` | `0.6` | Score threshold (0-1) to consider task successful | +| `max_iterations` | `int` | `3` | Maximum number of iterations before giving up | + +### Custom Follow-up Prompts + +By default, the critic generates a generic follow-up prompt. You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`: + +```python icon="python" focus={4-12} +from openhands.sdk.critic.base import CriticBase, CriticResult + +class CustomCritic(APIBasedCritic): + def get_followup_prompt(self, critic_result: CriticResult, iteration: int) -> str: + score_percent = critic_result.score * 100 + return f""" +Your solution scored {score_percent:.1f}% (iteration {iteration}). + +Please review your work carefully: +1. Check that all requirements are met +2. Verify tests pass +3. 
Fix any issues and try again +""" +``` + +### Example Workflow + +Here's what happens during iterative refinement: + +``` +Iteration 1: + → Agent creates files, runs tests + → Agent calls FinishAction + → Critic evaluates: score = 0.45 (below 0.7 threshold) + → Follow-up prompt sent automatically + +Iteration 2: + → Agent reviews and fixes issues + → Agent calls FinishAction + → Critic evaluates: score = 0.72 (above threshold) + → ✅ Success! Conversation ends +``` + +## Troubleshooting + +### Critic Evaluations Not Appearing + +- Verify the critic is properly configured and passed to the Agent +- Ensure you're using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`) + +### API Authentication Errors + +- Verify `LLM_API_KEY` is set correctly +- Check that the API key has not expired + +### Iterative Refinement Not Triggering + +- Ensure `iterative_refinement` config is attached to the critic +- Check that `success_threshold` is set appropriately (higher values trigger more retries) +- Verify the agent is using `FinishAction` to complete tasks + +## Ready-to-run Example + + +The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py) + + +This example demonstrates iterative refinement with a moderately complex task - creating a Python word statistics tool with specific edge case requirements. The critic evaluates whether all requirements are met and triggers retries if needed. + +```python icon="python" expandable examples/01_standalone_sdk/34_critic_example.py +"""Iterative Refinement with Critic Model Example. + +This is EXPERIMENTAL. + +This example demonstrates how to use a critic model to shepherd an agent through +complex, multi-step tasks. The critic evaluates the agent's progress and provides +feedback that can trigger follow-up prompts when the agent hasn't completed the +task successfully. + +Key concepts demonstrated: +1. Setting up a critic with IterativeRefinementConfig for automatic retry +2. Conversation.run() automatically handles retries based on critic scores +3. Custom follow-up prompt generation via critic.get_followup_prompt() +4. Iterating until the task is completed successfully or max iterations reached + +For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configured +using the same base_url with /vllm suffix and "critic" as the model name. +""" + +import os +import re +import tempfile +from pathlib import Path + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig +from openhands.sdk.critic.base import CriticBase +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +# Configuration +# Higher threshold (70%) makes it more likely the agent needs multiple iterations, +# which better demonstrates how iterative refinement works. +# Adjust as needed to see different behaviors. +SUCCESS_THRESHOLD = float(os.getenv("CRITIC_SUCCESS_THRESHOLD", "0.7")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3")) + + +def get_required_env(name: str) -> str: + value = os.getenv(name) + if value: + return value + raise ValueError( + f"Missing required environment variable: {name}. " + f"Set {name} before running this example." 
+    )
+
+
+def get_default_critic(llm: LLM) -> CriticBase | None:
+    """Auto-configure critic for All-Hands LLM proxy.
+
+    When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an
+    APIBasedCritic configured with:
+    - server_url: {base_url}/vllm
+    - api_key: same as LLM
+    - model_name: "critic"
+
+    Args:
+        llm: The LLM instance to derive critic configuration from.
+
+    Returns:
+        An APIBasedCritic if the LLM is configured for All-Hands proxy,
+        None otherwise.
+
+    Example:
+        llm = LLM(
+            model="anthropic/claude-sonnet-4-5",
+            api_key=api_key,
+            base_url="https://llm-proxy.eval.all-hands.dev",
+        )
+        critic = get_default_critic(llm)
+        if critic is None:
+            # Fall back to explicit configuration
+            critic = APIBasedCritic(
+                server_url="https://my-critic-server.com",
+                api_key="my-api-key",
+                model_name="my-critic-model",
+            )
+    """
+    base_url = llm.base_url
+    api_key = llm.api_key
+    if base_url is None or api_key is None:
+        return None
+
+    # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval)
+    pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev"
+    if not re.match(pattern, base_url):
+        return None
+
+    return APIBasedCritic(
+        server_url=f"{base_url.rstrip('/')}/vllm",
+        api_key=api_key,
+        model_name="critic",
+    )
+
+
+# Task prompt designed to be moderately complex with subtle requirements.
+# The task is simple enough to complete in 1-2 iterations, but has specific
+# requirements that are easy to miss - triggering critic feedback.
+INITIAL_TASK_PROMPT = """\
+Create a Python word statistics tool called `wordstats` that analyzes text files.
+
+## Structure
+
+Create directory `wordstats/` with:
+- `stats.py` - Main module with `analyze_file(filepath)` function
+- `cli.py` - Command-line interface
+- `tests/test_stats.py` - Unit tests
+
+## Requirements for stats.py
+
+The `analyze_file(filepath)` function must return a dict with these EXACT keys:
+- `lines`: total line count (including empty lines)
+- `words`: word count
+- `chars`: character count (including whitespace)
+- `unique_words`: count of unique words (case-insensitive)
+
+### Important edge cases (often missed!):
+1. Empty files must return all zeros, not raise an exception
+2. Hyphenated words count as ONE word (e.g., "well-known" = 1 word)
+3. Numbers like "123" or "3.14" are NOT counted as words
+4. Contractions like "don't" count as ONE word
+5. File not found must raise FileNotFoundError with a clear message
+
+## Requirements for cli.py
+
+When run as `python cli.py <file>`:
+- Print each stat on its own line: "Lines: X", "Words: X", etc.
+- Exit with code 1 if file not found, printing error to stderr
+- Exit with code 0 on success
+
+## Required Tests (test_stats.py)
+
+Write tests that verify:
+1. Basic counting on normal text
+2. Empty file returns all zeros
+3. Hyphenated words counted correctly
+4. Numbers are excluded from word count
+5. FileNotFoundError raised for missing files
+
+## Verification Steps
+
+1. Create a sample file `sample.txt` with this EXACT content (no trailing newline):
+`​`​`
+Hello world!
+This is a well-known test file.
+
+It has 5 lines, including empty ones.
+Numbers like 42 and 3.14 don't count as words.
+`​`​`
+
+2. Run: `python wordstats/cli.py sample.txt`
+   Expected output:
+   - Lines: 5
+   - Words: 21
+   - Chars: 130
+   - Unique words: 21
+
+3. Run the tests: `python -m pytest wordstats/tests/ -v`
+   ALL tests must pass.
+ +The task is complete ONLY when: +- All files exist +- The CLI outputs the correct stats for sample.txt +- All 5+ tests pass +""" + + +llm_api_key = get_required_env("LLM_API_KEY") +# Use a weaker model to increase likelihood of needing multiple iterations +llm_model = os.getenv("LLM_MODEL", "anthropic/claude-haiku-4-5-20251001") +llm = LLM( + model=llm_model, + api_key=llm_api_key, + top_p=0.95, + base_url=os.getenv("LLM_BASE_URL"), +) + +# Setup critic with iterative refinement config +# The IterativeRefinementConfig tells Conversation.run() to automatically +# retry the task if the critic score is below the threshold +iterative_config = IterativeRefinementConfig( + success_threshold=SUCCESS_THRESHOLD, + max_iterations=MAX_ITERATIONS, +) + +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) + +# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, +) + +# Create workspace +workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) +print(f"📁 Created workspace: {workspace}") + +# Create conversation - iterative refinement is handled automatically +# by Conversation.run() based on the critic's config +conversation = Conversation( + agent=agent, + workspace=str(workspace), +) + +print("\n" + "=" * 70) +print("🚀 Starting Iterative Refinement with Critic Model") +print("=" * 70) +print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") +print(f"Max iterations: {MAX_ITERATIONS}") + +# Send the task and run - Conversation.run() handles retries automatically +conversation.send_message(INITIAL_TASK_PROMPT) +conversation.run() + +# Print additional info about created files +print("\nCreated files:") +for path in sorted(workspace.rglob("*")): + if path.is_file(): + relative = path.relative_to(workspace) + print(f" - {relative}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"\nEXAMPLE_COST: {cost:.4f}") +``` + +```bash Running the Example icon="terminal" +LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \ + uv run python examples/01_standalone_sdk/34_critic_example.py +``` + +### Example Output + +``` +📁 Created workspace: /tmp/critic_demo_abc123 + +====================================================================== +🚀 Starting Iterative Refinement with Critic Model +====================================================================== +Success threshold: 70% +Max iterations: 3 + +... agent works on the task ... 
+ +✓ Critic evaluation: score=0.758, success=True + +Created files: + - sample.txt + - wordstats/cli.py + - wordstats/stats.py + - wordstats/tests/test_stats.py + +EXAMPLE_COST: 0.0234 +``` + +## Next Steps + +- **[Observability](/sdk/guides/observability)** - Monitor and log agent behavior +- **[Metrics](/sdk/guides/metrics)** - Collect performance metrics +- **[Stuck Detector](/sdk/guides/agent-stuck-detector)** - Detect unproductive agent patterns + +### Custom Tools +Source: https://docs.openhands.dev/sdk/guides/custom-tools.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> The ready-to-run example is available [here](#ready-to-run-example)! + +## Understanding the Tool System + +The SDK's tool system is built around three core components: + +1. **Action** - Defines input parameters (what the tool accepts) +2. **Observation** - Defines output data (what the tool returns) +3. **Executor** - Implements the tool's logic (what the tool does) + +These components are tied together by a **ToolDefinition** that registers the tool with the agent. + +## Built-in Tools + +The tools package ([source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)) provides a bunch of built-in tools that follow these patterns. + +```python icon="python" wrap +from openhands.tools import BashTool, FileEditorTool +from openhands.tools.preset import get_default_tools + +# Use specific tools +agent = Agent(llm=llm, tools=[BashTool.create(), FileEditorTool.create()]) + +# Or use preset +tools = get_default_tools() +agent = Agent(llm=llm, tools=tools) +``` + + +See [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) for the complete list of available tools and design philosophy. + + +## Creating a Custom Tool + +Here's a minimal example of creating a custom grep tool: + + + + ### Define the Action + Defines input parameters (what the tool accepts) + + ```python icon="python" wrap + class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", + description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, + description="Optional glob to filter files (e.g. '*.py')" + ) + ``` + + + ### Define the Observation + Defines output data (what the tool returns) + + ```python icon="python" wrap + class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + ``` + + The to_llm_content() property formats observations for the LLM. 
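+
+  For instance, a hypothetical observation (paths and matches invented for illustration) would be rendered for the LLM as:
+
+  ```python icon="python" wrap
+  obs = GrepObservation(
+      matches=["src/app.py:12:class App:", "src/db.py:3:class DB:"],
+      files=["/repo/src/app.py", "/repo/src/db.py"],
+      count=2,
+  )
+  print(obs.to_llm_content[0].text)
+  # Found 2 matching lines.
+  # Files:
+  # - /repo/src/app.py
+  # - /repo/src/db.py
+  # Sample:
+  # src/app.py:12:class App:
+  # src/db.py:3:class DB:
+  ```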
+ + + + ### Define the Executor + Implements the tool’s logic (what the tool does) + + ```python icon="python" wrap + class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal + + def __call__( + self, + action: GrepAction, + conversation=None, + ) -> GrepObservation: + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) + + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q}" + else: + cmd = f"grep -rHnE {pat} {root_q}" + cmd += " 2>/dev/null | head -100" + result = self.terminal(TerminalAction(command=cmd)) + + matches: list[str] = [] + files: set[str] = set() + + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text + + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" + # take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) + + return GrepObservation( + matches=matches, + files=sorted(files), + count=len(matches), + ) + ``` + + + ### Finally, define the tool + ```python icon="python" wrap + class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """Custom grep tool that searches file contents using regular expressions.""" + + @classmethod + def create( + cls, + conv_state, + terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. + + Args: + conv_state: Conversation state to get + working directory from. + terminal_executor: Optional terminal executor to reuse. + If not provided, a new one will be created. + + Returns: + A sequence containing a single GrepTool instance. + """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) + + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + ``` + + + +## Good to know +### Tool Registration +Tools are registered using `register_tool()` and referenced by name: + +```python icon="python" wrap +# Register a simple tool class +register_tool("FileEditorTool", FileEditorTool) + +# Register a factory function that creates multiple tools +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) + +# Use registered tools by name +tools = [ + Tool(name="FileEditorTool"), + Tool(name="BashAndGrepToolSet"), +] +``` + +### Factory Functions +Tool factory functions receive `conv_state` as a parameter, allowing access to workspace information: + +```python icon="python" wrap +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create execute_bash and custom grep tools sharing one executor.""" + bash_executor = BashExecutor( + working_dir=conv_state.workspace.working_dir + ) + # Create and configure tools... 
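+    # For example, a sketch mirroring the "Shared Executors" snippet below:
+    bash_tool = execute_bash_tool.set_executor(executor=bash_executor)
+    grep_tool = ToolDefinition(
+        name="grep",
+        description=_GREP_DESCRIPTION,
+        action_type=GrepAction,
+        observation_type=GrepObservation,
+        executor=GrepExecutor(bash_executor),
+    )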
+ return [bash_tool, grep_tool] +``` + +### Shared Executors +Multiple tools can share executors for efficiency and state consistency: + +```python icon="python" wrap +bash_executor = BashExecutor(working_dir=conv_state.workspace.working_dir) +bash_tool = execute_bash_tool.set_executor(executor=bash_executor) + +grep_executor = GrepExecutor(bash_executor) +grep_tool = ToolDefinition( + name="grep", + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, +) +``` + +## When to Create Custom Tools + +Create custom tools when you need to: +- Combine multiple operations into a single, structured interface +- Add typed parameters with validation +- Format complex outputs for LLM consumption +- Integrate with external APIs or services + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/02_custom_tools.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) + + +```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py +"""Advanced example showing explicit executor usage and custom grep tool.""" + +import os +import shlex +from collections.abc import Sequence + +from pydantic import Field, SecretStr + +from openhands.sdk import ( + LLM, + Action, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Observation, + TextContent, + ToolDefinition, + get_logger, +) +from openhands.sdk.tool import ( + Tool, + ToolExecutor, + register_tool, +) +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import ( + TerminalAction, + TerminalExecutor, + TerminalTool, +) + + +logger = get_logger(__name__) + +# --- Action / Observation --- + + +class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, description="Optional glob to filter files (e.g. '*.py')" + ) + + +class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." 
if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + + +# --- Executor --- + + +class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal + + def __call__(self, action: GrepAction, conversation=None) -> GrepObservation: # noqa: ARG002 + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) + + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q} 2>/dev/null | head -100" + else: + cmd = f"grep -rHnE {pat} {root_q} 2>/dev/null | head -100" + + result = self.terminal(TerminalAction(command=cmd)) + + matches: list[str] = [] + files: set[str] = set() + + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text + + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" — take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) + + return GrepObservation(matches=matches, files=sorted(files), count=len(matches)) + + +# Tool description +_GREP_DESCRIPTION = """Fast content search tool. +* Searches file contents using regular expressions +* Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.) +* Filter files by pattern with the include parameter (eg. "*.js", "*.{ts,tsx}") +* Returns matching file paths sorted by modification time. +* Only the first 100 results are returned. Consider narrowing your search with stricter regex patterns or provide path parameter if you need more results. +* Use this tool when you need to find files containing specific patterns +* When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead +""" # noqa: E501 + + +# --- Tool Definition --- + + +class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """A custom grep tool that searches file contents using regular expressions.""" + + @classmethod + def create( + cls, conv_state, terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. + + Args: + conv_state: Conversation state to get working directory from. + terminal_executor: Optional terminal executor to reuse. If not provided, + a new one will be created. + + Returns: + A sequence containing a single GrepTool instance. + """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) + + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
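# Model and endpoint are configurable: LLM_MODEL and LLM_BASE_URL are read
# from the environment, with an Anthropic model as the fallback default.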
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools - demonstrating both simplified and advanced patterns +cwd = os.getcwd() + + +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create terminal and custom grep tools sharing one executor.""" + + terminal_executor = TerminalExecutor(working_dir=conv_state.workspace.working_dir) + # terminal_tool = terminal_tool.set_executor(executor=terminal_executor) + terminal_tool = TerminalTool.create(conv_state, executor=terminal_executor)[0] + + # Use the GrepTool.create() method with shared terminal_executor + grep_tool = GrepTool.create(conv_state, terminal_executor=terminal_executor)[0] + + return [terminal_tool, grep_tool] + + +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) + +tools = [ + Tool(name=FileEditorTool.name), + Tool(name="BashAndGrepToolSet"), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Hello! Can you use the grep tool to find all files " + "containing the word 'class' in this project, then create a summary file listing them? " # noqa: E501 + "Use the pattern 'class' to search and include only Python files with '*.py'." # noqa: E501 +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers +- **[Tools Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)** - Built-in tools implementation + +### Assign Reviews +Source: https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews.md + +> The reference workflow is available [here](#reference-workflow)! + +Automate pull request triage by intelligently assigning reviewers based on git blame analysis, notifying reviewers of pending PRs, and prompting authors on stale pull requests. The agent performs three sequential checks: pinging reviewers on clean PRs awaiting review (3+ days), reminding authors on stale PRs (5+ days), and auto-assigning reviewers based on code ownership for unassigned PRs. + +## How it works + +It relies on the basic action workflow (`01_basic_action`) which provides a flexible template for running arbitrary agent tasks in GitHub Actions. + +**Core Components:** +- **`agent_script.py`** - Python script that initializes the OpenHands agent with configurable LLM settings and executes tasks based on provided prompts +- **`workflow.yml`** - GitHub Actions workflow that sets up the environment, installs dependencies, and runs the agent + +**Prompt Options:** +1. **`PROMPT_STRING`** - Direct inline text for simple prompts (used in this example) +2. 
**`PROMPT_LOCATION`** - URL or file path for external prompts + +The workflow downloads the agent script, validates configuration, runs the task, and uploads execution logs as artifacts. + +## Assign Reviews Use Case + +This specific implementation uses the basic action template to handle three PR management scenarios: + +**1. Need Reviewer Action** +- Identifies PRs waiting for review +- Notifies reviewers to take action + +**2. Need Author Action** +- Finds stale PRs with no activity for 5+ days +- Prompts authors to update, request review, or close + +**3. Need Reviewers** +- Detects non-draft PRs without assigned reviewers (created 1+ day ago, CI passing) +- Uses git blame analysis to identify relevant contributors +- Automatically assigns reviewers based on file ownership and contribution history +- Balances reviewer workload across team members + +## Quick Start + + + + ```bash icon="terminal" + cp examples/03_github_workflows/01_basic_action/assign-reviews.yml .github/workflows/assign-reviews.yml + ``` + + + Go to `GitHub Settings → Secrets → Actions`, and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `GitHub Settings → Actions → General → Workflow permissions` and enable "Read and write permissions". + + + The default is: Daily at 12 PM UTC. + + + +## Features + +- **Intelligent Assignment** - Uses git blame to identify relevant reviewers based on code ownership +- **Automated Notifications** - Sends contextual reminders to reviewers and authors +- **Workload Balancing** - Distributes review requests evenly across team members +- **Scheduled & Manual** - Runs daily automatically or on-demand via workflow dispatch + +## Reference Workflow + + +This example is available on GitHub: [examples/03_github_workflows/01_basic_action/assign-reviews.yml](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) + + +```yaml icon="yaml" expandable examples/03_github_workflows/01_basic_action/assign-reviews.yml +--- +# To set this up: +# 1. Change the name below to something relevant to your task +# 2. Modify the "env" section below with your prompt +# 3. Add your LLM_API_KEY to the repository secrets +# 4. Commit this file to your repository +# 5. Trigger the workflow manually or set up a schedule +name: Assign Reviews + +on: + # Manual trigger + workflow_dispatch: + # Scheduled trigger (disabled by default, uncomment and customize as needed) + schedule: + # Run at 12 PM UTC every day + - cron: 0 12 * * * + +permissions: + contents: write + pull-requests: write + issues: write + +jobs: + run-task: + runs-on: ubuntu-24.04 + env: + # Configuration (modify these values as needed) + AGENT_SCRIPT_URL: https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py + # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both + # Option 1: Use a URL or file path for the prompt + PROMPT_LOCATION: '' + # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' + # Option 2: Use direct text for the prompt + PROMPT_STRING: > + Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo. + Read the sections below in order, and perform each in order. Do NOT take action + on the same issue or PR twice. + + # Issues with needs-info - Check for OP Response + + Find all open issues that have the "needs-info" label. For each issue: + 1. 
Identify the original poster (issue author) + 2. Check if there are any comments from the original poster AFTER the "needs-info" label was added + 3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline + and look for "labeled" events with the label "needs-info" + 4. If the original poster has commented after the label was added: + - Remove the "needs-info" label + - Add the "needs-triage" label + - Post a comment: "[Automatic Post]: The issue author has provided additional information. Moving back to needs-triage for review." + + # Issues with needs-triage + + Find all open issues that have the "needs-triage" label. For each issue that has been in this state for more than 4 days since the last + activity: + 1. First, check if the issue has already been triaged by verifying it does NOT have: + - The "enhancement" label + - Any "priority" label (priority:low, priority:medium, priority:high, etc.) + 2. If the issue has already been triaged (has enhancement or priority label), remove the needs-triage label + 3. For issues that have NOT been triaged yet: + - Read the issue description and comments + - Determine if it requires maintainer attention by checking: + * Is it a bug report, feature request, or question? + * Does it have enough information to be actionable? + * Has a maintainer already commented? + * Is the last comment older than 4 days? + - If it needs maintainer attention and no maintainer has commented: + * Find an appropriate maintainer based on the issue topic and recent activity + * Tag them with: "[Automatic Post]: This issue has been waiting for triage. @{maintainer}, could you please take a look when you have + a chance?" + + # Need Reviewer Action + + Find all open PRs where: + 1. The PR is waiting for review (there are no open review comments or change requests) + 2. The PR is in a "clean" state (CI passing, no merge conflicts) + 3. The PR is not marked as draft (draft: false) + 4. The PR has had no activity (comments, commits, reviews) for more than 3 days. + + In this case, send a message to the reviewers: + [Automatic Post]: This PR seems to be currently waiting for review. + {reviewer_names}, could you please take a look when you have a chance? + + # Need Author Action + + Find all open PRs where the most recent change or comment was made on the pull + request more than 5 days ago (use 14 days if the PR is marked as draft). + + And send a message to the author: + + [Automatic Post]: It has been a while since there was any activity on this PR. + {author}, are you still working on it? If so, please go ahead, if not then + please request review, close it, or request that someone else follow up. + + # Need Reviewers + + Find all open pull requests that: + 1. Have no reviewers assigned to them. + 2. Are not marked as draft. + 3. Were created more than 1 day ago. + 4. CI is passing and there are no merge conflicts. + + For each of these pull requests, read the git blame information for the files, + and find the most recent and active contributors to the file/location of the changes. + Assign one of these people as a reviewer, but try not to assign too many reviews to + any single person. Add this message: + + [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information. + Thanks in advance for the help! 
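      # Optional LLM overrides; if left empty, the agent script's defaults apply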
+ + LLM_MODEL: + LLM_BASE_URL: + steps: + - name: Checkout repository + uses: actions/checkout@v5 + + - name: Set up Python + uses: actions/setup-python@v6 + with: + python-version: '3.13' + + - name: Install uv + uses: astral-sh/setup-uv@v7 + with: + enable-cache: true + + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + + - name: Check required configuration + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + run: | + if [ -z "$LLM_API_KEY" ]; then + echo "Error: LLM_API_KEY secret is not set." + exit 1 + fi + + # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set + if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then + echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set." + echo "Please provide only one in the env section of the workflow file." + exit 1 + fi + + if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then + echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set." + echo "Please set one in the env section of the workflow file." + exit 1 + fi + + if [ -n "$PROMPT_LOCATION" ]; then + echo "Prompt location: $PROMPT_LOCATION" + else + echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)" + fi + echo "LLM model: $LLM_MODEL" + if [ -n "$LLM_BASE_URL" ]; then + echo "LLM base URL: $LLM_BASE_URL" + fi + + - name: Run task + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + PYTHONPATH: '' + run: | + echo "Running agent script: $AGENT_SCRIPT_URL" + + # Download script if it's a URL + if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then + echo "Downloading agent script from URL..." + curl -sSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py + AGENT_SCRIPT_PATH="/tmp/agent_script.py" + else + AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL" + fi + + # Run with appropriate prompt argument + if [ -n "$PROMPT_LOCATION" ]; then + echo "Using prompt from: $PROMPT_LOCATION" + uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION" + else + echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)" + uv run python "$AGENT_SCRIPT_PATH" + fi + + - name: Upload logs as artifact + uses: actions/upload-artifact@v4 + if: always() + with: + name: openhands-task-logs + path: | + *.log + output/ + retention-days: 7 +``` + +## Related Files + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) +- [Basic Action README](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) + +### PR Review +Source: https://docs.openhands.dev/sdk/guides/github-workflows/pr-review.md + +> The reference workflow is available [here](#reference-workflow)! + +Automatically review pull requests, providing feedback on code quality, security, and best practices. Reviews can be triggered in two ways: +- Requesting `openhands-agent` as a reviewer +- Adding the `review-this` label to the PR + + +The reference workflow triggers on either the "review-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. 
In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator or is part of a team with access. If you don't plan to grant access, use the label trigger instead, or change the condition to a reviewer handle that exists in your repo. + + +## Quick Start + +```bash +# 1. Copy workflow to your repository +cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY + +# 3. (Optional) Create a "review-this" label in your repository +# Go to Issues → Labels → New label +# You can also trigger reviews by requesting "openhands-agent" as a reviewer +``` + +## Features + +- **Fast Reviews** - Results posted on the PR in only 2 or 3 minutes +- **Comprehensive Analysis** - Analyzes the changes given the repository context. Covers code quality, security, best practices +- **GitHub Integration** - Posts comments directly to the PR +- **Customizable** - Add your own code review guidelines without forking + +## Security + +- Users with write access (maintainers) can trigger reviews by requesting `openhands-agent` as a reviewer or adding the `review-this` label. +- Maintainers need to read the PR to make sure it's safe to run. + +## Customizing the Code Review + +Instead of forking the `agent_script.py`, you can customize the code review behavior by adding a skill file to your repository. This is the **recommended approach** for customization. + +### How It Works + +The PR review agent uses skills from the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository by default. You can add your project-specific guidelines alongside the default skill by creating a custom skill file. + + +**Skill paths**: Place skills in `.agents/skills/` (recommended). The legacy path `.openhands/skills/` is also supported. See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details. + + +### Example: Custom Code Review Skill + +Create `.agents/skills/custom-codereview-guide.md` in your repository: + +```markdown +--- +name: custom-codereview-guide +description: Project-specific review guidelines for MyProject +triggers: +- /codereview +--- + +# MyProject-Specific Review Guidelines + +In addition to general code review practices, check for: + +## Project Conventions + +- All API endpoints must have OpenAPI documentation +- Database migrations must be reversible +- Feature flags required for new features + +## Architecture Rules + +- No direct database access from controllers +- All external API calls must go through the gateway service + +## Communication Style + +- Be direct and constructive +- Use GitHub suggestion syntax for code fixes +``` + + +**Note**: These rules supplement the default `code-review` skill, not replace it. + + + +**How skill merging works**: Using a unique name like `custom-codereview-guide` allows BOTH your custom skill AND the default `code-review` skill to be triggered by `/codereview`. When triggered, skill content is concatenated into the agent's context (public skills first, then your custom skills). There is no smart merging—if guidelines conflict, the agent sees both and must reconcile them. + +If your skill has `name: code-review` (matching the public skill's name), it will completely **override** the default public skill instead of supplementing it. 
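By contrast, if you intentionally want a full replacement, reuse the public skill's name. A minimal sketch (the body content here is illustrative):

```markdown
---
name: code-review
description: Replaces the default code-review skill entirely
triggers:
- /codereview
---

# MyProject Review Guidelines (full replacement)

Only the rules in this file apply; the default public skill is not loaded.
```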
+ + + +**Migrating from override to supplement**: If you previously created a skill with `name: code-review` to override the default, rename it (e.g., to `my-project-review`) to receive guidelines from both skills instead. + + +### Benefits of Custom Skills + +1. **No forking required**: Keep using the official SDK while customizing behavior +2. **Version controlled**: Your review guidelines live in your repository +3. **Easy updates**: SDK updates don't overwrite your customizations +4. **Team alignment**: Everyone uses the same review standards +5. **Composable**: Add project-specific rules alongside default guidelines + + +See the [software-agent-sdk's own custom-codereview-guide skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/custom-codereview-guide.md) for a complete example. + + +## Reference Workflow + + +This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) + + +```yaml icon="yaml" expandable examples/03_github_workflows/02_pr_review/workflow.yml +--- +# OpenHands PR Review Workflow +# +# To set this up: +# 1. Copy this file to .github/workflows/pr-review.yml in your repository +# 2. Add LLM_API_KEY to repository secrets +# 3. Customize the inputs below as needed +# 4. Commit this file to your repository +# 5. Trigger the review by either: +# - Adding the "review-this" label to any PR, OR +# - Requesting openhands-agent as a reviewer +# +# For more information, see: +# https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review +name: PR Review by OpenHands + +on: + # Trigger when a label is added or a reviewer is requested + pull_request: + types: [labeled, review_requested] + +permissions: + contents: read + pull-requests: write + issues: write + +jobs: + pr-review: + # Run when review-this label is added OR openhands-agent is requested as reviewer + if: | + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Checkout for composite action + uses: actions/checkout@v4 + with: + repository: OpenHands/software-agent-sdk + # Use a specific version tag or branch (e.g., 'v1.0.0' or 'main') + ref: main + sparse-checkout: .github/actions/pr-review + + - name: Run PR Review + uses: ./.github/actions/pr-review + with: + # LLM model(s) to use. 
Can be comma-separated for A/B testing + # - one model will be randomly selected per review + llm-model: anthropic/claude-sonnet-4-5-20250929 + llm-base-url: '' + # Review style: roasted (other option: standard) + review-style: roasted + # SDK version to use (version tag or branch name) + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` + +### Action Inputs + +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (optional) | No | `''` | +| `review-style` | Review style: 'standard' or 'roasted' | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | + +## Related Files + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py) +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) + +### TODO Management +Source: https://docs.openhands.dev/sdk/guides/github-workflows/todo-management.md + +> The reference workflow is available [here](#reference-workflow)! + + +Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership + +## Quick Start + + + + ```bash icon="terminal" + cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml + ``` + + + Go to `GitHub Settings → Secrets` and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `Settings → Actions → General → Workflow permissions` and enable: + - `Read and write permissions` + - `Allow GitHub Actions to create and approve pull requests` + + + Trigger the agent by adding TODO comments into your code. + + Example: `# TODO(openhands): Add input validation for user email` + + + The workflow is configurable and any identifier can be used in place of `TODO(openhands)` + + + + + +## Features + +- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. 
- **Implementation** - Sends the TODO description to the OpenHands Agent, which automatically implements it
- **PR Management** - Creates feature branches and pull requests, and picks the most relevant reviewers

## Best Practices

- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow
- **Clear Descriptions** - Write descriptive TODO comments
- **Review PRs** - Always review the generated PRs before merging

## Reference Workflow


This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/03_todo_management)


```yaml icon="yaml" expandable examples/03_github_workflows/03_todo_management/workflow.yml
---
# Automated TODO Management Workflow
# Make sure to replace LLM_MODEL and LLM_BASE_URL below with
# appropriate values for your LLM setup.
#
# This workflow automatically scans for TODO(openhands) comments and creates
# pull requests to implement them using the OpenHands agent.
#
# Setup:
# 1. Add LLM_API_KEY to repository secrets
# 2. Ensure GITHUB_TOKEN has appropriate permissions
# 3. Make sure GitHub Actions are allowed to create and review PRs
# 4. Commit this file to .github/workflows/ in your repository
# 5. Configure the schedule or trigger manually

name: Automated TODO Management

on:
  # Manual trigger
  workflow_dispatch:
    inputs:
      max_todos:
        description: Maximum number of TODOs to process in this run
        required: false
        default: '3'
        type: string
      todo_identifier:
        description: TODO identifier to search for (e.g., TODO(openhands))
        required: false
        default: TODO(openhands)
        type: string

  # Trigger when 'automatic-todo' label is added to a PR
  pull_request:
    types: [labeled]

  # Scheduled trigger (disabled by default, uncomment and customize as needed)
  # schedule:
  #   # Run every Monday at 9 AM UTC
  #   - cron: "0 9 * * 1"

permissions:
  contents: write
  pull-requests: write
  issues: write

jobs:
  scan-todos:
    runs-on: ubuntu-latest
    # Only run if triggered manually or if 'automatic-todo' label was added
    if: >
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'pull_request' &&
      github.event.label.name == 'automatic-todo')
    outputs:
      todos: ${{ steps.scan.outputs.todos }}
      todo-count: ${{ steps.scan.outputs.todo-count }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for better context

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.13'

      - name: Copy TODO scanner
        run: |
          cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py
          chmod +x /tmp/scanner.py

      - name: Scan for TODOs
        id: scan
        run: |
          echo "Scanning for TODO comments..."

          # Run the scanner and capture output
          TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}"
          python /tmp/scanner.py . \
--identifier "$TODO_IDENTIFIER" > todos.json + + # Count TODOs + TODO_COUNT=$(python -c \ + "import json; data=json.load(open('todos.json')); print(len(data))") + echo "Found $TODO_COUNT $TODO_IDENTIFIER items" + + # Limit the number of TODOs to process + MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}" + if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then + echo "Limiting to first $MAX_TODOS TODOs" + python -c " + import json + data = json.load(open('todos.json')) + limited = data[:$MAX_TODOS] + json.dump(limited, open('todos.json', 'w'), indent=2) + " + TODO_COUNT=$MAX_TODOS + fi + + # Set outputs + echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT + echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT + + # Display found TODOs + echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY + if [ "$TODO_COUNT" -eq 0 ]; then + echo "No TODO(openhands) comments found." >> $GITHUB_STEP_SUMMARY + else + echo "Found $TODO_COUNT TODO(openhands) items:" \ + >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + python -c " + import json + data = json.load(open('todos.json')) + for i, todo in enumerate(data, 1): + print(f'{i}. **{todo[\"file\"]}:{todo[\"line\"]}** - ' + + f'{todo[\"description\"]}') + " >> $GITHUB_STEP_SUMMARY + fi + + process-todos: + needs: scan-todos + if: needs.scan-todos.outputs.todo-count > 0 + runs-on: ubuntu-latest + strategy: + matrix: + todo: ${{ fromJson(needs.scan-todos.outputs.todos) }} + max-parallel: 1 # Process one TODO at a time to avoid conflicts + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + token: ${{ secrets.GITHUB_TOKEN }} + + - name: Switch to feature branch with TODO management files + run: | + git checkout openhands/todo-management-example + git pull origin openhands/todo-management-example + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' + + - name: Install uv + uses: astral-sh/setup-uv@v6 + with: + enable-cache: true + + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + + - name: Copy agent files + run: | + cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py + cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py + chmod +x agent.py + + - name: Configure Git + run: | + git config --global user.name "openhands-bot" + git config --global user.email \ + "openhands-bot@users.noreply.github.com" + + - name: Process TODO + env: + LLM_MODEL: + LLM_BASE_URL: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_REPOSITORY: ${{ github.repository }} + TODO_FILE: ${{ matrix.todo.file }} + TODO_LINE: ${{ matrix.todo.line }} + TODO_DESCRIPTION: ${{ matrix.todo.description }} + PYTHONPATH: '' + run: | + echo "Processing TODO: $TODO_DESCRIPTION" + echo "File: $TODO_FILE:$TODO_LINE" + + # Create a unique branch name for this TODO + BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \ + sed 's/[^a-zA-Z0-9]/-/g' | \ + sed 's/--*/-/g' | \ + sed 's/^-\|-$//g' | \ + tr '[:upper:]' '[:lower:]' | \ + cut -c1-50)" + echo "Branch name: $BRANCH_NAME" + + # Create and switch to new branch (force create if exists) + git checkout -B "$BRANCH_NAME" + + # Run the agent to process the TODO + # Stay in repository directory 
for git operations

          # Create JSON payload for the agent
          # (keys below are illustrative; they must match what agent.py expects)
          TODO_JSON=$(cat <<EOF
          {
            "file": "$TODO_FILE",
            "line": $TODO_LINE,
            "description": "$TODO_DESCRIPTION"
          }
          EOF
          )

          # Run the agent, capturing its output and exit code
          # (pipefail so the agent's exit code survives the tee)
          set +e
          set -o pipefail
          python agent.py "$TODO_JSON" 2>&1 | tee agent_output.log
          AGENT_EXIT_CODE=$?
          set -e

          echo "Agent exit code: $AGENT_EXIT_CODE"
          echo "Agent output log:"
          cat agent_output.log

          # Show files in working directory
          echo "Files in working directory:"
          ls -la

          # If agent failed, show more details
          if [ $AGENT_EXIT_CODE -ne 0 ]; then
            echo "Agent failed with exit code $AGENT_EXIT_CODE"
            echo "Last 50 lines of agent output:"
            tail -50 agent_output.log
            exit $AGENT_EXIT_CODE
          fi

          # Check if any changes were made
          cd "$GITHUB_WORKSPACE"
          if git diff --quiet; then
            echo "No changes made by agent, skipping PR creation"
            exit 0
          fi

          # Commit changes
          git add -A
          git commit -m "Implement TODO: $TODO_DESCRIPTION

          Automatically implemented by OpenHands agent.

          Co-authored-by: openhands <openhands@all-hands.dev>"

          # Push branch
          git push origin "$BRANCH_NAME"

          # Create pull request
          PR_TITLE="Implement TODO: $TODO_DESCRIPTION"
          PR_BODY="## 🤖 Automated TODO Implementation

          This PR automatically implements the following TODO:

          **File:** \`$TODO_FILE:$TODO_LINE\`
          **Description:** $TODO_DESCRIPTION

          ### Implementation
          The OpenHands agent has analyzed the TODO and implemented the
          requested functionality.

          ### Review Notes
          - Please review the implementation for correctness
          - Test the changes in your development environment
          - The original TODO comment will be updated with this PR URL
            once merged

          ---
          *This PR was created automatically by the TODO Management workflow.*"

          # Create PR using GitHub CLI or API
          curl -X POST \
            -H "Authorization: token $GITHUB_TOKEN" \
            -H "Accept: application/vnd.github.v3+json" \
            "https://api.github.com/repos/${{ github.repository }}/pulls" \
            -d "{
              \"title\": \"$PR_TITLE\",
              \"body\": \"$PR_BODY\",
              \"head\": \"$BRANCH_NAME\",
              \"base\": \"${{ github.ref_name }}\"
            }"

  summary:
    needs: [scan-todos, process-todos]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Generate Summary
        run: |
          echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY

          TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}"
          echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY

          if [ "$TODO_COUNT" -gt 0 ]; then
            echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY
            echo "" >> $GITHUB_STEP_SUMMARY
            echo "Check the pull requests created for each TODO" \
              "implementation."
>> $GITHUB_STEP_SUMMARY + else + echo "**Status:** ℹ️ No TODOs found to process" \ + >> $GITHUB_STEP_SUMMARY + fi + + echo "" >> $GITHUB_STEP_SUMMARY + echo "---" >> $GITHUB_STEP_SUMMARY + echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY +``` + +## Related Documentation + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) +- [Scanner Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) + +### Hello World +Source: https://docs.openhands.dev/sdk/guides/hello-world.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Your First Agent + +This is the most basic example showing how to set up and run an OpenHands agent. + + + + ### LLM Configuration + + Configure the language model that will power your agent: + ```python icon="python" + llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, # Optional + service_id="agent" + ) + ``` + + + ### Select an Agent + Use the preset agent with common built-in tools: + ```python icon="python" + agent = get_default_agent(llm=llm, cli_mode=True) + ``` + The default agent includes `BashTool`, `FileEditorTool`, etc. + + For the complete list of available tools see the + [tools package source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools). + + + + + ### Start a Conversation + Start a conversation to manage the agent's lifecycle: + ```python icon="python" + conversation = Conversation(agent=agent, workspace=cwd) + conversation.send_message( + "Write 3 facts about the current project into FACTS.txt." + ) + conversation.run() + ``` + + + ### Expected Behavior + When you run this example: + 1. The agent analyzes the current directory + 2. Gathers information about the project + 3. Creates `FACTS.txt` with 3 relevant facts + 4. Completes and exits + + Example output file: + + ```text icon="text" wrap + FACTS.txt + --------- + 1. This is a Python project using the OpenHands Software Agent SDK. + 2. The project includes examples demonstrating various agent capabilities. + 3. The SDK provides tools for file manipulation, bash execution, and more. 
+ ``` + + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/01_hello_world.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py) + + +```python icon="python" wrap expandable examples/01_standalone_sdk/01_hello_world.py +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), +) + +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for specialized needs +- **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers +- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage + +### Hooks +Source: https://docs.openhands.dev/sdk/guides/hooks.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Hooks let you observe and customize key lifecycle moments in the SDK without forking core code. Typical uses include: +- Logging and analytics +- Emitting custom metrics +- Auditing or compliance +- Tracing and debugging + +## Hook Types + +| Hook | When it runs | Can block? | +|------|--------------|------------| +| PreToolUse | Before tool execution | Yes (exit 2) | +| PostToolUse | After tool execution | No | +| UserPromptSubmit | Before processing user message | Yes (exit 2) | +| Stop | When agent tries to finish | Yes (exit 2) | +| SessionStart | When conversation starts | No | +| SessionEnd | When conversation ends | No | + +## Key Concepts + +- Registration points: subscribe to events or attach pre/post hooks around LLM calls and tool execution +- Isolation: hooks run outside the agent loop logic, avoiding core modifications +- Composition: enable or disable hooks per environment (local vs. prod) + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/33_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/33_hooks/) + + +```python icon="python" expandable examples/01_standalone_sdk/33_hooks/33_hooks.py +"""OpenHands Agent SDK — Hooks Example + +Demonstrates the OpenHands hooks system. +Hooks are shell scripts that run at key lifecycle events: + +- PreToolUse: Block dangerous commands before execution +- PostToolUse: Log tool usage after execution +- UserPromptSubmit: Inject context into user messages +- Stop: Enforce task completion criteria + +The hook scripts are in the scripts/ directory alongside this file. 
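Scripts used: block_dangerous.sh (PreToolUse), log_tools.sh (PostToolUse),
inject_git_context.sh (UserPromptSubmit), and require_summary.sh (Stop).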
+""" + +import os +import signal +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher +from openhands.tools.preset.default import get_default_agent + + +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + +SCRIPT_DIR = Path(__file__).parent / "hook_scripts" + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create temporary workspace with git repo +with tempfile.TemporaryDirectory() as tmpdir: + workspace = Path(tmpdir) + os.system(f"cd {workspace} && git init -q && echo 'test' > file.txt") + + log_file = workspace / "tool_usage.log" + summary_file = workspace / "summary.txt" + + # Configure hooks using the typed approach (recommended) + # This provides better type safety and IDE support + hook_config = HookConfig( + pre_tool_use=[ + HookMatcher( + matcher="terminal", + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "block_dangerous.sh"), + timeout=10, + ) + ], + ) + ], + post_tool_use=[ + HookMatcher( + matcher="*", + hooks=[ + HookDefinition( + command=(f"LOG_FILE={log_file} {SCRIPT_DIR / 'log_tools.sh'}"), + timeout=5, + ) + ], + ) + ], + user_prompt_submit=[ + HookMatcher( + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "inject_git_context.sh"), + ) + ], + ) + ], + stop=[ + HookMatcher( + hooks=[ + HookDefinition( + command=( + f"SUMMARY_FILE={summary_file} " + f"{SCRIPT_DIR / 'require_summary.sh'}" + ), + ) + ], + ) + ], + ) + + # Alternative: You can also use .from_dict() for loading from JSON config files + # Example with a single hook matcher: + # hook_config = HookConfig.from_dict({ + # "hooks": { + # "PreToolUse": [{ + # "matcher": "terminal", + # "hooks": [{"command": "path/to/script.sh", "timeout": 10}] + # }] + # } + # }) + + agent = get_default_agent(llm=llm) + conversation = Conversation( + agent=agent, + workspace=str(workspace), + hook_config=hook_config, + ) + + # Demo 1: Safe command (PostToolUse logs it) + print("=" * 60) + print("Demo 1: Safe command - logged by PostToolUse") + print("=" * 60) + conversation.send_message("Run: echo 'Hello from hooks!'") + conversation.run() + + if log_file.exists(): + print(f"\n[Log: {log_file.read_text().strip()}]") + + # Demo 2: Dangerous command (PreToolUse blocks it) + print("\n" + "=" * 60) + print("Demo 2: Dangerous command - blocked by PreToolUse") + print("=" * 60) + conversation.send_message("Run: rm -rf /tmp/test") + conversation.run() + + # Demo 3: Context injection + Stop hook enforcement + print("\n" + "=" * 60) + print("Demo 3: Context injection + Stop hook") + print("=" * 60) + print("UserPromptSubmit injects git status; Stop requires summary.txt\n") + conversation.send_message( + "Check what files have changes, then create summary.txt describing the repo." 
+ ) + conversation.run() + + if summary_file.exists(): + print(f"\n[summary.txt: {summary_file.read_text()[:80]}...]") + + print("\n" + "=" * 60) + print("Example Complete!") + print("=" * 60) + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") +``` + + + +### Hook Scripts + +The example uses external hook scripts in the `hook_scripts/` directory: + + +```bash +#!/bin/bash +# PreToolUse hook: Block dangerous rm -rf commands +# Uses jq for JSON parsing (needed for nested fields like tool_input.command) + +input=$(cat) +command=$(echo "$input" | jq -r '.tool_input.command // ""') + +# Block rm -rf commands +if [[ "$command" =~ "rm -rf" ]]; then + echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' + exit 2 # Exit code 2 = block the operation +fi + +exit 0 # Exit code 0 = allow the operation +``` + + + +```bash +#!/bin/bash +# PostToolUse hook: Log all tool usage +# Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) + +# LOG_FILE should be set by the calling script +LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" + +echo "[$(date)] Tool used: $OPENHANDS_TOOL_NAME" >> "$LOG_FILE" +exit 0 +``` + + + +```bash +#!/bin/bash +# UserPromptSubmit hook: Inject git status when user asks about code changes + +input=$(cat) + +# Check if user is asking about changes, diff, or git +if echo "$input" | grep -qiE "(changes|diff|git|commit|modified)"; then + # Get git status if in a git repo + if git rev-parse --git-dir > /dev/null 2>&1; then + status=$(git status --short 2>/dev/null | head -10) + if [ -n "$status" ]; then + # Escape for JSON + escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') + echo "{\"additionalContext\": \"Current git status: $escaped\"}" + fi + fi +fi +exit 0 +``` + + + +```bash +#!/bin/bash +# Stop hook: Require a summary.txt file before allowing agent to finish +# SUMMARY_FILE should be set by the calling script + +SUMMARY_FILE="${SUMMARY_FILE:-./summary.txt}" + +if [ ! -f "$SUMMARY_FILE" ]; then + echo '{"decision": "deny", "additionalContext": "Create summary.txt first."}' + exit 2 +fi +exit 0 +``` + + + +## Next Steps + +- See also: [Metrics and Observability](/sdk/guides/metrics) +- Architecture: [Events](/sdk/arch/events) + +### Iterative Refinement +Source: https://docs.openhands.dev/sdk/guides/iterative-refinement.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> The ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Iterative refinement is a powerful pattern where multiple agents work together in a feedback loop: +1. A **refactoring agent** performs the main task (e.g., code conversion) +2. A **critique agent** evaluates the quality and provides detailed feedback +3. 
If quality is below threshold, the refactoring agent tries again with the feedback + +This pattern is useful for: +- Code refactoring and modernization (e.g., COBOL to Java) +- Document translation and localization +- Content generation with quality requirements +- Any task requiring iterative improvement + +## How It Works + +### The Iteration Loop + +The core workflow runs in a loop until quality threshold is met: + +```python icon="python" wrap +QUALITY_THRESHOLD = 90.0 +MAX_ITERATIONS = 5 + +while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + # Phase 1: Refactoring agent converts COBOL to Java + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + agent=refactoring_agent, + workspace=str(workspace_dir) + ) + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() + + # Phase 2: Critique agent evaluates the conversion + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir) + ) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + + # Parse score and decide whether to continue + current_score = parse_critique_score(critique_file) + + iteration += 1 +``` + +### Critique Scoring + +The critique agent evaluates each file on four dimensions (0-25 pts each): +- **Correctness**: Does the Java code preserve the original business logic? +- **Code Quality**: Is the code clean and following Java conventions? +- **Completeness**: Are all COBOL features properly converted? +- **Best Practices**: Does it use proper OOP, error handling, and documentation? + +### Feedback Loop + +When the score is below threshold, the refactoring agent receives the critique file location: + +```python icon="python" wrap +if critique_file and critique_file.exists(): + base_prompt += f""" +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. +""" +``` + +## Customization + +### Adjusting Thresholds + +```python icon="python" wrap +QUALITY_THRESHOLD = 95.0 # Require higher quality +MAX_ITERATIONS = 10 # Allow more iterations +``` + +### Using Real COBOL Files + +The example uses sample files, but you can use real files from the [AWS CardDemo project](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl). + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/31_iterative_refinement.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/31_iterative_refinement.py) + + +```python icon="python" expandable examples/01_standalone_sdk/31_iterative_refinement.py +#!/usr/bin/env python3 +""" +Iterative Refinement Example: COBOL to Java Refactoring + +This example demonstrates an iterative refinement workflow where: +1. A refactoring agent converts COBOL files to Java files +2. A critique agent evaluates the quality of each conversion and provides scores +3. If the average score is below 90%, the process repeats with feedback + +The workflow continues until the refactoring meets the quality threshold. 
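The quality threshold and iteration cap can be overridden via the
QUALITY_THRESHOLD and MAX_ITERATIONS environment variables.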
+ +Source COBOL files can be obtained from: +https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl +""" + +import os +import re +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.tools.preset.default import get_default_agent + + +QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "5")) + + +def setup_workspace() -> tuple[Path, Path, Path]: + """Create workspace directories for the refactoring workflow.""" + workspace_dir = Path(tempfile.mkdtemp()) + cobol_dir = workspace_dir / "cobol" + java_dir = workspace_dir / "java" + critique_dir = workspace_dir / "critiques" + + cobol_dir.mkdir(parents=True, exist_ok=True) + java_dir.mkdir(parents=True, exist_ok=True) + critique_dir.mkdir(parents=True, exist_ok=True) + + return workspace_dir, cobol_dir, java_dir + + +def create_sample_cobol_files(cobol_dir: Path) -> list[str]: + """Create sample COBOL files for demonstration. + + In a real scenario, you would clone files from: + https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl + """ + sample_files = { + "CBACT01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBACT01C. + ***************************************************************** + * Program: CBACT01C - Account Display Program + * Purpose: Display account information for a given account number + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-ACCOUNT-ID PIC 9(11). + 01 WS-ACCOUNT-STATUS PIC X(1). + 01 WS-ACCOUNT-BALANCE PIC S9(13)V99. + 01 WS-CUSTOMER-NAME PIC X(50). + 01 WS-ERROR-MSG PIC X(80). + + PROCEDURE DIVISION. + PERFORM 1000-INIT. + PERFORM 2000-PROCESS. + PERFORM 3000-TERMINATE. + STOP RUN. + + 1000-INIT. + INITIALIZE WS-ACCOUNT-ID + INITIALIZE WS-ACCOUNT-STATUS + INITIALIZE WS-ACCOUNT-BALANCE + INITIALIZE WS-CUSTOMER-NAME. + + 2000-PROCESS. + DISPLAY "ENTER ACCOUNT NUMBER: " + ACCEPT WS-ACCOUNT-ID + IF WS-ACCOUNT-ID = ZEROS + MOVE "INVALID ACCOUNT NUMBER" TO WS-ERROR-MSG + DISPLAY WS-ERROR-MSG + ELSE + DISPLAY "ACCOUNT: " WS-ACCOUNT-ID + DISPLAY "STATUS: " WS-ACCOUNT-STATUS + DISPLAY "BALANCE: " WS-ACCOUNT-BALANCE + END-IF. + + 3000-TERMINATE. + DISPLAY "PROGRAM COMPLETE". +""", + "CBCUS01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBCUS01C. + ***************************************************************** + * Program: CBCUS01C - Customer Information Program + * Purpose: Manage customer data operations + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-CUSTOMER-ID PIC 9(9). + 01 WS-FIRST-NAME PIC X(25). + 01 WS-LAST-NAME PIC X(25). + 01 WS-ADDRESS PIC X(100). + 01 WS-PHONE PIC X(15). + 01 WS-EMAIL PIC X(50). + 01 WS-OPERATION PIC X(1). + 88 OP-ADD VALUE 'A'. + 88 OP-UPDATE VALUE 'U'. + 88 OP-DELETE VALUE 'D'. + 88 OP-DISPLAY VALUE 'V'. + + PROCEDURE DIVISION. + PERFORM 1000-MAIN-PROCESS. + STOP RUN. + + 1000-MAIN-PROCESS. + DISPLAY "CUSTOMER MANAGEMENT SYSTEM" + DISPLAY "A-ADD U-UPDATE D-DELETE V-VIEW" + ACCEPT WS-OPERATION + EVALUATE TRUE + WHEN OP-ADD + PERFORM 2000-ADD-CUSTOMER + WHEN OP-UPDATE + PERFORM 3000-UPDATE-CUSTOMER + WHEN OP-DELETE + PERFORM 4000-DELETE-CUSTOMER + WHEN OP-DISPLAY + PERFORM 5000-DISPLAY-CUSTOMER + WHEN OTHER + DISPLAY "INVALID OPERATION" + END-EVALUATE. + + 2000-ADD-CUSTOMER. 
+ DISPLAY "ADDING NEW CUSTOMER" + ACCEPT WS-CUSTOMER-ID + ACCEPT WS-FIRST-NAME + ACCEPT WS-LAST-NAME + DISPLAY "CUSTOMER ADDED: " WS-CUSTOMER-ID. + + 3000-UPDATE-CUSTOMER. + DISPLAY "UPDATING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER UPDATED: " WS-CUSTOMER-ID. + + 4000-DELETE-CUSTOMER. + DISPLAY "DELETING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER DELETED: " WS-CUSTOMER-ID. + + 5000-DISPLAY-CUSTOMER. + DISPLAY "DISPLAYING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "ID: " WS-CUSTOMER-ID + DISPLAY "NAME: " WS-FIRST-NAME " " WS-LAST-NAME. +""", + "CBTRN01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBTRN01C. + ***************************************************************** + * Program: CBTRN01C - Transaction Processing Program + * Purpose: Process financial transactions + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-TRANS-ID PIC 9(16). + 01 WS-TRANS-TYPE PIC X(2). + 88 TRANS-CREDIT VALUE 'CR'. + 88 TRANS-DEBIT VALUE 'DB'. + 88 TRANS-TRANSFER VALUE 'TR'. + 01 WS-TRANS-AMOUNT PIC S9(13)V99. + 01 WS-FROM-ACCOUNT PIC 9(11). + 01 WS-TO-ACCOUNT PIC 9(11). + 01 WS-TRANS-DATE PIC 9(8). + 01 WS-TRANS-STATUS PIC X(10). + + PROCEDURE DIVISION. + PERFORM 1000-INITIALIZE. + PERFORM 2000-PROCESS-TRANSACTION. + PERFORM 3000-FINALIZE. + STOP RUN. + + 1000-INITIALIZE. + MOVE ZEROS TO WS-TRANS-ID + MOVE SPACES TO WS-TRANS-TYPE + MOVE ZEROS TO WS-TRANS-AMOUNT + MOVE "PENDING" TO WS-TRANS-STATUS. + + 2000-PROCESS-TRANSACTION. + DISPLAY "ENTER TRANSACTION TYPE (CR/DB/TR): " + ACCEPT WS-TRANS-TYPE + DISPLAY "ENTER AMOUNT: " + ACCEPT WS-TRANS-AMOUNT + EVALUATE TRUE + WHEN TRANS-CREDIT + PERFORM 2100-PROCESS-CREDIT + WHEN TRANS-DEBIT + PERFORM 2200-PROCESS-DEBIT + WHEN TRANS-TRANSFER + PERFORM 2300-PROCESS-TRANSFER + WHEN OTHER + MOVE "INVALID" TO WS-TRANS-STATUS + END-EVALUATE. + + 2100-PROCESS-CREDIT. + DISPLAY "PROCESSING CREDIT" + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "CREDIT APPLIED TO: " WS-TO-ACCOUNT. + + 2200-PROCESS-DEBIT. + DISPLAY "PROCESSING DEBIT" + ACCEPT WS-FROM-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "DEBIT FROM: " WS-FROM-ACCOUNT. + + 2300-PROCESS-TRANSFER. + DISPLAY "PROCESSING TRANSFER" + ACCEPT WS-FROM-ACCOUNT + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "TRANSFER FROM " WS-FROM-ACCOUNT " TO " WS-TO-ACCOUNT. + + 3000-FINALIZE. + DISPLAY "TRANSACTION STATUS: " WS-TRANS-STATUS. +""", + } + + created_files = [] + for filename, content in sample_files.items(): + file_path = cobol_dir / filename + file_path.write_text(content) + created_files.append(filename) + + return created_files + + +def get_refactoring_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], + critique_file: Path | None = None, +) -> str: + """Generate the prompt for the refactoring agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) + + base_prompt = f"""Convert the following COBOL files to Java: + +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} + +Files to convert: +{files_list} + +Requirements: +1. Create a Java class for each COBOL program +2. Preserve the business logic and data structures +3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) +4. Convert COBOL data types to appropriate Java types +5. Implement proper error handling with try-catch blocks +6. Add JavaDoc comments explaining the purpose of each class and method +7. 
In JavaDoc comments, include traceability to the original COBOL source using + the format: @source : (e.g., @source CBACT01C.cbl:73-77) +8. Create a clean, maintainable object-oriented design +9. Each Java file should be compilable and follow Java best practices + +Read each COBOL file and create the corresponding Java file in the target directory. +""" + + if critique_file and critique_file.exists(): + base_prompt += f""" + +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. +""" + + return base_prompt + + +def get_critique_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], +) -> str: + """Generate the prompt for the critique agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) + + return f"""Evaluate the quality of COBOL to Java refactoring. + +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} + +Original COBOL files: +{files_list} + +Please evaluate each converted Java file against its original COBOL source. + +For each file, assess: +1. Correctness: Does the Java code preserve the original business logic? (0-25 pts) +2. Code Quality: Is the code clean, readable, following Java conventions? (0-25 pts) +3. Completeness: Are all COBOL features properly converted? (0-25 pts) +4. Best Practices: Does it use proper OOP, error handling, documentation? (0-25 pts) + +Create a critique report in the following EXACT format: + +# COBOL to Java Refactoring Critique Report + +## Summary +[Brief overall assessment] + +## File Evaluations + +### [Original COBOL filename] +- **Java File**: [corresponding Java filename or "NOT FOUND"] +- **Correctness**: [score]/25 - [brief explanation] +- **Code Quality**: [score]/25 - [brief explanation] +- **Completeness**: [score]/25 - [brief explanation] +- **Best Practices**: [score]/25 - [brief explanation] +- **File Score**: [total]/100 +- **Issues to Address**: + - [specific issue 1] + - [specific issue 2] + ... + +[Repeat for each file] + +## Overall Score +- **Average Score**: [calculated average of all file scores] +- **Recommendation**: [PASS if average >= 90, NEEDS_IMPROVEMENT otherwise] + +## Priority Improvements +1. [Most critical improvement needed] +2. [Second priority] +3. [Third priority] + +Save this report to: {java_dir.parent}/critiques/critique_report.md +""" + + +def parse_critique_score(critique_file: Path) -> float: + """Parse the average score from the critique report.""" + if not critique_file.exists(): + return 0.0 + + content = critique_file.read_text() + + # Look for "Average Score: X" pattern + patterns = [ + r"\*\*Average Score\*\*:\s*(\d+(?:\.\d+)?)", + r"Average Score:\s*(\d+(?:\.\d+)?)", + r"average.*?(\d+(?:\.\d+)?)\s*(?:/100|%|$)", + ] + + for pattern in patterns: + match = re.search(pattern, content, re.IGNORECASE) + if match: + return float(match.group(1)) + + return 0.0 + + +def run_iterative_refinement() -> None: + """Run the iterative refinement workflow.""" + # Setup + api_key = os.getenv("LLM_API_KEY") + assert api_key is not None, "LLM_API_KEY environment variable is not set." 
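    # LLM_MODEL and LLM_BASE_URL override the defaults below when set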
+ model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + base_url = os.getenv("LLM_BASE_URL") + + llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="iterative_refinement", + ) + + workspace_dir, cobol_dir, java_dir = setup_workspace() + critique_dir = workspace_dir / "critiques" + + print(f"Workspace: {workspace_dir}") + print(f"COBOL Directory: {cobol_dir}") + print(f"Java Directory: {java_dir}") + print(f"Critique Directory: {critique_dir}") + print() + + # Create sample COBOL files + cobol_files = create_sample_cobol_files(cobol_dir) + print(f"Created {len(cobol_files)} sample COBOL files:") + for f in cobol_files: + print(f" - {f}") + print() + + critique_file = critique_dir / "critique_report.md" + current_score = 0.0 + iteration = 0 + + while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + iteration += 1 + print("=" * 80) + print(f"ITERATION {iteration}") + print("=" * 80) + + # Phase 1: Refactoring + print("\n--- Phase 1: Refactoring Agent ---") + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + agent=refactoring_agent, + workspace=str(workspace_dir), + ) + + previous_critique = critique_file if iteration > 1 else None + refactoring_prompt = get_refactoring_prompt( + cobol_dir, java_dir, cobol_files, previous_critique + ) + + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() + print("Refactoring phase complete.") + + # Phase 2: Critique + print("\n--- Phase 2: Critique Agent ---") + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir), + ) + + critique_prompt = get_critique_prompt(cobol_dir, java_dir, cobol_files) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + print("Critique phase complete.") + + # Parse the score + current_score = parse_critique_score(critique_file) + print(f"\nCurrent Score: {current_score:.1f}%") + + if current_score >= QUALITY_THRESHOLD: + print(f"\n✓ Quality threshold ({QUALITY_THRESHOLD}%) met!") + else: + print( + f"\n✗ Score below threshold ({QUALITY_THRESHOLD}%). " + "Continuing refinement..." + ) + + # Final summary + print("\n" + "=" * 80) + print("ITERATIVE REFINEMENT COMPLETE") + print("=" * 80) + print(f"Total iterations: {iteration}") + print(f"Final score: {current_score:.1f}%") + print(f"Workspace: {workspace_dir}") + + # List created Java files + print("\nCreated Java files:") + for java_file in java_dir.glob("*.java"): + print(f" - {java_file.name}") + + # Show critique file location + if critique_file.exists(): + print(f"\nFinal critique report: {critique_file}") + + # Report cost + cost = llm.metrics.accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") + + +if __name__ == "__main__": + run_iterative_refinement() +``` + + + +## Next Steps + +- [Agent Delegation](/sdk/guides/agent-delegation) - Parallel task execution with sub-agents +- [Custom Tools](/sdk/guides/custom-tools) - Create specialized tools for your workflow + +### Exception Handling +Source: https://docs.openhands.dev/sdk/guides/llm-error-handling.md + +The SDK normalizes common provider errors into typed, provider‑agnostic exceptions so your application can handle them consistently across OpenAI, Anthropic, Groq, Google, and others. 
+
+This guide explains when these errors occur and shows recommended handling patterns for both direct LLM usage and higher‑level agent/conversation flows.
+
+## Why typed exceptions?
+
+LLM providers format errors differently (status codes, messages, exception classes). The SDK maps those into stable types so client apps don’t depend on provider‑specific details. Typical benefits:
+
+- One code path to handle auth, rate limits, timeouts, service issues, and bad requests
+- Clear behavior when conversation history exceeds the context window
+- Backward compatibility when you switch providers or SDK versions
+
+## Quick start: Using agents and conversations
+
+Agent-driven conversations are the common entry point. Exceptions from the underlying LLM calls bubble up from `conversation.run()` and `conversation.send_message(...)` when a condenser is not configured.
+
+```python icon="python" wrap
+from pydantic import SecretStr
+from openhands.sdk import Agent, Conversation, LLM
+from openhands.sdk.llm.exceptions import (
+    LLMError,
+    LLMAuthenticationError,
+    LLMRateLimitError,
+    LLMTimeoutError,
+    LLMServiceUnavailableError,
+    LLMBadRequestError,
+    LLMContextWindowExceedError,
+)
+
+llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key"))
+agent = Agent(llm=llm, tools=[])
+conversation = Conversation(
+    agent=agent,
+    persistence_dir="./.conversations",
+    workspace=".",
+)
+
+try:
+    conversation.send_message(
+        "Continue the long analysis we started earlier…"
+    )
+    conversation.run()
+
+except LLMContextWindowExceedError:
+    # Conversation is longer than the model’s context window
+    # Options:
+    #   1) Enable a condenser (recommended for long sessions)
+    #   2) Shorten inputs or reset conversation
+    print("Hit the context limit. Consider enabling a condenser.")
+
+except LLMAuthenticationError:
+    print(
+        "Invalid or missing API credentials. "
+        "Check your API key or auth setup."
+    )
+
+except LLMRateLimitError:
+    print("Rate limit exceeded. Back off and retry later.")
+
+except LLMTimeoutError:
+    print("Request timed out. Consider increasing timeout or retrying.")
+
+except LLMServiceUnavailableError:
+    print("Service unavailable or connectivity issue. Retry with backoff.")
+
+except LLMBadRequestError:
+    print("Bad request to provider. Validate inputs and arguments.")
+
+except LLMError as e:
+    # Fallback for other SDK LLM errors (parsing/validation, etc.)
+    print(f"Unhandled LLM error: {e}")
+```
+
+
+
+### Avoiding context‑window errors with a condenser
+
+If a condenser is configured, the SDK emits a condensation request event instead of raising `LLMContextWindowExceedError`. The agent will summarize older history and continue.
+
+```python icon="python" focus={5-6, 9-14} wrap
+from openhands.sdk.context.condenser import LLMSummarizingCondenser
+
+condenser = LLMSummarizingCondenser(
+    llm=llm.model_copy(update={"usage_id": "condenser"}),
+    max_size=10,
+    keep_first=2,
+)
+
+agent = Agent(llm=llm, tools=[], condenser=condenser)
+conversation = Conversation(
+    agent=agent,
+    persistence_dir="./.conversations",
+    workspace=".",
+)
+```
+
+
+  See the dedicated guide: [Context Condenser](/sdk/guides/context-condenser).
+
+
+## Handling errors with direct LLM calls
+
+The same exceptions are raised from both `LLM.completion()` and `LLM.responses()` paths, so you can share handlers.
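+
+One way to share handlers is to route every typed error through a single helper and reuse it for both call paths. This is a minimal sketch (the `describe_llm_error` helper is illustrative, not part of the SDK, and it assumes an `llm` configured as in the examples below):
+
+```python icon="python" wrap
+from openhands.sdk.llm import Message, TextContent
+from openhands.sdk.llm.exceptions import (
+    LLMError,
+    LLMAuthenticationError,
+    LLMContextWindowExceedError,
+    LLMRateLimitError,
+)
+
+
+def describe_llm_error(err: LLMError) -> str:
+    """Map a typed SDK exception to a user-facing message (hypothetical helper)."""
+    if isinstance(err, LLMContextWindowExceedError):
+        return "Context window exceeded. Consider enabling a condenser."
+    if isinstance(err, LLMAuthenticationError):
+        return "Invalid or missing API credentials."
+    if isinstance(err, LLMRateLimitError):
+        return "Rate limit exceeded. Back off and retry later."
+    return f"Unhandled LLM error: {err}"
+
+
+try:
+    # The same handler works for llm.responses(...) as well.
+    llm.completion([Message.user([TextContent(text="Ping?")])])
+except LLMError as err:
+    print(describe_llm_error(err))
+```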
+ +### Example: Using `.completion()` + +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +try: + response = llm.completion([ + Message.user([TextContent(text="Summarize our design doc")]) + ]) + print(response.message) + +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMAuthenticationError: + print("Invalid or missing API credentials.") +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") +except LLMTimeoutError: + print("Request timed out. Consider increasing timeout or retrying.") +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") +except LLMBadRequestError: + print("Bad request to provider. Validate inputs and arguments.") +except LLMError as e: + print(f"Unhandled LLM error: {e}") +``` + +### Example: Using `.responses()` + +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import LLMError, LLMContextWindowExceedError + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +try: + resp = llm.responses([ + Message.user( + [TextContent(text="Write a one-line haiku about code.")] + ) + ]) + print(resp.message) +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMError as e: + print(f"LLM error: {e}") +``` + +## Exception reference + +All exceptions live under `openhands.sdk.llm.exceptions` unless noted. + +| Category | Error | Description | +|--------|------|-------------| +| **Provider / transport (provider-agnostic)** | `LLMContextWindowExceedError` | Conversation exceeds the model’s context window. Without a condenser, thrown for both Chat and Responses paths. | +| | `LLMAuthenticationError` | Invalid or missing credentials (401/403 patterns). | +| | `LLMRateLimitError` | Provider rate limit exceeded. | +| | `LLMTimeoutError` | SDK or lower-level timeout while waiting for the provider. | +| | `LLMServiceUnavailableError` | Temporary connectivity or service outage (e.g., 5xx responses, connection issues). | +| | `LLMBadRequestError` | Client-side request issues (invalid parameters, malformed input). | +| **Response parsing / validation** | `LLMMalformedActionError` | Model returned a malformed action. | +| | `LLMNoActionError` | Model did not return an action when one was expected. | +| | `LLMResponseError` | Could not extract an action from the response. | +| | `FunctionCallConversionError` | Failed converting tool/function call payloads. | +| | `FunctionCallValidationError` | Tool/function call arguments failed validation. | +| | `FunctionCallNotExistsError` | Model referenced an unknown tool or function. | +| | `LLMNoResponseError` | Provider returned an empty or invalid response (rare; observed with some Gemini models). | +| **Cancellation** | `UserCancelledError` | A user explicitly aborted the operation. | +| | `OperationCancelled` | A running operation was cancelled programmatically. 
|
+
+
+  All of the above (except the explicit cancellation types) inherit from `LLMError`, so you can implement a catch‑all
+  for unexpected SDK LLM errors while still keeping fine‑grained handlers for the most common cases.
+
+
+### LLM Fallback Strategy
+Source: https://docs.openhands.dev/sdk/guides/llm-fallback.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+> A ready-to-run example is available [here](#ready-to-run-example)!
+
+`FallbackStrategy` gives your agent automatic resilience: when the primary LLM fails with a transient error (rate limit, timeout, connection issue), the SDK tries alternate LLMs in order. Fallback is **per-call** — each new request always starts with the primary model.
+
+## Basic Usage
+
+Attach a `FallbackStrategy` to your primary `LLM`. The fallback LLMs are referenced by name from an [LLM Profile Store](/sdk/guides/llm-profile-store):
+
+```python icon="python" wrap focus={16, 17, 21, 22, 23}
+from pydantic import SecretStr
+from openhands.sdk import LLM, LLMProfileStore
+from openhands.sdk.llm import FallbackStrategy
+
+# Manage persisted LLM profiles
+# default store directory: .openhands/profiles
+store = LLMProfileStore()
+
+fallback_llm = LLM(
+    usage_id="fallback-1",
+    model="openai/gpt-4o",
+    api_key=SecretStr("your-openai-key"),
+)
+store.save("fallback-1", fallback_llm, include_secrets=True)
+
+# Configure an LLM with a fallback strategy
+primary_llm = LLM(
+    usage_id="agent-primary",
+    model="anthropic/claude-sonnet-4-5-20250929",
+    api_key=SecretStr("your-api-key"),
+    fallback_strategy=FallbackStrategy(
+        fallback_llms=["fallback-1"],
+    ),
+)
+```
+
+## How It Works
+
+1. The primary LLM handles the request as normal
+2. If the call fails with a **transient error**, the `FallbackStrategy` kicks in and tries each fallback LLM in order
+3. The first successful fallback response is returned to the caller
+4. If all fallbacks fail, the original primary error is raised
+5. Token usage and cost from fallback calls are **merged into the primary LLM's metrics**, so you get a unified view of total spend by model
+
+
+Only transient errors trigger fallback.
+Non-transient errors (e.g., authentication failures, bad requests) are raised immediately without trying fallbacks.
+For a complete list of supported transient errors see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/978dd7d1e3268331b7f8af514e7a7930f98eb8af/openhands-sdk/openhands/sdk/llm/fallback_strategy.py#L29)
+
+
+## Multiple Fallback Levels
+
+Chain as many fallback LLMs as you need. They are tried in list order:
+
+```python icon="python" wrap focus={5-7}
+llm = LLM(
+    usage_id="agent-primary",
+    model="anthropic/claude-sonnet-4-5-20250929",
+    api_key=SecretStr(api_key),
+    fallback_strategy=FallbackStrategy(
+        fallback_llms=["fallback-1", "fallback-2"],
+    ),
+)
+```
+
+If the primary fails, `fallback-1` is tried. If that also fails, `fallback-2` is tried. If all fail, the primary error is raised.
+
+## Custom Profile Store Directory
+
+By default, fallback profiles are loaded from `.openhands/profiles`. You can point to a different directory:
+
+```python icon="python" wrap focus={3}
+FallbackStrategy(
+    fallback_llms=["fallback-1", "fallback-2"],
+    profile_store_dir="/path/to/my/profiles",
+)
+```
+
+## Metrics
+
+Fallback costs are automatically merged into the primary LLM's metrics. 
After a conversation, you can inspect exactly which models were used: + +```python icon="python" wrap +# After running a conversation +metrics = llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") + +for usage in metrics.token_usages: + print(f" model={usage.model} prompt={usage.prompt_tokens} completion={usage.completion_tokens}") +``` + +Individual `token_usage` records carry the fallback model name, so you can distinguish which LLM produced each usage record. + +## Use Cases + +- **Rate limit handling** — When one provider throttles you, seamlessly switch to another +- **High availability** — Keep your agent running during provider outages +- **Cost optimization** — Try a cheaper model first and fall back to a more capable one on failure +- **Cross-provider redundancy** — Spread risk across Anthropic, OpenAI, Google, etc. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/39_llm_fallback.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/39_llm_fallback.py) + + +```python icon="python" expandable examples/01_standalone_sdk/39_llm_fallback.py +"""Example: Using FallbackStrategy for LLM resilience. + +When the primary LLM fails with a transient error (rate limit, timeout, etc.), +FallbackStrategy automatically tries alternate LLMs in order. Fallback is +per-call: each new request starts with the primary model. Token usage and +cost from fallback calls are merged into the primary LLM's metrics. + +This example: + 1. Saves two fallback LLM profiles to a temporary store. + 2. Configures a primary LLM with a FallbackStrategy pointing at those profiles. + 3. Runs a conversation — if the primary model is unavailable, the agent + transparently falls back to the next available model. +""" + +import os +import tempfile + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, LLMProfileStore, Tool +from openhands.sdk.llm import FallbackStrategy +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Read configuration from environment +api_key = os.getenv("LLM_API_KEY", None) +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +primary_model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). 
+profile_store_dir = tempfile.mkdtemp()
+store = LLMProfileStore(base_dir=profile_store_dir)
+
+fallback_1 = LLM(
+    usage_id="fallback-1",
+    model=os.getenv("LLM_FALLBACK_MODEL_1", "openai/gpt-4o"),
+    api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_1", api_key)),
+    base_url=os.getenv("LLM_FALLBACK_BASE_URL_1", base_url),
+)
+store.save("fallback-1", fallback_1, include_secrets=True)
+
+fallback_2 = LLM(
+    usage_id="fallback-2",
+    model=os.getenv("LLM_FALLBACK_MODEL_2", "openai/gpt-4o-mini"),
+    api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_2", api_key)),
+    base_url=os.getenv("LLM_FALLBACK_BASE_URL_2", base_url),
+)
+store.save("fallback-2", fallback_2, include_secrets=True)
+
+print(f"Saved fallback profiles: {store.list()}")
+
+
+# Configure the primary LLM with a FallbackStrategy
+primary_llm = LLM(
+    usage_id="agent-primary",
+    model=primary_model,
+    api_key=SecretStr(api_key),
+    base_url=base_url,
+    fallback_strategy=FallbackStrategy(
+        fallback_llms=["fallback-1", "fallback-2"],
+        profile_store_dir=profile_store_dir,
+    ),
+)
+
+
+# Run a conversation
+agent = Agent(
+    llm=primary_llm,
+    tools=[
+        Tool(name=TerminalTool.name),
+        Tool(name=FileEditorTool.name),
+    ],
+)
+
+conversation = Conversation(agent=agent, workspace=os.getcwd())
+conversation.send_message("Write a haiku about resilience into HAIKU.txt.")
+conversation.run()
+
+
+# Inspect metrics (includes any fallback usage)
+metrics = primary_llm.metrics
+print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}")
+print(f"Token usage records: {len(metrics.token_usages)}")
+for usage in metrics.token_usages:
+    print(
+        f"  model={usage.model}"
+        f" prompt={usage.prompt_tokens}"
+        f" completion={usage.completion_tokens}"
+    )
+
+print(f"EXAMPLE_COST: {metrics.accumulated_cost}")
+```
+
+
+
+## Next Steps
+
+- **[LLM Profile Store](/sdk/guides/llm-profile-store)** — Save and load LLM configurations as reusable profiles
+- **[Model Routing](/sdk/guides/llm-routing)** — Route requests based on content (e.g., multimodal vs text-only)
+- **[Exception Handling](/sdk/guides/llm-error-handling)** — Handle LLM errors in your application
+- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs across models
+
+### Image Input
+Source: https://docs.openhands.dev/sdk/guides/llm-image-input.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+> A ready-to-run example is available [here](#ready-to-run-example)!
+
+
+### Sending Images
+
+The LLM you use must support image inputs (`llm.vision_is_active()` needs to be `True`).
+
+Pass images along with text in the message content:
+
+```python focus={14} icon="python" wrap
+from openhands.sdk import ImageContent, Message, TextContent
+
+IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png"
+conversation.send_message(
+    Message(
+        role="user",
+        content=[
+            TextContent(
+                text=(
+                    "Study this image and describe the key elements you see. "
+                    "Summarize them in a short paragraph and suggest a catchy caption."
+                )
+            ),
+            ImageContent(image_urls=[IMAGE_URL]),
+        ],
+    )
+)
+```
+
+Works with multimodal LLMs like `GPT-4 Vision` and `Claude` with vision capabilities.
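+
+For local files, the usual approach with multimodal APIs is to inline the image as a base64 data URL. This is a minimal sketch, assuming your provider accepts data URLs in `image_urls` (most OpenAI- and Anthropic-compatible endpoints do) and that a `screenshot.png` exists in the working directory:
+
+```python icon="python" wrap
+import base64
+from pathlib import Path
+
+from openhands.sdk import ImageContent, Message, TextContent
+
+# Encode a local image as a data URL (the file name is illustrative).
+image_bytes = Path("screenshot.png").read_bytes()
+data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
+
+conversation.send_message(
+    Message(
+        role="user",
+        content=[
+            TextContent(text="What does this screenshot show?"),
+            ImageContent(image_urls=[data_url]),
+        ],
+    )
+)
+```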
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) + + +You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA: + +```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py +"""OpenHands Agent SDK — Image Input Example. + +This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds +vision support by sending an image to the agent alongside text instructions. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM (vision-capable model) +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="vision-llm", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +assert llm.vision_is_active(), "The selected LLM model does not support vision input." + +cwd = os.getcwd() + +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +llm_messages = [] # collect raw LLM messages for inspection + + +def conversation_callback(event: Event) -> None: + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +IMAGE_URL = "https://github.com/OpenHands/docs/raw/main/openhands/static/img/logo.png" + +conversation.send_message( + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) +conversation.run() + +conversation.send_message( + "Great! Please save your description and caption into image_report.md." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns +- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently + +### LLM Profile Store +Source: https://docs.openhands.dev/sdk/guides/llm-profile-store.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `LLMProfileStore` class provides a centralized mechanism for managing `LLM` configurations. +Define a profile once, reuse it everywhere — across scripts, sessions, and even machines. + +## Benefits +- **Persistence:** Saves model parameters (API keys, temperature, max tokens, ...) 
to a stable disk format. +- **Reusability:** Import a defined profile into any script or session with a single identifier. +- **Portability:** Simplifies the synchronization of model configurations across different machines or deployment environments. + +## How It Works + + + + ### Create a Store + + The store manages a directory of JSON profile files. By default it uses `~/.openhands/profiles`, + but you can point it anywhere. + + ```python icon="python" focus={3, 4, 6, 7} + from openhands.sdk import LLMProfileStore + + # Default location: ~/.openhands/profiles + store = LLMProfileStore() + + # Or bring your own directory + store = LLMProfileStore(base_dir="./my-profiles") + ``` + + + ### Save a Profile + + Got an LLM configured just right? Save it for later. + + ```python icon="python" focus={11, 12} + from pydantic import SecretStr + from openhands.sdk import LLM, LLMProfileStore + + fast_llm = LLM( + usage_id="fast", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("sk-..."), + temperature=0.0, + ) + + store = LLMProfileStore() + store.save("fast", fast_llm) + ``` + + + API keys are **excluded** by default for security. Pass `include_secrets=True` to the save method if you wish to + persist them; otherwise, they will be read from the environment at load time. + + + + ### Load a Profile + + Next time you need that LLM, just load it: + + ```python icon="python" + # Same model, ready to go. + llm = store.load("fast") + ``` + + + ### List and Clean Up + + See what you've got, delete what you don't need: + + ```python icon="python" focus={1, 3, 4} + print(store.list()) # ['fast.json', 'creative.json'] + + store.delete("creative") + print(store.list()) # ['fast.json'] + ``` + + + +## Good to Know + +Profile names must be simple filenames (no slashes, no dots at the start). + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/37_llm_profile_store.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/37_llm_profile_store.py) + + +```python icon="python" expandable examples/01_standalone_sdk/37_llm_profile_store.py +"""Example: Using LLMProfileStore to save and reuse LLM configurations. + +LLMProfileStore persists LLM configurations as JSON files, so you can define +a profile once and reload it across sessions without repeating setup code. +""" + +import os +import tempfile + +from pydantic import SecretStr + +from openhands.sdk import LLM, LLMProfileStore + + +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). +store = LLMProfileStore(base_dir=tempfile.mkdtemp()) + + +# 1. Create two LLM profiles with different usage + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + +fast_llm = LLM( + usage_id="fast", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.0, +) + +creative_llm = LLM( + usage_id="creative", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.9, +) + +# 2. Save profiles + +# Note that secrets are excluded by default for safety. +store.save("fast", fast_llm) +store.save("creative", creative_llm) + +# To persist the API key as well, pass `include_secrets=True`: +# store.save("fast", fast_llm, include_secrets=True) + +# 3. 
List available persisted profiles
+
+print(f"Stored profiles: {store.list()}")
+
+# 4. Load a profile
+
+loaded = store.load("fast")
+assert isinstance(loaded, LLM)
+print(
+    "Loaded profile. "
+    f"usage:{loaded.usage_id}, "
+    f"model: {loaded.model}, "
+    f"temperature: {loaded.temperature}."
+)
+
+# 5. Delete a profile
+
+store.delete("creative")
+print(f"After deletion: {store.list()}")
+
+print("EXAMPLE_COST: 0")
+```
+
+
+
+## Next Steps
+
+- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLMs in memory at runtime
+- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models
+- **[Exception Handling](/sdk/guides/llm-error-handling)** - Handle LLM errors gracefully
+
+### Reasoning
+Source: https://docs.openhands.dev/sdk/guides/llm-reasoning.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+View your agent's internal reasoning process for debugging, transparency, and understanding decision-making.
+
+This guide demonstrates two provider-specific approaches:
+1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning
+2. **OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter
+
+## Anthropic Extended Thinking
+
+> A ready-to-run example is available [here](#ready-to-run-example-anthropic)!
+
+Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process
+through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step.
+
+### How It Works
+
+The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages:
+
+```python focus={6-11} icon="python" wrap
+def show_thinking(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        message = event.to_llm_message()
+        if hasattr(message, "thinking_blocks") and message.thinking_blocks:
+            print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks")
+            for block in message.thinking_blocks:
+                if isinstance(block, RedactedThinkingBlock):
+                    print(f"Redacted: {block.data}")
+                elif isinstance(block, ThinkingBlock):
+                    print(f"Thinking: {block.thinking}")
+
+conversation = Conversation(agent=agent, callbacks=[show_thinking])
+```
+
+### Understanding Thinking Blocks
+
+Claude uses thinking blocks to reason through complex problems step-by-step. There are two types:
+
+- **`ThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process
+- **`RedactedThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data
+
+By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time,
+giving you insight into how Claude is approaching the problem. 
+
+### Ready-to-run Example Anthropic
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py
+"""Example demonstrating Anthropic's extended thinking feature with thinking blocks."""
+
+import os
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    RedactedThinkingBlock,
+    ThinkingBlock,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.terminal import TerminalTool
+
+
+# Configure LLM for Anthropic Claude with extended thinking
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+
+llm = LLM(
+    usage_id="agent",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+
+# Setup agent with bash tool
+agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)])
+
+
+# Callback to display thinking blocks
+def show_thinking(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        message = event.to_llm_message()
+        if hasattr(message, "thinking_blocks") and message.thinking_blocks:
+            print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks")
+            for i, block in enumerate(message.thinking_blocks):
+                if isinstance(block, RedactedThinkingBlock):
+                    print(f"  Block {i + 1}: {block.data}")
+                elif isinstance(block, ThinkingBlock):
+                    print(f"  Block {i + 1}: {block.thinking}")
+
+
+conversation = Conversation(
+    agent=agent, callbacks=[show_thinking], workspace=os.getcwd()
+)
+
+conversation.send_message(
+    "Calculate compound interest for $10,000 at 5% annually, "
+    "compounded quarterly for 3 years. Show your work.",
+)
+conversation.run()
+
+conversation.send_message(
+    "Now, write that number to RESULTs.txt.",
+)
+conversation.run()
+print("✅ Done!")
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost}")
+```
+
+
+
+## OpenAI Reasoning via Responses API
+
+> A ready-to-run example is available [here](#ready-to-run-example-openai)!
+
+OpenAI's latest models (e.g., `GPT-5`, `GPT-5-Codex`) support a [Responses API](https://platform.openai.com/docs/api-reference/responses)
+that provides access to the model's reasoning process.
+By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces.
+
+### How It Works
+
+Configure the LLM with the `reasoning_effort` parameter to enable reasoning:
+
+```python focus={5} icon="python" wrap
+llm = LLM(
+    model="openhands/gpt-5-codex",
+    api_key=SecretStr(api_key),
+    base_url=base_url,
+    # Enable reasoning with effort level
+    reasoning_effort="high",
+)
+```
+
+The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of
+reasoning performed by the model.
+
+Then capture reasoning traces in your callback:
+
+```python focus={3-4} icon="python" wrap
+def conversation_callback(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        msg = event.to_llm_message()
+        llm_messages.append(msg)
+```
+
+### Understanding Reasoning Traces
+
+The OpenAI Responses API provides reasoning traces that show how the model approached the problem. 
+These traces are available in the LLM messages and can be inspected to understand the model's decision-making process. +Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process. + +### Ready-to-run Example OpenAI + + +This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) + + +```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py +""" +Example: Responses API path via LiteLLM in a Real Agent Conversation + +- Runs a real Agent/Conversation to verify /responses path works +- Demonstrates rendering of Responses reasoning within normal conversation events +""" + +from __future__ import annotations + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." + +model = "openhands/gpt-5-mini-2025-08-07" # Use a model that supports Responses API +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + # Responses-path options + reasoning_effort="high", + # Logging / behavior tweaks + log_completions=False, + usage_id="agent", +) + +print("\n=== Agent Conversation using /responses path ===") +agent = get_default_agent( + llm=llm, + cli_mode=True, # disable browser tools for env simplicity +) + +llm_messages = [] # collect raw LLM-convertible messages for inspection + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), +) + +# Keep the tasks short for demo purposes +conversation.send_message("Read the repo and write one fact into FACTS.txt.") +conversation.run() + +conversation.send_message("Now delete FACTS.txt.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + ms = str(message) + print(f"Message {i}: {ms[:200]}{'...' if len(ms) > 200 else ''}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Use Cases + +**Debugging**: Understand why the agent made specific decisions or took certain actions. + +**Transparency**: Show users how the AI arrived at its conclusions. + +**Quality Assurance**: Identify flawed reasoning patterns or logic errors. + +**Learning**: Study how models approach complex problems. + +## Next Steps + +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance +- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities + +### LLM Registry +Source: https://docs.openhands.dev/sdk/guides/llm-registry.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use the LLM registry to manage multiple LLM providers and dynamically switch between models. 
+ +## Using the Registry + +You can add LLMs to the registry using the `.add` method and retrieve them later using the `.get()` method. + +```python icon="python" focus={9,10,13} +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# define the registry and add an LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) +... +# retrieve the LLM by its usage ID +llm = llm_registry.get("agent") +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs + +### Model Routing +Source: https://docs.openhands.dev/sdk/guides/llm-routing.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +This feature is under active development and more default routers will be available in future releases. + +> A ready-to-run example is available [here](#ready-to-run-example)! 
+ +### Using the built-in MultimodalRouter + +Define the built-in rule-based `MultimodalRouter` that will route text-only requests to a secondary LLM and multimodal requests (with images) to the primary, multimodal-capable LLM: + +```python icon="python" wrap focus={13-16} +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="litellm_proxy/mistral/devstral-small-2507", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) +``` + +You may define your own router by extending the `Router` class. See the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) + + +Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: + +```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.llm.router import MultimodalRouter +from openhands.tools.preset.default import get_default_tools + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="openhands/devstral-small-2507", + base_url=base_url, + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) + +# Tools +tools = get_default_tools() # Use our default openhands experience + +# Agent +agent = Agent(llm=multimodal_router, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() +) + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Hi there, who trained you?"))], + ) +) +conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[ + ImageContent( + image_urls=["http://images.cocodataset.org/val2017/000000039769.jpg"] + ), + TextContent(text=("What do you see in the image above?")), + ], + ) +) +conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Who trained you as an LLM?"))], + ) +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs + +### LLM Streaming +Source: https://docs.openhands.dev/sdk/guides/llm-streaming.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + +This is currently only supported for the chat completion endpoint. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +Enable real-time display of LLM responses as they're generated, token by token. This guide demonstrates how to use +streaming callbacks to process and display tokens as they arrive from the language model. + + +## How It Works + +Streaming allows you to display LLM responses progressively as the model generates them, rather than waiting for the +complete response. This creates a more responsive user experience, especially for long-form content generation. + + + + ### Enable Streaming on LLM + Configure the LLM with streaming enabled: + + ```python focus={6} icon="python" wrap + llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, # Enable streaming + ) + ``` + + + ### Define Token Callback + Create a callback function that processes streaming chunks as they arrive: + + ```python icon="python" wrap + def on_token(chunk: ModelResponseStream) -> None: + """Process each streaming chunk as it arrives.""" + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + content = getattr(delta, "content", None) + if isinstance(content, str): + sys.stdout.write(content) + sys.stdout.flush() + ``` + + The callback receives a `ModelResponseStream` object containing: + - **`choices`**: List of response choices from the model + - **`delta`**: Incremental content changes for each choice + - **`content`**: The actual text tokens being streamed + + + ### Register Callback with Conversation + + Pass your token callback to the conversation: + + ```python focus={3} icon="python" wrap + conversation = Conversation( + agent=agent, + token_callbacks=[on_token], # Register streaming callback + workspace=os.getcwd(), + ) + ``` + + The `token_callbacks` parameter accepts a list of callbacks, allowing you to register multiple handlers + if needed (e.g., one for display, another for logging). 
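+
+  For instance, a second callback could persist raw tokens to disk while `on_token` handles display. A minimal sketch reusing `on_token` and `agent` from the steps above (the `stream.log` file name is illustrative):
+
+  ```python icon="python" wrap
+  def log_token(chunk: ModelResponseStream) -> None:
+      """Append streamed text content to a log file."""
+      for choice in chunk.choices:
+          delta = choice.delta
+          content = getattr(delta, "content", None) if delta else None
+          if isinstance(content, str):
+              # Opening per chunk keeps the sketch simple; buffer in real code.
+              with open("stream.log", "a") as f:
+                  f.write(content)
+
+  conversation = Conversation(
+      agent=agent,
+      token_callbacks=[on_token, log_token],  # display + logging
+      workspace=os.getcwd(),
+  )
+  ```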
+ + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/29_llm_streaming.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/29_llm_streaming.py) + + +```python icon="python" expandable examples/01_standalone_sdk/29_llm_streaming.py +import os +import sys +from typing import Literal + +from pydantic import SecretStr + +from openhands.sdk import ( + Conversation, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.sdk.llm.streaming import ModelResponseStream +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +if not api_key: + raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.") + +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, +) + +agent = get_default_agent(llm=llm, cli_mode=True) + + +# Define streaming states +StreamingState = Literal["thinking", "content", "tool_name", "tool_args"] +# Track state across on_token calls for boundary detection +_current_state: StreamingState | None = None + + +def on_token(chunk: ModelResponseStream) -> None: + """ + Handle all types of streaming tokens including content, + tool calls, and thinking blocks with dynamic boundary detection. + """ + global _current_state + + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + # Handle thinking blocks (reasoning content) + reasoning_content = getattr(delta, "reasoning_content", None) + if isinstance(reasoning_content, str) and reasoning_content: + if _current_state != "thinking": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("THINKING: ") + _current_state = "thinking" + sys.stdout.write(reasoning_content) + sys.stdout.flush() + + # Handle regular content + content = getattr(delta, "content", None) + if isinstance(content, str) and content: + if _current_state != "content": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("CONTENT: ") + _current_state = "content" + sys.stdout.write(content) + sys.stdout.flush() + + # Handle tool calls + tool_calls = getattr(delta, "tool_calls", None) + if tool_calls: + for tool_call in tool_calls: + tool_name = ( + tool_call.function.name if tool_call.function.name else "" + ) + tool_args = ( + tool_call.function.arguments + if tool_call.function.arguments + else "" + ) + if tool_name: + if _current_state != "tool_name": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL NAME: ") + _current_state = "tool_name" + sys.stdout.write(tool_name) + sys.stdout.flush() + if tool_args: + if _current_state != "tool_args": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL ARGS: ") + _current_state = "tool_args" + sys.stdout.write(tool_args) + sys.stdout.flush() + + +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + token_callbacks=[on_token], +) + +story_prompt = ( + "Tell me a long story about LLM streaming, write it a file, " + "make sure it has multiple paragraphs. " +) +conversation.send_message(story_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() + +cleanup_prompt = ( + "Thank you. 
Please delete the streaming story file now that I've read it, " + "then confirm the deletion." +) +conversation.send_message(cleanup_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[LLM Error Handling](/sdk/guides/llm-error-handling)** - Handle streaming errors gracefully +- **[Custom Visualizer](/sdk/guides/convo-custom-visualizer)** - Build custom UI for streaming +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display streams in terminal UI + +### LLM Subscriptions +Source: https://docs.openhands.dev/sdk/guides/llm-subscriptions.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + +OpenAI subscription is the first provider we support. More subscription providers will be added in future releases. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use your existing ChatGPT Plus or Pro subscription to access OpenAI's Codex models without consuming API credits. The SDK handles OAuth authentication, credential caching, and automatic token refresh. + +## How It Works + + + + ### Call subscription_login() + + The `LLM.subscription_login()` class method handles the entire authentication flow: + + ```python icon="python" + from openhands.sdk import LLM + + llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") + ``` + + On first run, this opens your browser for OAuth authentication with OpenAI. After successful login, credentials are cached locally in `~/.openhands/auth/` for future use. + + + ### Use the LLM + + Once authenticated, use the LLM with your agent as usual. The SDK automatically refreshes tokens when they expire. + + + +## Supported Models + +The following models are available via ChatGPT subscription: + +| Model | Description | +|-------|-------------| +| `gpt-5.2-codex` | Latest Codex model (default) | +| `gpt-5.2` | GPT-5.2 base model | +| `gpt-5.1-codex-max` | High-capacity Codex model | +| `gpt-5.1-codex-mini` | Lightweight Codex model | + +## Configuration Options + +### Force Fresh Login + +If your cached credentials become stale or you want to switch accounts: + +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + force_login=True, # Always perform fresh OAuth login +) +``` + +### Disable Browser Auto-Open + +For headless environments or when you prefer to manually open the URL: + +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + open_browser=False, # Prints URL to console instead +) +``` + +### Check Subscription Mode + +Verify that the LLM is using subscription-based authentication: + +```python icon="python" +llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") +print(f"Using subscription: {llm.is_subscription}") # True +``` + +## Credential Storage + +Credentials are stored securely in `~/.openhands/auth/`. To clear cached credentials and force a fresh login, delete the files in this directory. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/35_subscription_login.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/35_subscription_login.py) + + +```python icon="python" expandable examples/01_standalone_sdk/35_subscription_login.py +"""Example: Using ChatGPT subscription for Codex models. 
+ +This example demonstrates how to use your ChatGPT Plus/Pro subscription +to access OpenAI's Codex models without consuming API credits. + +The subscription_login() method handles: +- OAuth PKCE authentication flow +- Credential caching (~/.openhands/auth/) +- Automatic token refresh + +Supported models: +- gpt-5.2-codex +- gpt-5.2 +- gpt-5.1-codex-max +- gpt-5.1-codex-mini + +Requirements: +- Active ChatGPT Plus or Pro subscription +- Browser access for initial OAuth login +""" + +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# First time: Opens browser for OAuth login +# Subsequent calls: Reuses cached credentials (auto-refreshes if expired) +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", # or "gpt-5.2", "gpt-5.1-codex-max", "gpt-5.1-codex-mini" +) + +# Alternative: Force a fresh login (useful if credentials are stale) +# llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex", force_login=True) + +# Alternative: Disable auto-opening browser (prints URL to console instead) +# llm = LLM.subscription_login( +# vendor="openai", model="gpt-5.2-codex", open_browser=False +# ) + +# Verify subscription mode is active +print(f"Using subscription mode: {llm.is_subscription}") + +# Use the LLM with an agent as usual +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("List the files in the current directory.") +conversation.run() +print("Done!") +``` + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Streaming](/sdk/guides/llm-streaming)** - Stream responses token-by-token +- **[LLM Reasoning](/sdk/guides/llm-reasoning)** - Access model reasoning traces + +### Model Context Protocol +Source: https://docs.openhands.dev/sdk/guides/mcp.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + + ***MCP*** (Model Context Protocol) is a protocol for exposing tools and resources to AI agents. + Read more about MCP [here](https://modelcontextprotocol.io/). + + + + +## Basic MCP Usage + +> The ready-to-run basic MCP usage example is available [here](#ready-to-run-basic-mcp-usage-example)! + + + + ### MCP Configuration + Configure MCP servers using a dictionary with server names and connection details following [this configuration format](https://gofastmcp.com/clients/client#configuration-format) + + ```python mcp_config icon="python" wrap focus={3-10} + mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "repomix": { + "command": "npx", + "args": ["-y", "repomix@1.4.2", "--mcp"] + }, + } + } + ``` + + + ### Tool Filtering + Use `filter_tools_regex` to control which MCP tools are available to the agent + + ```python filter_tools_regex focus={4-5} icon="python" + agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", + ) + ``` + + + +## MCP with OAuth + +> The ready-to-run MCP with OAuth example is available [here](#ready-to-run-mcp-with-oauth-example)! 
+ +For MCP servers requiring OAuth authentication: +- Configure OAuth-enabled MCP servers by specifying the URL and auth type +- The SDK automatically handles the OAuth flow when first connecting +- When the agent first attempts to use an OAuth-protected MCP server's tools, the SDK initiates the OAuth flow via [FastMCP](https://gofastmcp.com/servers/auth/authentication) +- User will be prompted to authenticate via browser +- Access tokens are securely stored in `~/.fastmcp/oauth-mcp-client-cache/` and automatically refreshed by FastMCP as needed + +```python mcp_config focus={5} icon="python" wrap +mcp_config = { + "mcpServers": { + "Notion": { + "url": "https://mcp.notion.com/mcp", + "auth": "oauth" + } + } +} +``` + + +OAuth MCP servers require user interaction for the initial browser-based authentication. This means they are not suitable for fully automated/headless workflows. If you need headless access, check if the MCP provider offers API key authentication as an alternative. + + +## Ready-to-Run Basic MCP Usage Example + + +This example is available on GitHub: [examples/01_standalone_sdk/07_mcp_integration.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py) + + +Here's an example integrating MCP servers with an agent: + +```python icon="python" expandable examples/01_standalone_sdk/07_mcp_integration.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + "repomix": {"command": "npx", "args": ["-y", "repomix@1.4.2", "--mcp"]}, + } +} +# Agent +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + # This regex filters out all repomix tools except pack_codebase + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", +) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Ready-to-Run MCP with OAuth Example + + +This example is available on GitHub: [examples/01_standalone_sdk/08_mcp_with_oauth.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py) + + +```python icon="python" expandable examples/01_standalone_sdk/08_mcp_with_oauth.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +mcp_config = { + "mcpServers": {"Notion": {"url": "https://mcp.notion.com/mcp", "auth": "oauth"}} +} +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message("Can you search about OpenHands V1 in my notion workspace?") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools +- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage +- **[MCP Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)** - MCP integration implementation + +### Metrics Tracking +Source: https://docs.openhands.dev/sdk/guides/metrics.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +## Overview + +The OpenHands SDK provides metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: +- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. +- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). + +## Getting Metrics from Individual LLMs + +> A ready-to-run example is available [here](#ready-to-run-example-llm-metrics)! 
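Both levels boil down to a single attribute access. As a quick orientation, here is a minimal sketch (assuming a `conversation` that has already run, and the `llm` it was built with):

```python icon="python" wrap
# Level 1: metrics for one LLM instance
print(f"Agent LLM cost: ${llm.metrics.accumulated_cost:.6f}")

# Level 2: aggregated stats across every LLM in the conversation
total = conversation.conversation_stats.get_combined_metrics()
print(f"Conversation cost: ${total.accumulated_cost:.6f}")
```

The rest of this section covers each level in detail, starting with individual LLM metrics.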
+ +Track token usage, costs, and performance metrics from LLM interactions: + +### Accessing Individual LLM Metrics + +Access metrics directly from the LLM object after running the conversation: + +```python icon="python" focus={3-4} +conversation.run() + +assert llm.metrics is not None +print(f"Final LLM metrics: {llm.metrics.model_dump()}") +``` + +The `llm.metrics` object is an instance of the [Metrics class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: + +- `accumulated_cost` - Total accumulated cost across all API calls +- `accumulated_token_usage` - Aggregated token usage with fields like: + - `prompt_tokens` - Number of input tokens processed + - `completion_tokens` - Number of output tokens generated + - `cache_read_tokens` - Cache hits (if supported by the model) + - `cache_write_tokens` - Cache writes (if supported by the model) + - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) + - `context_window` - Context window size used +- `costs` - List of individual cost records per API call +- `token_usages` - List of detailed token usage records per API call +- `response_latencies` - List of response latency metrics per API call + + + For more details on the available metrics and methods, refer to the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). + + +### Ready-to-run Example (LLM metrics) + +This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) + + +```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} + +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +assert llm.metrics is not None +print( + f"Conversation finished. Final LLM metrics with details: {llm.metrics.model_dump()}" +) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Using LLM Registry for Cost Tracking + +> A ready-to-run example is available [here](#ready-to-run-example-llm-registry)! + +The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. + +### How the LLM Registry Works + +Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: + +1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` +2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` +3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` +4. **Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID + +This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. + +### Ready-to-run Example (LLM Registry) + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + +### Getting Aggregated Conversation Costs + + +This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) + + +Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing. + +```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +import os + +from pydantic import SecretStr +from tabulate import tabulate + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + LLMSummarizingCondenser, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +llm_condenser = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="condenser", +) + +# Tools +condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) + +cwd = os.getcwd() +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + ], + condenser=condenser, +) + +conversation = Conversation(agent=agent, workspace=cwd) +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text="Please echo 'Hello!'")], + ) +) +conversation.run() + +# Demonstrate extraneous costs part of the conversation +second_llm = LLM( + usage_id="demo-secondary", + model=model, + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +conversation.llm_registry.add(second_llm) +completion_response = second_llm.completion( + messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] +) + +# Access total spend +spend = conversation.conversation_stats.get_combined_metrics() +print("\n=== Total Spend for Conversation ===\n") +print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") + print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") + print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + +spend_per_usage = conversation.conversation_stats.usage_to_metrics +print("\n=== Spend Breakdown by Usage ID ===\n") +rows = [] +for usage_id, metrics in spend_per_usage.items(): + rows.append( + [ + usage_id, + f"${metrics.accumulated_cost:.6f}", + metrics.accumulated_token_usage.prompt_tokens + if metrics.accumulated_token_usage + else 0, + metrics.accumulated_token_usage.completion_tokens + if metrics.accumulated_token_usage + else 0, + ] + ) + +print( + tabulate( + rows, + headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], + tablefmt="github", + ) +) + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Understanding Conversation Stats + +The `conversation.conversation_stats` object provides cost tracking across all LLMs used in a conversation. It is an instance of the [ConversationStats class](https://github.com/OpenHands/software-agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: + +#### Key Methods and Properties + +- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. + +- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. This gives you the total cost of the entire conversation. + +- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. 
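For example, you can combine these accessors to see how total spend splits across LLMs. A small sketch (assuming a `conversation` that has already run):

```python icon="python" wrap
# Share of total conversation cost per usage ID
stats = conversation.conversation_stats
total_cost = stats.get_combined_metrics().accumulated_cost

for usage_id, metrics in stats.usage_to_metrics.items():
    share = metrics.accumulated_cost / total_cost if total_cost else 0.0
    print(f"{usage_id}: ${metrics.accumulated_cost:.6f} ({share:.1%})")
```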
+ +```python icon="python" focus={2, 6, 10} +# Get combined metrics for the entire conversation +total_metrics = conversation.conversation_stats.get_combined_metrics() +print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") + +# Get metrics for a specific LLM by usage ID +agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") +print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") + +# Access all usage IDs and their metrics +for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): + print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") +``` + +## Next Steps + +- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs +- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models + +### Observability & Tracing +Source: https://docs.openhands.dev/sdk/guides/observability.md + +> A full setup example is available [here](#example:-full-setup)! + +## Overview + +The OpenHands SDK provides built-in OpenTelemetry (OTEL) tracing support, allowing you to monitor and debug your agent's execution in real-time. You can send traces to any OTLP-compatible observability platform including: + +- **[Laminar](https://laminar.sh/)** - AI-focused observability with browser session replay support +- **[Honeycomb](https://www.honeycomb.io/)** - High-performance distributed tracing +- **Any OTLP-compatible backend** - Including Jaeger, Datadog, New Relic, and more + +The SDK automatically traces: +- Agent execution steps +- Tool calls and executions +- LLM API calls (via LiteLLM integration) +- Browser automation sessions (when using browser-use) +- Conversation lifecycle events + +## Quick Start + +Tracing is automatically enabled when you set the appropriate environment variables. The SDK detects the configuration on startup and initializes tracing without requiring code changes. + +### Using Laminar + +[Laminar](https://laminar.sh/) provides specialized AI observability features including browser session replays when using browser-use tools: + +```bash icon="terminal" wrap +# Set your Laminar project API key +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` + +That's it! Run your agent code normally and traces will be sent to Laminar automatically. + +### Using Honeycomb or Other OTLP Backends + +For Honeycomb, Jaeger, or any other OTLP-compatible backend: + +```bash icon="terminal" wrap +# Required: Set the OTLP endpoint +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" + +# Required: Set authentication headers (format: comma-separated key=value pairs, URL-encoded) +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=your-api-key" + +# Recommended: Explicitly set the protocol (most OTLP backends require HTTP) +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" # use "grpc" only if your backend supports it +``` + +### Alternative Configuration Methods + +You can also use these alternative environment variable formats: + +```bash icon="terminal" wrap +# Short form for endpoint +export OTEL_ENDPOINT="http://localhost:4317" + +# Alternative header format +export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20" + +# Alternative protocol specification +export OTEL_EXPORTER="otlp_http" # or "otlp_grpc" +``` + +## How It Works + +The OpenHands SDK uses the [Laminar SDK](https://docs.lmnr.ai/) as its OpenTelemetry instrumentation layer. When you set the environment variables, the SDK: + +1. 
**Detects Configuration**: Checks for OTEL environment variables on startup
2. **Initializes Tracing**: Configures OpenTelemetry with the appropriate exporter
3. **Instruments Code**: Automatically wraps key functions with tracing decorators
4. **Captures Context**: Associates traces with conversation IDs for session grouping
5. **Exports Spans**: Sends trace data to your configured backend

### What Gets Traced

The SDK automatically instruments these components:

- **`agent.step`** - Each iteration of the agent's execution loop
- **Tool Executions** - Individual tool calls with input/output capture
- **LLM Calls** - API requests to language models via LiteLLM
- **Conversation Lifecycle** - Message sending, conversation runs, and title generation
- **Browser Sessions** - When using browser-use, captures session replays (Laminar only)

### Trace Hierarchy

Traces are organized hierarchically: each conversation session contains `agent.step` spans, which in turn contain the LLM calls and `tool.execute` spans they trigger.

Each conversation gets its own session ID (the conversation UUID), allowing you to group all traces from a single conversation together in your observability platform.

Within `tool.execute`, the individual tool calls are traced by name, e.g., `bash` or `file_editor`.

## Configuration Reference

### Environment Variables

The SDK checks for these environment variables (in order of precedence):

| Variable | Description | Example |
|----------|-------------|---------|
| `LMNR_PROJECT_API_KEY` | Laminar project API key | `your-laminar-api-key` |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Full OTLP traces endpoint URL | `https://api.honeycomb.io:443/v1/traces` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (traces path appended) | `http://localhost:4317` |
| `OTEL_ENDPOINT` | Short form endpoint | `http://localhost:4317` |
| `OTEL_EXPORTER_OTLP_TRACES_HEADERS` | Authentication headers for traces | `x-honeycomb-team=YOUR_API_KEY` |
| `OTEL_EXPORTER_OTLP_HEADERS` | General authentication headers | `Authorization=Bearer%20TOKEN` |
| `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` | Protocol for traces endpoint | `http/protobuf`, `grpc` |
| `OTEL_EXPORTER` | Short form protocol | `otlp_http`, `otlp_grpc` |

### Header Format

Headers should be comma-separated `key=value` pairs with URL encoding for special characters:

```bash icon="terminal" wrap
# Single header
export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=abc123"

# Multiple headers
export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20abc123,X-Custom-Header=value"
```

### Protocol Options

The SDK supports both HTTP and gRPC protocols:

- **`http/protobuf`** or **`otlp_http`** - HTTP with protobuf encoding (recommended for most backends)
- **`grpc`** or **`otlp_grpc`** - gRPC with protobuf encoding (use only if your backend supports gRPC)

## Platform-Specific Configuration

### Laminar Setup

1. Sign up at [laminar.sh](https://laminar.sh/)
2. Create a project and copy your API key
3. Set the environment variable:

```bash icon="terminal" wrap
export LMNR_PROJECT_API_KEY="your-laminar-api-key"
```

**Browser Session Replay**: When using Laminar with browser-use tools, session replays are automatically captured, allowing you to see exactly what the browser automation did.

### Honeycomb Setup

1. Sign up at [honeycomb.io](https://www.honeycomb.io/)
2. Get your API key from the account settings
3. 
Configure the environment: + +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=YOUR_API_KEY" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` + +### Jaeger Setup + +For local development with Jaeger: + +```bash icon="terminal" wrap +# Start Jaeger all-in-one container +docker run -d --name jaeger \ + -p 4317:4317 \ + -p 16686:16686 \ + jaegertracing/all-in-one:latest + +# Configure SDK +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc" +``` + +Access the Jaeger UI at http://localhost:16686 + +### Generic OTLP Collector + +For other backends, use their OTLP endpoint: + +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://your-otlp-collector:4317/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20YOUR_TOKEN" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` + +## Advanced Usage + +### Disabling Observability + +To disable tracing, simply unset all OTEL environment variables: + +```bash icon="terminal" wrap +unset LMNR_PROJECT_API_KEY +unset OTEL_EXPORTER_OTLP_TRACES_ENDPOINT +unset OTEL_EXPORTER_OTLP_ENDPOINT +unset OTEL_ENDPOINT +``` + +The SDK will automatically skip all tracing instrumentation with minimal overhead. + +### Custom Span Attributes + +The SDK automatically adds these attributes to spans: + +- **`conversation_id`** - UUID of the conversation +- **`tool_name`** - Name of the tool being executed +- **`action.kind`** - Type of action being performed +- **`session_id`** - Groups all traces from one conversation + +### Debugging Tracing Issues + +If traces aren't appearing in your observability platform: + +1. **Verify Environment Variables**: + ```python icon="python" wrap + import os + + otel_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') + otel_headers = os.getenv('OTEL_EXPORTER_OTLP_TRACES_HEADERS') + + print(f"OTEL Endpoint: {otel_endpoint}") + print(f"OTEL Headers: {otel_headers}") + ``` + +2. **Check SDK Logs**: The SDK logs observability initialization at debug level: + ```python icon="python" wrap + import logging + + logging.basicConfig(level=logging.DEBUG) + ``` + +3. **Test Connectivity**: Ensure your application can reach the OTLP endpoint: + ```bash icon="terminal" wrap + curl -v https://api.honeycomb.io:443/v1/traces + ``` + +4. 
**Validate Headers**: Check that authentication headers are properly URL-encoded + +## Troubleshooting + +### Traces Not Appearing + +**Problem**: No traces showing up in observability platform + +**Solutions**: +- Verify environment variables are set correctly +- Check network connectivity to OTLP endpoint +- Ensure authentication headers are valid +- Look for SDK initialization logs at debug level + +### High Trace Volume + +**Problem**: Too many spans being generated + +**Solutions**: +- Configure sampling at the collector level +- For Laminar with non-browser tools, browser instrumentation is automatically disabled +- Use backend-specific filtering rules + +### Performance Impact + +**Problem**: Concerned about tracing overhead + +**Solutions**: +- Tracing has minimal overhead when properly configured +- Disable tracing in development by unsetting environment variables +- Use asynchronous exporters (default in most OTLP configurations) + +## Example: Full Setup + + +This example is available on GitHub: [examples/01_standalone_sdk/27_observability_laminar.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/27_observability_laminar.py) + + +```python icon="python" expandable examples/01_standalone_sdk/27_observability_laminar.py +""" +Observability & Laminar example + +This example demonstrates enabling OpenTelemetry tracing with Laminar in the +OpenHands SDK. Set LMNR_PROJECT_API_KEY and run the script to see traces. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.terminal import TerminalTool + + +# Tip: Set LMNR_PROJECT_API_KEY in your environment before running, e.g.: +# export LMNR_PROJECT_API_KEY="your-laminar-api-key" +# For non-Laminar OTLP backends, set OTEL_* variables instead. + +# Configure LLM and Agent +api_key = os.getenv("LLM_API_KEY") +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key) if api_key else None, + base_url=base_url, + usage_id="agent", +) + +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +# Create conversation and run a simple task +conversation = Conversation(agent=agent, workspace=".") +conversation.send_message("List the files in the current directory and print them.") +conversation.run() +print( + "All done! Check your Laminar dashboard for traces " + "(session is the conversation UUID)." +) +``` + +```bash Running the Example +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/27_observability_laminar.py +``` + +## Next Steps + +- **[Metrics Tracking](/sdk/guides/metrics)** - Monitor token usage and costs alongside traces +- **[LLM Registry](/sdk/guides/llm-registry)** - Track multiple LLMs used in your application +- **[Security](/sdk/guides/security)** - Add security validation to your traced agent executions + +### Plugins +Source: https://docs.openhands.dev/sdk/guides/plugins.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +Plugins provide a way to package and distribute multiple agent components together. 
A single plugin can include:

- **Skills**: Specialized knowledge and workflows
- **Hooks**: Event handlers for tool lifecycle
- **MCP Config**: External tool server configurations
- **Agents**: Specialized agent definitions
- **Commands**: Slash commands

The plugin format is compatible with the [Claude Code plugin structure](https://github.com/anthropics/claude-code/tree/main/plugins).

## Plugin Structure


See the [example_plugins directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/05_skills_and_plugins/02_loading_plugins/example_plugins) for a complete working plugin structure.


A plugin is a directory containing a `.plugin/plugin.json` manifest alongside optional component files: skill markdown files, a `hooks/hooks.json` file, a `.mcp.json` file, and agent and command definitions. See the example_plugins directory linked above for a complete layout.

Note that the plugin metadata, i.e., `plugin-name/.plugin/plugin.json`, is required.

### Plugin Manifest

The manifest file `plugin-name/.plugin/plugin.json` defines plugin metadata:

```json icon="file-code" wrap
{
  "name": "code-quality",
  "version": "1.0.0",
  "description": "Code quality tools and workflows",
  "author": "openhands",
  "license": "MIT",
  "repository": "https://github.com/example/code-quality-plugin"
}
```

### Skills

Skills are defined in markdown files with YAML frontmatter:

```markdown icon="file-code"
---
name: python-linting
description: Instructions for linting Python code
trigger:
  type: keyword
  keywords:
    - lint
    - linting
    - code quality
---

# Python Linting Skill

Run ruff to check for issues:

\`\`\`bash
ruff check .
\`\`\`
```

### Hooks

Hooks are defined in `hooks/hooks.json`:

```json icon="file-code" wrap
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "file_editor",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'File edited: $OPENHANDS_TOOL_NAME'",
            "timeout": 5
          }
        ]
      }
    ]
  }
}
```

### MCP Configuration

MCP servers are configured in `.mcp.json`:

```json wrap icon="file-code"
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}
```

## Using Plugin Components

> The ready-to-run example is available [here](#ready-to-run-example)!

The steps below walk through using a plugin with an agent.


  ### Loading a Plugin
  First, load the desired plugins.

  ```python icon="python"
  from openhands.sdk.plugin import Plugin

  # Load a single plugin
  plugin = Plugin.load("/path/to/plugin")

  # Load all plugins from a directory
  plugins = Plugin.load_all("/path/to/plugins")
  ```


  ### Accessing Components
  You can access the different plugin components to see which ones are available.

  ```python icon="python"
  # Skills
  for skill in plugin.skills:
      print(f"Skill: {skill.name}")

  # Hooks configuration
  if plugin.hooks:
      print(f"Hooks configured: {plugin.hooks}")

  # MCP servers
  if plugin.mcp_config:
      servers = plugin.mcp_config.get("mcpServers", {})
      print(f"MCP servers: {list(servers.keys())}")
  ```


  ### Using with an Agent
  You can now wire your preferred plugin into an agent. 
+ + ```python focus={3,10,17} icon="python" + # Create agent context with plugin skills + agent_context = AgentContext( + skills=plugin.skills, + ) + + # Create agent with plugin MCP config + agent = Agent( + llm=llm, + tools=tools, + mcp_config=plugin.mcp_config or {}, + agent_context=agent_context, + ) + + # Create conversation with plugin hooks + conversation = Conversation( + agent=agent, + hook_config=plugin.hooks, + ) + ``` + + + +## Ready-to-run Example + +The example below demonstrates plugin loading via Conversation and plugin management utilities (install, list, update, uninstall). + + +This example is available on GitHub: [examples/05_skills_and_plugins/02_loading_plugins/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/02_loading_plugins/main.py) + + +```python icon="python" expandable examples/05_skills_and_plugins/02_loading_plugins/main.py +"""Example: Loading and Managing Plugins + +This example demonstrates plugin loading and management in the SDK: + +1. Loading plugins via Conversation (PluginSource) +2. Installing plugins to persistent storage +3. Listing, updating, and uninstalling plugins + +Plugins bundle skills, hooks, and MCP config together. + +Supported plugin sources: +- Local path: /path/to/plugin +- GitHub shorthand: github:owner/repo +- Git URL: https://github.com/owner/repo.git +- With ref: branch, tag, or commit SHA +- With repo_path: subdirectory for monorepos + +For full documentation, see: https://docs.all-hands.dev/sdk/guides/plugins +""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.plugin import ( + PluginFetchError, + PluginSource, + install_plugin, + list_installed_plugins, + load_installed_plugins, + uninstall_plugin, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Locate example plugin directory +script_dir = Path(__file__).parent +local_plugin_path = script_dir / "example_plugins" / "code-quality" + + +def demo_conversation_with_plugins(llm: LLM) -> None: + """Demo 1: Load plugins via Conversation's plugins parameter. + + This is the recommended way to use plugins - they are loaded lazily + when the conversation starts. + """ + print("\n" + "=" * 60) + print("DEMO 1: Loading plugins via Conversation") + print("=" * 60) + + # Define plugins to load + plugins = [ + PluginSource(source=str(local_plugin_path)), + # Examples of other sources: + # PluginSource(source="github:owner/repo", ref="v1.0.0"), + # PluginSource(source="github:owner/monorepo", repo_path="plugins/my-plugin"), + ] + + agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name)], + ) + + with tempfile.TemporaryDirectory() as tmpdir: + conversation = Conversation( + agent=agent, + workspace=tmpdir, + plugins=plugins, + ) + + # The "lint" keyword triggers the python-linting skill + conversation.send_message("How do I lint Python code? Brief answer please.") + + # Verify skills were loaded + skills = ( + conversation.agent.agent_context.skills + if conversation.agent.agent_context + else [] + ) + print(f"✓ Loaded {len(skills)} skill(s) from plugins") + + conversation.run() + + +def demo_install_local_plugin(installed_dir: Path) -> None: + """Demo 2: Install a plugin from a local path. + + Useful for development or local-only plugins. 
+ """ + print("\n" + "=" * 60) + print("DEMO 2: Installing plugin from local path") + print("=" * 60) + + info = install_plugin(source=str(local_plugin_path), installed_dir=installed_dir) + print(f"✓ Installed: {info.name} v{info.version}") + print(f" Source: {info.source}") + print(f" Path: {info.install_path}") + + +def demo_install_github_plugin(installed_dir: Path) -> None: + """Demo 3: Install a plugin from GitHub. + + Demonstrates the github:owner/repo shorthand with repo_path for monorepos. + """ + print("\n" + "=" * 60) + print("DEMO 3: Installing plugin from GitHub") + print("=" * 60) + + try: + # Install from anthropics/skills repository + info = install_plugin( + source="github:anthropics/skills", + repo_path="skills/pptx", + ref="main", + installed_dir=installed_dir, + ) + print(f"✓ Installed: {info.name} v{info.version}") + print(f" Source: {info.source}") + print(f" Resolved ref: {info.resolved_ref}") + + except PluginFetchError as e: + print(f"⚠ Could not fetch from GitHub: {e}") + print(" (Network or rate limiting issue)") + + +def demo_list_and_load_plugins(installed_dir: Path) -> None: + """Demo 4: List and load installed plugins.""" + print("\n" + "=" * 60) + print("DEMO 4: List and load installed plugins") + print("=" * 60) + + # List installed plugins + print("Installed plugins:") + for info in list_installed_plugins(installed_dir=installed_dir): + print(f" - {info.name} v{info.version} ({info.source})") + + # Load plugins as Plugin objects + plugins = load_installed_plugins(installed_dir=installed_dir) + print(f"\nLoaded {len(plugins)} plugin(s):") + for plugin in plugins: + skills = plugin.get_all_skills() + print(f" - {plugin.name}: {len(skills)} skill(s)") + + +def demo_uninstall_plugins(installed_dir: Path) -> None: + """Demo 5: Uninstall plugins.""" + print("\n" + "=" * 60) + print("DEMO 5: Uninstalling plugins") + print("=" * 60) + + for info in list_installed_plugins(installed_dir=installed_dir): + uninstall_plugin(info.name, installed_dir=installed_dir) + print(f"✓ Uninstalled: {info.name}") + + remaining = list_installed_plugins(installed_dir=installed_dir) + print(f"\nRemaining plugins: {len(remaining)}") + + +# Main execution +if __name__ == "__main__": + api_key = os.getenv("LLM_API_KEY") + if not api_key: + print("Set LLM_API_KEY to run the full example") + print("Running install/uninstall demos only...") + llm = None + else: + model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + llm = LLM( + usage_id="plugin-demo", + model=model, + api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), + ) + + with tempfile.TemporaryDirectory() as tmpdir: + installed_dir = Path(tmpdir) / "installed" + installed_dir.mkdir() + + # Demo 1: Conversation with plugins (requires LLM) + if llm: + demo_conversation_with_plugins(llm) + + # Demo 2-5: Plugin management (no LLM required) + demo_install_local_plugin(installed_dir) + demo_install_github_plugin(installed_dir) + demo_list_and_load_plugins(installed_dir) + demo_uninstall_plugins(installed_dir) + + print("\n" + "=" * 60) + print("EXAMPLE COMPLETED SUCCESSFULLY") + print("=" * 60) + + if llm: + print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") + else: + print("EXAMPLE_COST: 0") +``` + + + +## Installing Plugins to Persistent Storage + +The SDK provides utilities to install plugins to a local directory (`~/.openhands/plugins/installed/` by default). Installed plugins are tracked in `.installed.json`, which stores metadata including a persistent enabled flag. 
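If you want to see what is on disk, the sketch below lists the default install location; the exact contents and the location of `.installed.json` are implementation details, so treat the layout as an assumption:

```python icon="python" wrap
from pathlib import Path

# Default install location (per the paragraph above); each installed
# plugin gets its own directory here.
installed_dir = Path.home() / ".openhands" / "plugins" / "installed"
if installed_dir.is_dir():
    for entry in sorted(installed_dir.iterdir()):
        print(entry.name)
```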
+ +Use `list_installed_plugins()` to see all tracked plugins (enabled and disabled). Use `load_installed_plugins()` to load only enabled plugins. Toggle plugins on/off with `enable_plugin()` and `disable_plugin()` without uninstalling. + +```python icon="python" +from openhands.sdk.plugin import ( + disable_plugin, + enable_plugin, + install_plugin, + list_installed_plugins, + load_installed_plugins, + uninstall_plugin, +) + +# Install from local path or GitHub +install_plugin(source="/path/to/plugin") +install_plugin(source="github:owner/repo", ref="v1.0.0") + +# List installed plugins (includes enabled + disabled) +for info in list_installed_plugins(): + status = "enabled" if info.enabled else "disabled" + print(f"{info.name} v{info.version} ({status})") + +# Disable a plugin (won't be loaded until re-enabled) +disable_plugin("plugin-name") + +# Load only enabled plugins for your agent +plugins = load_installed_plugins() + +# Later: re-enable and reload +enable_plugin("plugin-name") +plugins = load_installed_plugins() + +# Uninstall +uninstall_plugin("plugin-name") +``` + +## Next Steps + +- **[Skills](/sdk/guides/skill)** - Learn more about skills and triggers +- **[Hooks](/sdk/guides/hooks)** - Understand hook event types +- **[MCP Integration](/sdk/guides/mcp)** - Configure external tool servers + +### Secret Registry +Source: https://docs.openhands.dev/sdk/guides/secrets.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Secret Registry provides a secure way to handle sensitive data in your agent's workspace. +It automatically detects secret references in bash commands, injects them as environment variables when needed, +and masks secret values in command outputs to prevent accidental exposure. + +### Injecting Secrets + +Use the `update_secrets()` method to add secrets to your conversation. + + +Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems: + +```python focus={4,11} icon="python" wrap +from openhands.sdk.conversation.secret_source import SecretSource + +# Static secret +conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) + +# Dynamic secret using SecretSource +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" + +conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) + + +```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.secret import SecretSource +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    usage_id="agent",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)

# Tools
tools = [
    Tool(name=TerminalTool.name),
    Tool(name=FileEditorTool.name),
]

# Agent
agent = Agent(llm=llm, tools=tools)
conversation = Conversation(agent)


class MySecretSource(SecretSource):
    def get_value(self) -> str:
        return "callable-based-secret"


conversation.update_secrets(
    {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()}
)

conversation.send_message("just echo $SECRET_TOKEN")

conversation.run()

conversation.send_message("just echo $SECRET_FUNCTION_TOKEN")

conversation.run()

# Report cost
cost = llm.metrics.accumulated_cost
print(f"EXAMPLE_COST: {cost}")
```



## Next Steps

- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP
- **[Security Analyzer](/sdk/guides/security)** - Add security validation

### Security & Action Confirmation
Source: https://docs.openhands.dev/sdk/guides/security.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

Agent actions can be controlled through two complementary mechanisms: a **confirmation policy** that determines when user approval is required, and a **security analyzer** that evaluates action risk levels. Together, they provide flexible control over agent behavior while maintaining safety.

## Confirmation Policy
> A ready-to-run example is available [here](#ready-to-run-example-confirmation)!

A confirmation policy controls whether actions require user approval before execution, providing a simple way to ensure safe agent operation by requiring explicit permission for actions.

### Setting Confirmation Policy

Set the confirmation policy on your conversation:

```python icon="python" focus={4}
from openhands.sdk.security.confirmation_policy import AlwaysConfirm

conversation = Conversation(agent=agent, workspace=".")
conversation.set_confirmation_policy(AlwaysConfirm())
```

Available policies:
- **`AlwaysConfirm()`** - Require approval for all actions
- **`NeverConfirm()`** - Execute all actions without approval
- **`ConfirmRisky()`** - Only require approval for risky actions (requires a security analyzer)

### Custom Confirmation Handler

Implement your own approval logic by checking the conversation status:

```python icon="python" focus={2-3,5}
while conversation.state.execution_status != ConversationExecutionStatus.FINISHED:
    if conversation.state.execution_status == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION:
        pending = ConversationState.get_unmatched_actions(conversation.state.events)
        if not confirm_in_console(pending):
            conversation.reject_pending_actions("User rejected")
            continue
    conversation.run()
```

### Rejecting Actions

Provide feedback when rejecting to help the agent try a different approach:

```python icon="python" focus={2-5}
if not user_approved:
    conversation.reject_pending_actions(
        "User rejected because actions seem too risky. "
        "Please try a safer approach." 
+ ) +``` + +### Ready-to-run Example Confirmation + + +Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) + + +Require user approval before executing agent actions: + +```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py +"""OpenHands Agent SDK — Confirmation Mode Example""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, +) +from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.tools.preset.default import get_default_agent + + +# Make ^C a clean exit instead of a stack trace +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_action_preview(pending_actions) -> None: + print(f"\n🔍 Agent created {len(pending_actions)} action(s) awaiting confirmation:") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). + """ + _print_action_preview(pending_actions) + while True: + try: + ans = ( + input("\nDo you want to execute these actions? (yes/no): ") + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("✅ Approved — executing actions…") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping actions…") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: + """ + Drive the conversation until FINISHED. + If WAITING_FOR_CONFIRMATION, ask the confirmer; + on reject, call reject_pending_actions(). + Preserves original error if agent waits but no actions exist. + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected the actions") + # Let the agent produce a new step or finish + continue + + print("▶️ Running conversation.run()…") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    usage_id="agent",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)

agent = get_default_agent(llm=llm)
conversation = Conversation(agent=agent, workspace=os.getcwd())

# Conditionally add security analyzer based on environment variable
add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip())
if add_security_analyzer:
    print("Agent security analyzer added.")
    conversation.set_security_analyzer(LLMSecurityAnalyzer())

# 1) Confirmation mode ON
conversation.set_confirmation_policy(AlwaysConfirm())
print("\n1) Command that will likely create actions…")
conversation.send_message("Please list the files in the current directory using ls -la")
run_until_finished(conversation, confirm_in_console)

# 2) A command the user may choose to reject
print("\n2) Command the user may choose to reject…")
conversation.send_message("Please create a file called 'dangerous_file.txt'")
run_until_finished(conversation, confirm_in_console)

# 3) Simple greeting (no actions expected)
print("\n3) Simple greeting (no actions expected)…")
conversation.send_message("Just say hello to me")
run_until_finished(conversation, confirm_in_console)

# 4) Disable confirmation mode and run commands directly
print("\n4) Disable confirmation mode and run a command…")
conversation.set_confirmation_policy(NeverConfirm())
conversation.send_message("Please echo 'Hello from confirmation mode example!'")
conversation.run()

conversation.send_message(
    "Please delete any file that was created during this conversation."
)
conversation.run()

print("\n=== Example Complete ===")
print("Key points:")
print(
    "- conversation.run() creates actions; confirmation mode "
    "sets execution_status=WAITING_FOR_CONFIRMATION"
)
print("- User confirmation is handled via a single reusable function")
print("- Rejection uses conversation.reject_pending_actions() and the loop continues")
print("- Simple responses work normally without actions")
print("- Confirmation policy is toggled with conversation.set_confirmation_policy()")
```



---

## Security Analyzer

A security analyzer evaluates the risk of agent actions before execution, helping protect against potentially dangerous operations. It analyzes each action and assigns a security risk level:

- **LOW** - Safe operations with minimal security impact
- **MEDIUM** - Moderate security impact, review recommended
- **HIGH** - Significant security impact, requires confirmation
- **UNKNOWN** - Risk level could not be determined

Security analyzers work in conjunction with confirmation policies (like `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations.

### LLM Security Analyzer

> A ready-to-run example is available [here](#ready-to-run-example-security-analyzer)!

The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions.
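In the common case, enabling this is two calls on the conversation; a minimal sketch mirroring the ready-to-run example below:

```python icon="python" wrap
from openhands.sdk.security.confirmation_policy import ConfirmRisky
from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer

# Annotate actions with risk levels, and only pause for approval on risky ones
conversation.set_security_analyzer(LLMSecurityAnalyzer())
conversation.set_confirmation_policy(ConfirmRisky())
```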
#### Security Analyzer Configuration

Create an LLM-based security analyzer to review actions before execution:

```python icon="python" focus={9}
from openhands.sdk import LLM
from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer
security_llm = LLM(
    usage_id="security-analyzer",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)
security_analyzer = LLMSecurityAnalyzer(llm=security_llm)
agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)
```

The security analyzer:
- Reviews each action before execution
- Flags potentially dangerous operations
- Can be configured with a custom security policy
- Uses a separate LLM to avoid conflicts with the main agent

#### Ready-to-run Example Security Analyzer


Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py)


Automatically analyze agent actions for security risks before execution:

```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py
"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified)

This example shows how to use the LLMSecurityAnalyzer to automatically
evaluate security risks of actions before execution.
"""

import os
import signal
from collections.abc import Callable

from pydantic import SecretStr

from openhands.sdk import LLM, Agent, BaseConversation, Conversation
from openhands.sdk.conversation.state import (
    ConversationExecutionStatus,
    ConversationState,
)
from openhands.sdk.security.confirmation_policy import ConfirmRisky
from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer
from openhands.sdk.tool import Tool
from openhands.tools.file_editor import FileEditorTool
from openhands.tools.terminal import TerminalTool


# Clean ^C exit: no stack trace noise
signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt()))


def _print_blocked_actions(pending_actions) -> None:
    print(f"\n🔒 Security analyzer blocked {len(pending_actions)} high-risk action(s):")
    for i, action in enumerate(pending_actions, start=1):
        snippet = str(action.action)[:100].replace("\n", " ")
        print(f"  {i}. {action.tool_name}: {snippet}...")


def confirm_high_risk_in_console(pending_actions) -> bool:
    """
    Return True to approve, False to reject.
    Matches original behavior: default to 'no' on EOF/KeyboardInterrupt.
    """
    _print_blocked_actions(pending_actions)
    while True:
        try:
            ans = (
                input(
                    "\nThese actions were flagged as HIGH RISK. "
                    "Do you want to execute them anyway? (yes/no): "
                )
                .strip()
                .lower()
            )
        except (EOFError, KeyboardInterrupt):
            print("\n❌ No input received; rejecting by default.")
            return False

        if ans in ("yes", "y"):
            print("✅ Approved — executing high-risk actions...")
            return True
        if ans in ("no", "n"):
            print("❌ Rejected — skipping high-risk actions...")
            return False
        print("Please enter 'yes' or 'no'.")


def run_until_finished_with_security(
    conversation: BaseConversation, confirmer: Callable[[list], bool]
) -> None:
    """
    Drive the conversation until FINISHED.
    - If WAITING_FOR_CONFIRMATION: ask the confirmer.
      * On approve: proceed to conversation.run() so the approved actions execute.
      * On reject: conversation.reject_pending_actions(...).
    - If WAITING but no pending actions: raise RuntimeError (should not happen). 
+ """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected high-risk actions") + continue + + print("▶️ Running conversation.run()...") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +# Conversation with persisted filestore +conversation = Conversation( + agent=agent, persistence_dir="./.conversations", workspace="." +) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) +conversation.set_confirmation_policy(ConfirmRisky()) + +print("\n1) Safe command (LOW risk - should execute automatically)...") +conversation.send_message("List files in the current directory") +conversation.run() + +print("\n2) Potentially risky command (may require confirmation)...") +conversation.send_message( + "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" +) +run_until_finished_with_security(conversation, confirm_high_risk_in_console) +``` + + + +### Custom Security Analyzer Implementation + +You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. This allows you to implement custom security logic tailored to your specific requirements. + +#### Creating a Custom Analyzer + +To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: + +```python icon="python" focus={5, 8} +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.event.llm_convertible import ActionEvent + +class CustomSecurityAnalyzer(SecurityAnalyzerBase): + """Custom security analyzer with domain-specific rules.""" + + def security_risk(self, action: ActionEvent) -> SecurityRisk: + """Evaluate security risk based on custom rules. 
+

        Args:
            action: The ActionEvent to analyze

        Returns:
            SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN)
        """
        # Example: Check for specific dangerous patterns
        action_str = str(action.action.model_dump()).lower() if action.action else ""

        # High-risk patterns
        if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']):
            return SecurityRisk.HIGH

        # Medium-risk patterns
        if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']):
            return SecurityRisk.MEDIUM

        # Default to low risk
        return SecurityRisk.LOW

# Use your custom analyzer
security_analyzer = CustomSecurityAnalyzer()
agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)
```


  For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py).



---

## Configurable Security Policy

> A ready-to-run example is available [here](#ready-to-run-example-security-policy)!

Agents use security policies to guide their risk assessment of actions. The SDK provides a default security policy template, but you can customize it to match your specific security requirements and guidelines.


### Using Custom Security Policies

You can provide a custom security policy template when creating an agent (`tools` is assumed to be defined as in the examples above):

```python focus={14-19} icon="python"
import os

from pydantic import SecretStr

from openhands.sdk import Agent, LLM

api_key = os.getenv("LLM_API_KEY")
llm = LLM(
    usage_id="agent",
    model="anthropic/claude-sonnet-4-5-20250929",
    api_key=SecretStr(api_key),
)

# Provide a custom security policy template file
agent = Agent(
    llm=llm,
    tools=tools,
    security_policy_filename="my_security_policy.j2",
)
```

Custom security policies allow you to:
- Define organization-specific risk assessment guidelines
- Set custom thresholds for security risk levels
- Add domain-specific security rules
- Tailor risk evaluation to your use case

The security policy is provided as a Jinja2 template that gets rendered into the agent's system prompt, guiding how it evaluates the security risk of its actions.

### Ready-to-run Example Security Policy


Full configurable security policy example: [examples/01_standalone_sdk/32_configurable_security_policy.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/32_configurable_security_policy.py)


Define custom security risk guidelines for your agent:

```python icon="python" expandable examples/01_standalone_sdk/32_configurable_security_policy.py
"""OpenHands Agent SDK — Configurable Security Policy Example

This example demonstrates how to use a custom security policy template
with an agent. Security policies define risk assessment guidelines that
help agents evaluate the safety of their actions.

By default, agents use the built-in security_policy.j2 template. This
example shows how to:
1. Use the default security policy
2. Provide a custom security policy template embedded in the script
3. 
Apply the custom policy to guide agent behavior +""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Define a custom security policy template inline +CUSTOM_SECURITY_POLICY = ( + "# 🔐 Custom Security Risk Policy\n" + "When using tools that support the security_risk parameter, assess the " + "safety risk of your actions:\n" + "\n" + "- **LOW**: Safe read-only actions.\n" + " - Viewing files, calculations, documentation.\n" + "- **MEDIUM**: Moderate container-scoped actions.\n" + " - File modifications, package installations.\n" + "- **HIGH**: Potentially dangerous actions.\n" + " - Network access, system modifications, data exfiltration.\n" + "\n" + "**Custom Rules**\n" + "- Always prioritize user data safety.\n" + "- Escalate to **HIGH** for any external data transmission.\n" +) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Example 1: Agent with default security policy +print("=" * 100) +print("Example 1: Agent with default security policy") +print("=" * 100) +default_agent = Agent(llm=llm, tools=tools) +print(f"Security policy filename: {default_agent.security_policy_filename}") +print("\nDefault security policy is embedded in the agent's system message.") + +# Example 2: Agent with custom security policy +print("\n" + "=" * 100) +print("Example 2: Agent with custom security policy") +print("=" * 100) + +# Create a temporary file for the custom security policy +with tempfile.NamedTemporaryFile( + mode="w", suffix=".j2", delete=False, encoding="utf-8" +) as temp_file: + temp_file.write(CUSTOM_SECURITY_POLICY) + custom_policy_path = temp_file.name + +try: + # Create agent with custom security policy (using absolute path) + custom_agent = Agent( + llm=llm, + tools=tools, + security_policy_filename=custom_policy_path, + ) + print(f"Security policy filename: {custom_agent.security_policy_filename}") + print("\nCustom security policy loaded from temporary file.") + + # Verify the custom policy is in the system message + system_message = custom_agent.static_system_message + if "Custom Security Risk Policy" in system_message: + print("✓ Custom security policy successfully embedded in system message.") + else: + print("✗ Custom security policy not found in system message.") + + # Run a conversation with the custom agent + print("\n" + "=" * 100) + print("Running conversation with custom security policy") + print("=" * 100) + + llm_messages = [] # collect raw LLM messages + + def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + conversation = Conversation( + agent=custom_agent, + callbacks=[conversation_callback], + workspace=".", + ) + + conversation.send_message( + "Please create a simple Python script named hello.py that prints " + "'Hello, World!'. 
Make sure to follow security best practices."
    )
    conversation.run()

    print("\n" + "=" * 100)
    print("Conversation finished.")
    print(f"Total LLM messages: {len(llm_messages)}")
    print("=" * 100)

    # Report cost
    cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost
    print(f"EXAMPLE_COST: {cost}")

finally:
    # Clean up temporary file
    Path(custom_policy_path).unlink(missing_ok=True)

print("\n" + "=" * 100)
print("Example Summary")
print("=" * 100)
print("This example demonstrated:")
print("1. Using the default security policy (security_policy.j2)")
print("2. Creating a custom security policy template")
print("3. Applying the custom policy via security_policy_filename parameter")
print("4. Running a conversation with the custom security policy")
print(
    "\nYou can customize security policies to match your organization's "
    "specific requirements."
)
```



## Next Steps

- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools
- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management

### Agent Skills & Context
Source: https://docs.openhands.dev/sdk/guides/skill.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

This guide shows how to implement skills in the SDK. For a conceptual overview, see [Skills Overview](/overview/skills).

OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers.

## Context Loading Methods

| Method | When Content Loads | Use Case |
|--------|-------------------|----------|
| **Always-loaded** | At conversation start | Repository rules, coding standards |
| **Trigger-loaded** | When keywords match | Specialized tasks, domain knowledge |
| **Progressive disclosure** | Agent reads on demand | Large reference docs (AgentSkills) |

## Always-Loaded Context

Content that's always in the system prompt.

### Option 1: `AGENTS.md` (Auto-loaded)

Place `AGENTS.md` at your repo root - it's loaded automatically. See [Permanent Context](/overview/skills/repo).

```python icon="python" focus={4, 5}
from openhands.sdk import AgentContext
from openhands.sdk.context.skills import load_project_skills

# Automatically finds AGENTS.md, CLAUDE.md, GEMINI.md at workspace root
skills = load_project_skills(workspace_dir="/path/to/repo")
agent_context = AgentContext(skills=skills)
```

### Option 2: Inline Skill (Code-defined)

```python icon="python" focus={5-11}
from openhands.sdk import AgentContext
from openhands.sdk.context import Skill

agent_context = AgentContext(
    skills=[
        Skill(
            name="code-style",
            content="Always use type hints in Python.",
            trigger=None,  # No trigger = always loaded
        ),
    ]
)
```

## Trigger-Loaded Context

Content injected when keywords appear in user messages. See [Keyword-Triggered Skills](/overview/skills/keyword).

```python icon="python" focus={6}
from openhands.sdk.context import Skill, KeywordTrigger

Skill(
    name="encryption-helper",
    content="Use the encrypt.sh script to encrypt messages.",
    trigger=KeywordTrigger(keywords=["encrypt", "decrypt"]),
)
```

When the user says "encrypt this", the content is injected into the message:

```xml icon="file"

The following information has been included based on a keyword match for "encrypt".
Skill location: /path/to/encryption-helper

Use the encrypt.sh script to encrypt messages.
+ +``` + +## Progressive Disclosure (AgentSkills Standard) + +For the agent to trigger skills, use the [AgentSkills standard](https://agentskills.io/specification) `SKILL.md` format. The agent sees a summary and reads full content on demand. + +```python icon="python" +from openhands.sdk.context.skills import load_skills_from_dir + +# Load SKILL.md files from a directory +_, _, agent_skills = load_skills_from_dir("/path/to/skills") +agent_context = AgentContext(skills=list(agent_skills.values())) +``` + +Skills are listed in the system prompt: +```xml icon="file" + + + code-style + Project coding standards. + /path/to/code-style/SKILL.md + + +``` + + +Add `triggers` to a SKILL.md for **both** progressive disclosure AND automatic injection when keywords match. + + +## Managing Installed Skills + +You can install AgentSkills into a persistent directory and list/update them +using `openhands.sdk.skills`. Skills are stored under +`~/.openhands/skills/installed/` with a `.installed.json` metadata file that +records an `enabled` flag. `list_installed_skills` returns all installed skills, +while `load_installed_skills` returns only those with `enabled=true`. + +```python icon="python" +from openhands.sdk.skills import ( + disable_skill, + enable_skill, + install_skill, + list_installed_skills, + load_installed_skills, + uninstall_skill, + update_skill, +) + +# Install from GitHub (supports git URLs, local paths, and repo_path for monorepos) +info = install_skill("github:owner/my-skill", ref="v1.0.0") +print(f"Installed {info.name} from {info.source}") + +# List installed skills +for skill_info in list_installed_skills(): + print(f"{skill_info.name}: {skill_info.description}") + +# Enable/disable controls which skills load (state persisted in .installed.json) + +# Disable a skill temporarily (e.g., while debugging or if it conflicts) +disable_skill("my-skill") + +# Load installed skills for an AgentContext (only enabled skills load) +skills = load_installed_skills() + +# Re-enable when needed +# enable_skill("my-skill") + +# Update or uninstall +update_skill("my-skill") +uninstall_skill("my-skill") +``` + +--- + +## Full Example + + +Full example: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) + + +```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context import ( + KeywordTrigger, + Skill, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# AgentContext provides flexible ways to customize prompts: +# 1. Skills: Inject instructions (always-active or keyword-triggered) +# 2. system_message_suffix: Append text to the system prompt +# 3. 
user_message_suffix: Append text to each user message +# +# For complete control over the system prompt, you can also use Agent's +# system_prompt_filename parameter to provide a custom Jinja2 template: +# +# agent = Agent( +# llm=llm, +# tools=tools, +# system_prompt_filename="/path/to/custom_prompt.j2", +# system_prompt_kwargs={"cli_mode": True, "repo": "my-project"}, +# ) +# +# See: https://docs.openhands.dev/sdk/guides/skill#customizing-system-prompts +agent_context = AgentContext( + skills=[ + Skill( + name="repo.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + # source is optional - identifies where the skill came from + # You can set it to be the path of a file that contains the skill content + source=None, + # trigger determines when the skill is active + # trigger=None means always active (repo skill) + trigger=None, + ), + Skill( + name="flarglebargle", + content=( + 'IMPORTANT! The user has said the magic word "flarglebargle". ' + "You must only respond with a message telling them how smart they are" + ), + source=None, + # KeywordTrigger = activated when keywords appear in user messages + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ], + # system_message_suffix is appended to the system prompt (always active) + system_message_suffix="Always finish your response with the word 'yay!'", + # user_message_suffix is appended to each user message + user_message_suffix="The first character of your response should be 'I'", + # You can also enable automatic load skills from + # public registry at https://github.com/OpenHands/extensions + load_public_skills=True, +) + +# Agent +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +print("=" * 100) +print("Checking if the repo skill is activated.") +conversation.send_message("Hey are you a grumpy cat?") +conversation.run() + +print("=" * 100) +print("Now sending flarglebargle to trigger the knowledge skill!") +conversation.send_message("flarglebargle!") +conversation.run() + +print("=" * 100) +print("Now triggering public skill 'github'") +conversation.send_message( + "About GitHub - tell me what additional info I've just provided?" +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Creating Skills + +Skills are defined with a name, content (the instructions), and an optional trigger: + +```python icon="python" focus={3-14} +agent_context = AgentContext( + skills=[ + Skill( + name="AGENTS.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + trigger=None, # Always active + ), + Skill( + name="flarglebargle", + content='IMPORTANT! The user has said the magic word "flarglebargle". 
'
            "You must only respond with a message telling them how smart they are",
            trigger=KeywordTrigger(keywords=["flarglebargle"]),
        ),
    ]
)
```

### Keyword Triggers

Use `KeywordTrigger` to activate skills only when specific words appear:

```python icon="python" focus={4}
Skill(
    name="magic-word",
    content="Special instructions when magic word is detected",
    trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]),
)
```


## File-Based Skills (`SKILL.md`)

For reusable skills, use the [AgentSkills standard](https://agentskills.io/specification) directory format.


Full example: [examples/05_skills_and_plugins/01_loading_agentskills/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/01_loading_agentskills/main.py)


### Directory Structure

Each skill is a directory containing the following components:

| Component | Required | Description |
|-------|----------|-------------|
| `SKILL.md` | Yes | Skill definition with frontmatter |
| `scripts/` | No | Executable scripts |
| `references/` | No | Reference documentation |
| `assets/` | No | Static assets |



### `SKILL.md` Format

The `SKILL.md` file defines the skill with YAML frontmatter:

```md icon="markdown"
---
name: my-skill                # Required (standard)
description: >                # Required (standard)
  A brief description of what this skill does and when to use it.
license: MIT                  # Optional (standard)
compatibility: Requires bash  # Optional (standard)
metadata:                     # Optional (standard)
  author: your-name
  version: "1.0"
triggers:                     # Optional (OpenHands extension)
  - keyword1
  - keyword2
---

# Skill Content

Instructions and documentation for the agent...
```

#### Frontmatter Fields

| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Skill identifier (lowercase + hyphens) |
| `description` | Yes | What the skill does (shown to agent) |
| `triggers` | No | Keywords that auto-activate this skill (**OpenHands extension**) |
| `license` | No | License name |
| `compatibility` | No | Environment requirements |
| `metadata` | No | Custom key-value pairs |


Add `triggers` to make your SKILL.md keyword-activated by matching a user prompt. Without triggers, the skill can only be triggered by the agent, not the user.


### Loading Skills

Use `load_skills_from_dir()` to load all skills from a directory:

```python icon="python" expandable examples/05_skills_and_plugins/01_loading_agentskills/main.py
"""Example: Loading Skills from Disk (AgentSkills Standard)

This example demonstrates how to load skills following the AgentSkills standard
from a directory on disk.

Skills are modular, self-contained packages that extend an agent's capabilities
by providing specialized knowledge, workflows, and tools. They follow the
AgentSkills standard which includes:
- SKILL.md file with frontmatter metadata (name, description, triggers)
- Optional resource directories: scripts/, references/, assets/

The example_skills/ directory contains two skills:
- rot13-encryption: Has triggers (encrypt, decrypt) - listed in the system
  prompt AND content auto-injected when triggered
- code-style-guide: No triggers - listed in the system prompt for on-demand
  access

All SKILL.md files follow the AgentSkills progressive disclosure model:
they are listed in the system prompt with name, description, and location.
+Skills with triggers get the best of both worlds: automatic content injection +when triggered, plus the agent can proactively read them anytime. +""" + +import os +import sys +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, AgentContext, Conversation +from openhands.sdk.context.skills import ( + discover_skill_resources, + load_skills_from_dir, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Get the directory containing this script +script_dir = Path(__file__).parent +example_skills_dir = script_dir / "example_skills" + +# ========================================================================= +# Part 1: Loading Skills from a Directory +# ========================================================================= +print("=" * 80) +print("Part 1: Loading Skills from a Directory") +print("=" * 80) + +print(f"Loading skills from: {example_skills_dir}") + +# Discover resources in the skill directory +skill_subdir = example_skills_dir / "rot13-encryption" +resources = discover_skill_resources(skill_subdir) +print("\nDiscovered resources in rot13-encryption/:") +print(f" - scripts: {resources.scripts}") +print(f" - references: {resources.references}") +print(f" - assets: {resources.assets}") + +# Load skills from the directory +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(example_skills_dir) + +print("\nLoaded skills from directory:") +print(f" - Repo skills: {list(repo_skills.keys())}") +print(f" - Knowledge skills: {list(knowledge_skills.keys())}") +print(f" - Agent skills (SKILL.md): {list(agent_skills.keys())}") + +# Access the loaded skill and show all AgentSkills standard fields +if agent_skills: + skill_name = next(iter(agent_skills)) + loaded_skill = agent_skills[skill_name] + print(f"\nDetails for '{skill_name}' (AgentSkills standard fields):") + print(f" - Name: {loaded_skill.name}") + desc = loaded_skill.description or "" + print(f" - Description: {desc[:70]}...") + print(f" - License: {loaded_skill.license}") + print(f" - Compatibility: {loaded_skill.compatibility}") + print(f" - Metadata: {loaded_skill.metadata}") + if loaded_skill.resources: + print(" - Resources:") + print(f" - Scripts: {loaded_skill.resources.scripts}") + print(f" - References: {loaded_skill.resources.references}") + print(f" - Assets: {loaded_skill.resources.assets}") + print(f" - Skill root: {loaded_skill.resources.skill_root}") + +# ========================================================================= +# Part 2: Using Skills with an Agent +# ========================================================================= +print("\n" + "=" * 80) +print("Part 2: Using Skills with an Agent") +print("=" * 80) + +# Check for API key +api_key = os.getenv("LLM_API_KEY") +if not api_key: + print("Skipping agent demo (LLM_API_KEY not set)") + print("\nTo run the full demo, set the LLM_API_KEY environment variable:") + print(" export LLM_API_KEY=your-api-key") + sys.exit(0) + +# Configure LLM +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + usage_id="skills-demo", + model=model, + api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), +) + +# Create agent context with loaded skills +agent_context = AgentContext( + skills=list(agent_skills.values()), + # Disable public skills for this demo to keep output focused + load_public_skills=False, +) + +# Create agent with tools so it can read skill 
resources
tools = [
    Tool(name=TerminalTool.name),
    Tool(name=FileEditorTool.name),
]
agent = Agent(llm=llm, tools=tools, agent_context=agent_context)

# Create conversation
conversation = Conversation(agent=agent, workspace=os.getcwd())

# Test the skill (triggered by "encrypt" keyword)
# The skill provides instructions and a script for ROT13 encryption
print("\nSending message with 'encrypt' keyword to trigger skill...")
conversation.send_message("Encrypt the message 'hello world'.")
conversation.run()

print(f"\nTotal cost: ${llm.metrics.accumulated_cost:.4f}")
print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}")
```



### Key Functions

#### `load_skills_from_dir()`

Loads all skills from a directory, returning three dictionaries:

```python icon="python" focus={3}
from openhands.sdk.context.skills import load_skills_from_dir

repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(skills_dir)
```

- **repo_skills**: Skills from `repo.md` files (always active)
- **knowledge_skills**: Skills from `knowledge/` subdirectories
- **agent_skills**: Skills from `SKILL.md` files (AgentSkills standard)

#### `discover_skill_resources()`

Discovers resource files in a skill directory:

```python icon="python" focus={3}
from openhands.sdk.context.skills import discover_skill_resources

resources = discover_skill_resources(skill_dir)
print(resources.scripts)     # List of script files
print(resources.references)  # List of reference files
print(resources.assets)      # List of asset files
print(resources.skill_root)  # Path to skill directory
```

### Skill Location in Prompts

The skill's location entry in the system-prompt skills listing follows the AgentSkills standard, allowing agents to read the full skill content on demand. When a triggered skill is activated, the content is injected with the location path:

```

The following information has been included based on a keyword match for "encrypt".

Skill location: /path/to/rot13-encryption
(Use this path to resolve relative file references in the skill content below)

[skill content from SKILL.md]

```

This enables skills to reference their own scripts and resources using relative paths like `./scripts/encrypt.sh`.

### Example Skill: ROT13 Encryption

Here's a skill with triggers (OpenHands extension):

**SKILL.md:**
```markdown icon="markdown"
---
name: rot13-encryption
description: >
  This skill helps encrypt and decrypt messages using ROT13 cipher.
triggers:
  - encrypt
  - decrypt
  - cipher
---

# ROT13 Encryption Skill

Run the [encrypt.sh](scripts/encrypt.sh) script with your message:

\`\`\`bash
./scripts/encrypt.sh "your message"
\`\`\`
```

**scripts/encrypt.sh:**
```bash icon="sh"
#!/bin/bash
echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m'
```

When the user says "encrypt", the skill is triggered and the agent can use the provided script.

## Loading Public Skills

OpenHands maintains a [public skills repository](https://github.com/OpenHands/extensions) with community-contributed skills. You can automatically load these skills without waiting for SDK updates.

### Automatic Loading via AgentContext

Enable public skills loading in your `AgentContext`:

```python icon="python" focus={2}
agent_context = AgentContext(
    load_public_skills=True,  # Auto-load from public registry
    skills=[
        # Your custom skills here
    ]
)
```

When enabled, the SDK will:
1. Clone or update the public skills repository to `~/.openhands/cache/skills/` on first run
2. 
Load all available skills from the repository +3. Merge them with your explicitly defined skills + +### Skill Naming and Triggers + +**Skill Precedence by Name**: If a skill name conflicts, your explicitly defined skills take precedence over public skills. For example, if you define a skill named `code-review`, the public `code-review` skill will be skipped entirely. + +**Multiple Skills with Same Trigger**: Skills with different names but the same trigger can coexist and will ALL be activated when the trigger matches. To add project-specific guidelines alongside public skills, use a unique name (e.g., `custom-codereview-guide` instead of `code-review`). Both skills will be triggered together. + +```python icon="python" +# Both skills will be triggered by "/codereview" +agent_context = AgentContext( + load_public_skills=True, # Loads public "code-review" skill + skills=[ + Skill( + name="custom-codereview-guide", # Different name = coexists + content="Project-specific guidelines...", + trigger=KeywordTrigger(keywords=["/codereview"]), + ), + ] +) +``` + + +**Skill Activation Behavior**: When multiple skills share a trigger, all matching skills are loaded. Content is concatenated into the agent's context with public skills first, then explicitly defined skills. There is no smart merging—if guidelines conflict, the agent sees both. + + +### Programmatic Loading + +You can also load public skills manually and have more control: + +```python icon="python" +from openhands.sdk.context.skills import load_public_skills + +# Load all public skills +public_skills = load_public_skills() + +# Use with AgentContext +agent_context = AgentContext(skills=public_skills) + +# Or combine with custom skills +my_skills = [ + Skill(name="custom", content="Custom instructions", trigger=None) +] +agent_context = AgentContext(skills=my_skills + public_skills) +``` + +### Custom Skills Repository + +You can load skills from your own repository: + +```python icon="python" focus={3-7} +from openhands.sdk.context.skills import load_public_skills + +# Load from a custom repository +custom_skills = load_public_skills( + repo_url="https://github.com/my-org/my-skills", + branch="main" +) +``` + +### How It Works + +The `load_public_skills()` function uses git-based caching for efficiency: + +- **First run**: Clones the skills repository to `~/.openhands/cache/skills/public-skills/` +- **Subsequent runs**: Pulls the latest changes to keep skills up-to-date +- **Offline mode**: Uses the cached version if network is unavailable + +This approach is more efficient than fetching individual skill files via HTTP and ensures you always have access to the latest community skills. + + +Explore available public skills at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). These skills cover various domains like GitHub integration, Python development, debugging, and more. + + +## Customizing Agent Context + +### Message Suffixes + +Append custom instructions to the system prompt or user messages via `AgentContext`: + +```python icon="python" +agent_context = AgentContext( + system_message_suffix=""" + +Repository: my-project +Branch: feature/new-api + + """.strip(), + user_message_suffix="Remember to explain your reasoning." 
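    # Both suffixes can be combined with skills in the same AgentContext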
+) +``` + +- **`system_message_suffix`**: Appended to system prompt (always active, combined with repo skills) +- **`user_message_suffix`**: Appended to each user message + +### Replacing the Entire System Prompt + +For complete control, provide a custom Jinja2 template via the `Agent` class: + +```python icon="python" focus={6} +from openhands.sdk import Agent + +agent = Agent( + llm=llm, + tools=tools, + system_prompt_filename="/path/to/custom_system_prompt.j2", # Absolute path + system_prompt_kwargs={"cli_mode": True, "repo_name": "my-project"} +) +``` + +**Custom template example** (`custom_system_prompt.j2`): + +```jinja2 +You are a helpful coding assistant for {{ repo_name }}. + +{% if cli_mode %} +You are running in CLI mode. Keep responses concise. +{% endif %} + +Follow these guidelines: +- Write clean, well-documented code +- Consider edge cases and error handling +- Suggest tests when appropriate +``` + +**Key points:** +- Use relative filenames (e.g., `"system_prompt.j2"`) to load from the agent's prompts directory +- Use absolute paths (e.g., `"/path/to/prompt.j2"`) to load from any location +- Pass variables to the template via `system_prompt_kwargs` +- The `system_message_suffix` from `AgentContext` is automatically appended after your custom prompt + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers +- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval + +### Task Tool Set +Source: https://docs.openhands.dev/sdk/guides/task-tool-set.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +The TaskToolSet lets a parent agent launch sub-agents that handle complex, multi-step tasks autonomously. Each sub-agent runs **synchronously** — the parent blocks until the sub-agent finishes and returns its result. Sub-agents can be **resumed** later using a task ID, preserving their full conversation context. + +This pattern is useful when: +- Delegating specialized work to purpose-built sub-agents +- Breaking a problem into sequential steps handled by different experts +- Maintaining conversational context across multiple interactions with a sub-agent +- Isolating sub-task complexity from the parent agent's context + + +For **parallel** sub-agent execution, see [Sub-Agent Delegation](/sdk/guides/agent-delegation). TaskToolSet is designed for **sequential** blocking tasks. + + +## How It Works + +The agent calls the task tool with a prompt and a sub-agent type. The TaskManager creates (or resumes) a sub-agent conversation, runs it to completion, and returns the result to the parent. + +```mermaid +sequenceDiagram + participant P as Parent Agent + participant T as TaskManager + participant S as Sub-Agent + + P->>T: task(prompt, type) + activate T + T->>S: create / resume + activate S + Note over S: runs autonomously + S->>T: result + deactivate S + T->>P: TaskObservation + deactivate T + Note right of T: persists for resume +``` + +### Task Lifecycle + +1. **Creation**: A fresh sub-agent and conversation are created +2. **Running**: The sub-agent processes the prompt autonomously +3. **Completion**: The final response is extracted and returned +4. **Persistence**: The conversation is saved to disk for potential resumption +5. 
**Resumption** (optional): A previous task can be resumed with its full context preserved + +## Setting Up the TaskToolSet + + + + ### Register Custom Sub-Agent Types (Optional) + + By default, a `"default"` general-purpose agent is available, but you can register your own custom types + for specialized behavior: + + ```python icon="python" focus={23-27} + from openhands.sdk import LLM, Agent, AgentContext + from openhands.sdk.context import Skill + from openhands.tools.delegate import register_agent + + def create_code_reviewer(llm: LLM) -> Agent: + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=[ + Skill( + name="code_review", + content="""You are an expert code reviewer. + Analyze code for bugs, style issues, + and suggest improvements. + """, + trigger=None, + ) + ], + ), + ) + + register_agent( + name="code_reviewer", + factory_func=create_code_reviewer, + description="Reviews code for bugs, style issues, and improvements.", + ) + ``` + + + ### Add TaskToolSet to the Agent + + ```python icon="python" focus={6} + from openhands.sdk import Agent, Tool + from openhands.tools.task import TaskToolSet + + agent = Agent( + llm=llm, + tools=[Tool(name=TaskToolSet.name)], + ) + ``` + + The tool auto-registers on import — no explicit `register_tool()` call is needed. + + + ### Create a Conversation + + ```python icon="python" focus={5-9} + from openhands.sdk import Conversation + from openhands.tools.delegate import DelegationVisualizer + from pathlib import Path + + conversation = Conversation( + agent=agent, + workspace=Path.cwd(), + visualizer=DelegationVisualizer(name="Orchestrator"), + ) + ``` + + + The `DelegationVisualizer` is optional but recommended — it shows the multi-agent conversation flow in the terminal. + + + + +## Tool Parameters + +When the parent agent calls the task tool, it provides these parameters: + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `prompt` | `str` | Yes | The instruction for the sub-agent | +| `subagent_type` | `str` | No | Which registered agent type to use (default: `"default"`) | +| `description` | `str` | No | Short label (3-5 words) for display and tracking | +| `resume` | `str` | No | Task ID from a previous invocation to continue | +| `max_turns` | `int` | No | Maximum agent iterations before stopping (default: 500) | + +## Task Observation + +The tool returns a `TaskObservation` containing: + +| Field | Description | +|-------|-------------| +| `task_id` | Unique identifier (e.g., `task_00000001`) — use this for resumption | +| `subagent` | The agent type that handled the task | +| `status` | Final status: `completed` or `error` | +| `text` | The sub-agent's response (or error message) | + +## Resuming Tasks + +A key feature of TaskToolSet is the ability to resume a previously completed task. When a task finishes, its conversation is persisted to disk. Passing the `resume` parameter with the task ID reloads the full conversation history, allowing the sub-agent to continue where it left off. + +```python icon="python" +# First call — sub-agent generates a quiz question +conversation.send_message( + "Use the task tool with subagent_type='quiz_expert' to generate " + "a multiple-choice question about zebras." +) +conversation.run() +# The agent receives task_id "task_00000001" in the observation + +# Second call — resume the same sub-agent to verify the answer +conversation.send_message( + "The user answered A. 
Use the task tool with resume='task_00000001' "
    "to ask the same sub-agent whether that answer is correct."
)
conversation.run()
```

## TaskToolSet vs DelegateTool

| | TaskToolSet | DelegateTool |
|---|---|---|
| **Execution** | Sequential (blocking) | Parallel (concurrent) |
| **Concurrency** | One task at a time | Multiple sub-agents simultaneously |
| **Resumption** | Built-in via `resume` parameter | Persistent sub-agents by ID |
| **API** | Single `task` tool call | `spawn` + `delegate` commands |
| **Best for** | Expert delegation, multi-turn workflows | Fan-out / fan-in parallelism |

## Ready-to-run Example


This example is available on GitHub: [examples/01_standalone_sdk/41_task_tool_set.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/41_task_tool_set.py)


```python icon="python" expandable examples/01_standalone_sdk/41_task_tool_set.py
"""
Animal Quiz with Task Tool Set

Demonstrates the TaskToolSet with a main agent delegating to an
animal-expert sub-agent. The flow is:

1. User names an animal.
2. Main agent delegates to the "animal_expert" sub-agent to generate
   a multiple-choice question about that animal.
3. Main agent shows the question to the user.
4. User picks an answer.
5. Main agent resumes the same sub-agent to check whether the answer
   is correct and explain why.
"""

import os

from pydantic import SecretStr

from openhands.sdk import LLM, Agent, AgentContext, Conversation, Tool
from openhands.sdk.context import Skill
from openhands.tools.delegate import DelegationVisualizer, register_agent
from openhands.tools.task import TaskToolSet


# ── LLM setup ──────────────────────────────────────────────────────

api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."

llm = LLM(
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    api_key=SecretStr(api_key),
    base_url=os.getenv("LLM_BASE_URL", None),
)

# ── Register the animal expert sub-agent ───────────────────────────


def create_animal_expert(llm: LLM) -> Agent:
    """Factory for the animal-expert sub-agent."""
    return Agent(
        llm=llm,
        tools=[],  # no tools needed – pure knowledge
        agent_context=AgentContext(
            skills=[
                Skill(
                    name="animal_expertise",
                    content=(
                        "You are a world-class zoologist. "
                        "When asked to generate a quiz question, respond with "
                        "EXACTLY this format and nothing else:\n\n"
                        "Question: \n"
                        "A)