Our SDK allows you to use your own model cloud or models from a 3rd party provider to work with models that are not available through the AskUI API.
Our SDK follows a layered architecture: Provider → Model → MessagesAPI → Client
- Provider: User-facing configuration layer (
AskUIVlmProvider,AnthropicVlmProvider, etc.) - Model: Internal implementation handling the agent loop (e.g.,
AnthropicActModel) - MessagesAPI: Converts internal message format to provider-specific format
- Client: Underlying HTTP client (
Anthropic,OpenAI, etc.)
Users typically configure only the Provider. For advanced use cases and full customization, you can also inject a pre-configured Client.
Our SDK supports the following three provider slots:
| Slot | Used by | Default |
|---|---|---|
vlm_provider |
act() — multimodal input + tool-calling |
AskUIVlmProvider |
image_qa_provider |
get() — Q&A and structured extraction |
AskUIImageQAProvider |
detection_provider |
locate(), click(), etc. — element coordinates |
AskUIDetectionProvider |
Per default, Agents use the following providers to access models from the AskUI API:
AskuiVlmProviderforact()AskUIImageQAProviderforget()AskUIDetectionProviderforlocate()
from askui import ComputerAgent
with ComputerAgent() as agent:
agent.act("Open the calculator")
result = agent.get("What is shown on the display?")
pos = agent.locate("7")Environment variables:
ASKUI_WORKSPACE_ID(required)ASKUI_TOKEN(required)
If you want to use Anthropic models directly from the Anthropic API with your Anthropic API key, you can use:
AnthropicVlmProviderforact()AnthropicImageQAProviderforget()
import os
from askui import AgentSettings, ComputerAgent
from askui.model_providers import AnthropicVlmProvider
with ComputerAgent(settings=AgentSettings(
vlm_provider=AnthropicVlmProvider(
model_id="claude-opus-4-6",
),
)) as agent:
agent.act("Navigate to settings")
agent.get("What is shown on the display")Environment variables:
ANTHROPIC_API_KEY(optional)ANTHROPIC_AUTH_TOKEN(optional)ANTHROPIC_BASE_URL(optional, default=https://api.anthropic.com)
Note: either ANTHROPIC_API_KEY or ANTHROPIC_AUTH_TOKEN must be set
If you want to use Google models directly from the Google API with your Google API key, you can use:
GoogleImageQAProviderforget()
from askui import AgentSettings, ComputerAgent
from askui.model_providers import GoogleImageQAProvider
with ComputerAgent(settings=AgentSettings(
image_qa_provider=GoogleImageQAProvider(
model_id="gemini-2.5-pro",
),
)) as agent:
agent.get("What is shown on the display")Environment variables:
GOOGLE_API_KEY(required)
or, if you want to use the Vertex AI API, please set:
GOOGLE_GENAI_USE_VERTEXAI=true(required)GOOGLE_CLOUD_PROJECT(required)GOOGLE_CLOUD_LOCATION(required)
For customization and to access to use your own model cloud, you can also implement your own model provider class. The library defines three base classes, that you can overwrite for that.
| Base Class | Method to overwrite | Used by |
|---|---|---|
VlmProvider |
create_message(...) |
act() |
ImageQAProvider |
query(...) |
get() |
DetectionProvider |
detect(...) |
locate() |
from typing import Type
from typing_extensions import override
from askui.model_providers import ImageQAProvider
from askui.models.shared.settings import GetSettings
from askui.models.types.response_schemas import ResponseSchema
from askui.utils.source_utils import Source
class MyImageQAProvider(ImageQAProvider):
@property
def model_id(self) -> str:
return "my-model-v1"
@override
def query(
self,
query: str,
source: Source,
response_schema: Type[ResponseSchema] | None,
get_settings: GetSettings,
) -> ResponseSchema | str:
return "my answer" # call your API hereThe built-in VLM providers include default pricing for supported models. You can override the pricing on any provider by passing input_cost_per_million_tokens and output_cost_per_million_tokens:
from askui import AgentSettings, ComputerAgent
from askui.model_providers import AnthropicVlmProvider
from askui.reporting import SimpleHtmlReporter
with ComputerAgent(
reporters=[SimpleHtmlReporter()],
settings=AgentSettings(
vlm_provider=AnthropicVlmProvider(
model_id="claude-sonnet-4-6",
input_cost_per_million_tokens=3.0,
output_cost_per_million_tokens=15.0,
),
),
) as agent:
agent.act("Open settings")If you implement a fully custom VlmProvider, override the pricing property to enable cost tracking:
from askui.model_providers import VlmProvider, ModelPricing
class MyVlmProvider(VlmProvider):
@property
def pricing(self) -> ModelPricing | None:
return ModelPricing(
input_cost_per_million_tokens=1.0,
output_cost_per_million_tokens=5.0,
)
# ... rest of implementationFor full control over HTTP settings (timeouts, proxies, retries), you can inject a pre-configured client:
import os
from anthropic import Anthropic
from askui import AgentSettings, ComputerAgent
from askui.model_providers import AnthropicVlmProvider
client = Anthropic(
timeout=60.0,
max_retries=3,
)
with ComputerAgent(settings=AgentSettings(
vlm_provider=AnthropicVlmProvider(client=client),
)) as agent:
agent.act("Process the document")