WebDriver CLI for Tauri apps via tauri-driver. Mirrors the agent-browser API using the WebDriver protocol (required for Tauri's WebKitGTK webview on Linux).
Tauri on Linux uses WebKitGTK, which exposes a WebDriver interface -- not CDP (Chrome DevTools Protocol). agent-browser uses CDP and cannot connect to WebKitGTK. tauri-browse speaks WebDriver to tauri-driver, which wraps WebKitWebDriver, giving you full Tauri IPC access (window.__TAURI_INTERNALS__).
pipx install tauri-browse
Or from source:
pip install .
After installation, tauri-browse is available as a system-wide command.
# Debian/Ubuntu
sudo apt install webkit2gtk-driver xvfb imagemagick
# Cargo
cargo install tauri-driver --locked
tauri-browse requires tauri-driver (WebDriver server) and an X display. For headless CI or development, use Xvfb. Here's a minimal dev script you can adapt for any Tauri project:
#!/usr/bin/env bash
set -euo pipefail
APP_DIR="." # your Tauri project root
# Build the Tauri app
cargo build --manifest-path "$APP_DIR/src-tauri/Cargo.toml"
# Start Xvfb (headless X server) -- match your Tauri window dimensions
Xvfb :99 -screen 0 1400x900x24 -nolisten tcp &
XVFB_PID=$!
sleep 0.5
export DISPLAY=:99
unset WAYLAND_DISPLAY # avoid Wayland conflicts (e.g. WSLg)
export GDK_BACKEND=x11
# Start tauri-driver (WebDriver server on port 4444)
tauri-driver &
TAURI_DRIVER_PID=$!
trap "kill $TAURI_DRIVER_PID $XVFB_PID 2>/dev/null" EXIT
# Start the Vite dev server (frontend hot reload)
pnpm --dir "$APP_DIR" dev
Save this as e.g. dev-headless.sh, then in a separate terminal use tauri-browse to interact with the app. tauri-browse auto-detects the Xvfb display, so no DISPLAY setup is needed on the client side.
If your Tauri app uses beforeDevCommand in tauri.conf.json (the default), pnpm tauri dev handles both the frontend dev server and the Rust build together. Replace the last two commands with pnpm tauri dev in that case.
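The dev script starts tauri-driver in the background and continues immediately, so a client that connects too early may find the port closed. The following is a minimal sketch of a readiness check, not part of tauri-browse: wait_for_driver is a hypothetical helper name, and it assumes curl is installed and that tauri-driver answers the standard WebDriver GET /status endpoint on its default port.

```shell
# wait_for_driver: hypothetical helper, not part of tauri-browse.
# Polls the WebDriver /status endpoint until it answers or attempts run out.
# Assumption: tauri-driver responds to GET /status once it is ready.
wait_for_driver() {
  url="${1:-http://localhost:4444}"   # WebDriver URL (default from the options table)
  attempts="${2:-20}"                 # number of polls before giving up
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; output is discarded, only the status matters
    if curl -fsS -o /dev/null "$url/status" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 0.5
  done
  echo "tauri-driver did not respond at $url" >&2
  return 1
}
```

If you adopt something like this, calling it right after the tauri-driver line in the script gives later commands a driver that is already listening.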
# Launch your Tauri app
tauri-browse launch path/to/your-tauri-app
# Take a screenshot to see the current state
tauri-browse screenshot /tmp/screenshot.png
# Get interactive elements with refs
tauri-browse snapshot -i
# Output: @e1 button "Add Project", @e2 input "name", ...
# Interact using refs
tauri-browse click @e1
tauri-browse fill @e2 "my-project"
tauri-browse press Enter
# Re-snapshot after DOM changes (refs are invalidated)
tauri-browse snapshot -i
# Close when done
tauri-browse close
tauri-browse [options] <command> [args]
| Option | Description | Default |
|---|---|---|
| --session <name> | Session name | default |
| --driver <url> | WebDriver URL | http://localhost:4444 |
| --display <display> | X display for screenshots | auto-detected from Xvfb |
| --config <path> | Explicit config file path | |
| --json | Default to JSON output for snapshots | false |
| --full | Default to full page screenshots | false |
| --annotate | Default to annotated screenshots | false |
| --debug | Verbose output | false |
| --timeout <secs> | Request timeout in seconds | 10 |
| --download-path <p> | Default download directory | |
Boolean flags accept an optional true/false value (e.g. --json false to override a config default). Bare flags default to true.
tauri-browse supports layered configuration:
| Location | Scope |
|---|---|
| ~/.tauri-browse/config.json | User defaults (all projects) |
| ./tauri-browse.json | Project overrides (current directory) |
Priority (lowest to highest): user config < project config < env vars < CLI flags.
Use --config <path> or TAURI_BROWSE_CONFIG env var to load a specific config file instead (skips user/project config loading).
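The priority order can be read as "the first layer that sets a value wins", checked from CLI flags down to user config. The sketch below only models that rule. first_set and the variable names are hypothetical and say nothing about how tauri-browse implements it internally.

```shell
# first_set: hypothetical helper that prints the first non-empty argument.
first_set() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Arguments run from highest to lowest priority. Empty strings stand in for
# layers that do not set the value; the trailing 10 is the documented default.
cli_timeout=""          # no --timeout flag given
project_timeout="30"    # "timeout" in ./tauri-browse.json
user_timeout="60"       # "timeout" in ~/.tauri-browse/config.json
effective_timeout=$(first_set "$cli_timeout" "${TAURI_BROWSE_TIMEOUT:-}" \
  "$project_timeout" "$user_timeout" "10")
echo "timeout: $effective_timeout"   # prints "timeout: 30" when TAURI_BROWSE_TIMEOUT is unset
```

Because an explicit false is still a non-empty value, the same rule explains why --json false can override a config file that sets json to true.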
Config keys use camelCase:
{
"driver": "http://localhost:4444",
"display": ":99",
"session": "default",
"json": false,
"full": false,
"annotate": false,
"debug": false,
"timeout": 10,
"downloadPath": "/tmp/downloads"
}
All CLI options have corresponding environment variables:
| Variable | Description |
|---|---|
| TAURI_BROWSE_CONFIG | Explicit config file path |
| TAURI_BROWSE_DRIVER | WebDriver URL |
| TAURI_BROWSE_DISPLAY | X display for screenshots |
| TAURI_BROWSE_SESSION | Default session name |
| TAURI_BROWSE_JSON | Default to JSON output (true/false) |
| TAURI_BROWSE_FULL | Default to full screenshots (true/false) |
| TAURI_BROWSE_ANNOTATE | Default to annotated screenshots (true/false) |
| TAURI_BROWSE_DEBUG | Verbose output (true/false) |
| TAURI_BROWSE_TIMEOUT | Request timeout in seconds |
| TAURI_BROWSE_DOWNLOAD_PATH | Default download directory |
tauri-browse automatically detects a running Xvfb process and uses its display for screenshots. The resolution order for display is:
1. --display CLI flag
2. TAURI_BROWSE_DISPLAY env var
3. display in config file
4. Running Xvfb process (auto-detected via pgrep)
5. DISPLAY env var
This means you typically don't need to set DISPLAY at all -- just start Xvfb and tauri-browse finds it.
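For debugging which display would be picked, the resolution order can be approximated in a few lines of shell. This is a sketch under assumptions, not tauri-browse's own code: resolve_display is a hypothetical name, the two arguments stand in for the --display flag and the config file value, and it assumes the Xvfb display shows up as a :N argument in the output of pgrep -a.

```shell
# resolve_display: hypothetical sketch of the documented lookup order.
# $1 = value of --display (may be empty)
# $2 = "display" from the config file (may be empty)
resolve_display() {
  cli_display="$1"
  config_display="$2"
  if [ -n "$cli_display" ]; then
    printf '%s\n' "$cli_display"
  elif [ -n "${TAURI_BROWSE_DISPLAY:-}" ]; then
    printf '%s\n' "$TAURI_BROWSE_DISPLAY"
  elif [ -n "$config_display" ]; then
    printf '%s\n' "$config_display"
  else
    # Assumption: the first " :N" token on the Xvfb command line is its display.
    xvfb_display=$(pgrep -a Xvfb 2>/dev/null | grep -o ' :[0-9][0-9]*' | head -n 1 | tr -d ' ')
    if [ -n "$xvfb_display" ]; then
      printf '%s\n' "$xvfb_display"
    else
      # Last resort: whatever DISPLAY the shell already has (possibly empty)
      printf '%s\n' "${DISPLAY:-}"
    fi
  fi
}
```

Running resolve_display "" "" in the terminal where you use tauri-browse shows roughly what auto-detection would find; an empty line means nothing was detected.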
tauri-browse launch <binary> # Launch Tauri app via WebDriver
tauri-browse open <url> # Navigate to URL (aliases: goto, navigate)
tauri-browse close # Close session
tauri-browse back # Navigate back in history
tauri-browse forward # Navigate forward in history
tauri-browse reload # Reload current page
Get a text representation of interactive elements on the page, each assigned a ref (@e1, @e2, ...) for interaction. Optimized for LLM-based automation.
tauri-browse snapshot -i # Interactive elements with refs
tauri-browse snapshot -i -C # Include cursor-interactive elements
tauri-browse snapshot -i -s "#app" # Scoped to CSS selector
tauri-browse snapshot -i --json # Output as JSON
tauri-browse screenshot [path] # Capture X display
tauri-browse screenshot --annotate # Numbered badges on interactive elements
tauri-browse screenshot --full # Full page (scroll + stitch)
The --annotate flag overlays numbered badges on interactive elements and prints a legend mapping [N] to @eN. Refs are cached, so you can interact with elements immediately after. Useful for multimodal AI models that need to reason about visual layout, unlabeled icon buttons, canvas elements, or visual state the text snapshot cannot capture.
Screenshots use ImageMagick's import command to capture the X display, because WebKitWebDriver's screenshot endpoint does not work reliably under Xvfb.
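When a screenshot comes back blank, it can help to capture the X display directly and rule out tauri-browse itself. The helper below is a rough equivalent built on ImageMagick's import. capture_display is a hypothetical name, and the exact arguments tauri-browse passes to import are an assumption.

```shell
# capture_display: hypothetical helper that grabs the whole root window of an
# X display with ImageMagick. Not necessarily the invocation tauri-browse uses.
capture_display() {
  display="$1"      # X display to capture, e.g. :99
  out_path="$2"     # output image path, e.g. /tmp/screenshot.png
  import -display "$display" -window root "$out_path"
}
```

For example, capture_display :99 /tmp/root.png captures the Xvfb display started by the dev script above.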
All interaction commands accept either a ref (@e1) or a CSS selector.
tauri-browse click @e1 # Click element
tauri-browse dblclick @e1 # Double-click element
tauri-browse hover @e1 # Hover over element (triggers CSS :hover)
tauri-browse focus @e1 # Focus element
tauri-browse drag @e1 @e2 # Drag source element to destination
tauri-browse fill @e2 "text" # Clear and type
tauri-browse type @e2 "more" # Type without clearing
tauri-browse select @e3 "option" # Select dropdown option
tauri-browse check @e4 # Check checkbox
tauri-browse uncheck @e4 # Uncheck checkbox (only if checked)
tauri-browse press Enter # Press key (Enter, Tab, Escape, etc.)
tauri-browse scroll down 300 # Scroll (up/down/left/right)
tauri-browse scrollintoview @e1 # Scroll element into view
tauri-browse highlight @e1 # Highlight element visually
tauri-browse upload @e1 /path/to/file # Upload file to input element
tauri-browse download @e1 # Click element and wait for download
tauri-browse download @e1 --path /tmp # Download to specific directory
tauri-browse download @e1 --timeout 60000 # Custom timeout in ms
Find elements by semantic properties and perform an action. For role, both explicit role="..." attributes and implicit roles (e.g. <button> for role=button) are matched.
tauri-browse find text "Sign In" click
tauri-browse find label "Email" fill "user@test.com"
tauri-browse find role button click
tauri-browse find role button click --name "Submit"
tauri-browse find testid "submit-btn" click
tauri-browse find placeholder "Search" type "query"
tauri-browse find alt "Logo" click
tauri-browse find title "Close" click
tauri-browse find first ".card" click
tauri-browse find last ".card" click
tauri-browse find nth 3 ".card" click
Available actions for find: click, dblclick, hover, focus, fill, type, check, uncheck, select, highlight, scrollintoview, upload, download.
tauri-browse eval "return document.title"
# Complex JS via stdin (avoids shell quoting issues)
tauri-browse eval --stdin <<'EOF'
return Array.from(document.querySelectorAll("a")).map(a => a.href)
EOF
tauri-browse get text @e1 # Get element text
tauri-browse get html @e1 # Get element innerHTML
tauri-browse get html @e1 --outer # Get element outerHTML
tauri-browse get value @e1 # Get input/textarea value
tauri-browse get attr @e1 href # Get element attribute by name
tauri-browse get count ".card" # Count elements matching selector
tauri-browse get box @e1 # Get bounding rectangle as JSON
tauri-browse get styles @e1 # Get common computed styles
tauri-browse get styles @e1 color font-size # Get specific styles
tauri-browse get url # Get current URL
tauri-browse get title # Get page title
Check element state. Prints true or false, with exit code 1 for false (enables shell conditionals).
tauri-browse is visible @e1 # Check computed display/visibility/opacity + dimensions
tauri-browse is enabled @e1 # Check .disabled and aria-disabled
tauri-browse is checked @e1 # Check .checked and aria-checked
tauri-browse wait @e1 # Wait for element to appear
tauri-browse wait 2000 # Wait milliseconds
tauri-browse wait --url "/dashboard" # Wait for URL to contain pattern
tauri-browse wait --text "Welcome" # Wait for text to appear on page
tauri-browse wait --load networkidle # Wait for network idle
tauri-browse wait --fn "document.readyState === 'complete'"
All wait commands accept an optional timeout as the last argument (in ms, default 10000).
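The exit codes of the is commands and the trailing timeout argument of wait combine naturally in scripts. Here is a minimal sketch using only the commands documented here. click_when_ready is a hypothetical helper name, and it assumes that wait exits non-zero when its timeout expires.

```shell
# click_when_ready: hypothetical helper built from documented commands.
# Assumption: `tauri-browse wait` exits non-zero if the timeout expires.
click_when_ready() {
  target="$1"                # ref (@e1) or CSS selector
  timeout_ms="${2:-10000}"   # same default as the wait commands
  # Wait for the element to appear, passing the timeout as the last argument
  tauri-browse wait "$target" "$timeout_ms" || return 1
  # `is` exits 1 for false, so it can drive the conditional directly;
  # its true/false text output is discarded
  if tauri-browse is visible "$target" >/dev/null &&
     tauri-browse is enabled "$target" >/dev/null; then
    tauri-browse click "$target"
  else
    echo "not clickable: $target" >&2
    return 1
  fi
}
```

A call such as click_when_ready @e3 5000 would then wait up to five seconds before checking state and clicking.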
tauri-browse frame @e1 # Switch to iframe element
tauri-browse frame main # Switch back to main/top frame
tauri-browse dialog accept # Accept alert/confirm dialog
tauri-browse dialog accept "input" # Accept prompt with text
tauri-browse dialog dismiss # Dismiss dialog
tauri-browse dialog text # Get dialog text
Console output is captured automatically on launch and open. tauri-browse patches console.log/warn/error/info/debug, window.onerror, and unhandled promise rejections.
tauri-browse console # Show all console entries
tauri-browse console --level warn # Filter by level
tauri-browse console --clear # Show entries and clear buffer
tauri-browse errors # Show JS errors
tauri-browse errors --clear # Show and clear errors
Compare page states for verification and testing.
# Compare current snapshot to last taken snapshot
tauri-browse diff snapshot
# Compare against a saved baseline file
tauri-browse diff snapshot --baseline before.txt
# Visual pixel diff against a baseline image
tauri-browse diff screenshot --baseline before.png
# Diff two URLs (text comparison)
tauri-browse diff url http://localhost:1420 http://localhost:1421
# Diff two URLs (screenshot comparison)
tauri-browse diff url http://localhost:1420 http://localhost:1421 --screenshot
# Scope diff to a specific element
tauri-browse diff url <url1> <url2> --selector "#main"
Named sessions allow parallel automation of multiple Tauri instances.
# Use named sessions
tauri-browse --session app1 launch ./my-app
tauri-browse --session app2 launch ./my-app
# List active sessions
tauri-browse session list
# Each session has independent refs and state
tauri-browse --session app1 snapshot -i
tauri-browse --session app2 snapshot -i
Save and restore cookies and localStorage across sessions.
tauri-browse state save auth.json # Save current state
tauri-browse state load auth.json # Restore state
tauri-browse state list # List saved state files
tauri-browse state show auth.json # Pretty-print state file contents
tauri-browse state rename old new # Rename state file
tauri-browse state clean # Remove empty/corrupt state files
tauri-browse state clear [name] # Clear saved state
Commands can be chained with && in a single shell invocation. The session persists between commands, so chaining is safe and efficient.
# Navigate, wait, and screenshot in one call
tauri-browse open https://example.com && tauri-browse wait --load networkidle && tauri-browse screenshot page.png
# Chain multiple interactions
tauri-browse fill @e1 "user@example.com" && tauri-browse fill @e2 "password" && tauri-browse click @e3
Use && when you don't need to read the output of intermediate commands. Run commands separately when you need to parse output (e.g. snapshot to discover refs, then interact).
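The "parse output, then interact" case might look like the sketch below. ref_for and click_labelled are hypothetical helpers, and the parsing assumes that snapshot -i prints one element per line with its ref as an @eN token on that line; adjust the pattern if the real output differs.

```shell
# ref_for: hypothetical helper that prints the first ref whose snapshot line
# contains the given text.
# Assumption: one element per line, with the ref as an @eN token on that line.
ref_for() {
  label="$1"
  tauri-browse snapshot -i | grep -F -- "$label" | grep -o '@e[0-9][0-9]*' | head -n 1
}

# click_labelled: take a fresh snapshot, look up the ref, then click it.
click_labelled() {
  ref=$(ref_for "$1")
  if [ -z "$ref" ]; then
    echo "no element matching: $1" >&2
    return 1
  fi
  tauri-browse click "$ref"
}
```

Since every call takes a fresh snapshot, the ref it uses is current at lookup time. For simple lookups the find command is likely the more direct route; this pattern is for cases where you want to inspect the intermediate output.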
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
tauri-browse click @e5 # Navigates to new page
tauri-browse snapshot -i # MUST re-snapshot
tauri-browse click @e1 # Use new refs
tauri-browse communicates with tauri-driver over the WebDriver HTTP protocol. tauri-driver wraps WebKitWebDriver, which controls the WebKitGTK webview inside your Tauri app.
tauri-browse --(HTTP/JSON)--> tauri-driver --(WebDriver)--> WebKitWebDriver --> Tauri Webview
Since this drives the actual Tauri webview (not a regular browser), all Tauri IPC commands work -- you can invoke Rust commands, access sidecars, use the full Tauri API from JavaScript, etc.
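A quick way to confirm that a session is attached to the Tauri webview rather than an ordinary page is to check for the IPC object from JavaScript. This sketch uses the documented eval --stdin form; has_tauri_ipc is a hypothetical helper name, and what eval prints for a boolean result is assumed rather than documented here.

```shell
# has_tauri_ipc: hypothetical helper that asks the page whether the Tauri IPC
# object exists and passes eval's output through unchanged.
has_tauri_ipc() {
  tauri-browse eval --stdin <<'EOF'
return typeof window.__TAURI_INTERNALS__ !== "undefined"
EOF
}
```

Run it after tauri-browse launch; a true result suggests the webview exposes the Tauri IPC bridge.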
MIT