Skip to content

nullbio/tauri-browse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tauri-browse

WebDriver CLI for Tauri apps via tauri-driver. Mirrors the agent-browser API using WebDriver protocol (required for Tauri's WebKitGTK webview on Linux).

Why not agent-browser?

Tauri on Linux uses WebKitGTK, which exposes a WebDriver interface -- not CDP (Chrome DevTools Protocol). agent-browser uses CDP and cannot connect to WebKitGTK. tauri-browse speaks WebDriver to tauri-driver, which wraps WebKitWebDriver, giving you full Tauri IPC access (window.__TAURI_INTERNALS__).

Installation

pipx install tauri-browse

Or from source:

pip install .

After installation, tauri-browse is available as a system-wide command.

System dependencies

# Debian/Ubuntu
sudo apt install webkit2gtk-driver xvfb imagemagick

# Cargo
cargo install tauri-driver --locked

Setting up headless testing

tauri-browse requires tauri-driver (WebDriver server) and an X display. For headless CI or development, use Xvfb. Here's a minimal dev script you can adapt for any Tauri project:

#!/usr/bin/env bash
set -euo pipefail

APP_DIR="."  # your Tauri project root

# Build the Tauri app
cargo build --manifest-path "$APP_DIR/src-tauri/Cargo.toml"

# Start Xvfb (headless X server) -- match your Tauri window dimensions
Xvfb :99 -screen 0 1400x900x24 -nolisten tcp &
XVFB_PID=$!
sleep 0.5

export DISPLAY=:99
unset WAYLAND_DISPLAY        # avoid Wayland conflicts (e.g. WSLg)
export GDK_BACKEND=x11

# Start tauri-driver (WebDriver server on port 4444)
tauri-driver &
TAURI_DRIVER_PID=$!

trap "kill $TAURI_DRIVER_PID $XVFB_PID 2>/dev/null" EXIT

# Start the Vite dev server (frontend hot reload)
pnpm --dir "$APP_DIR" dev

Save this as e.g. dev-headless.sh, then in a separate terminal use tauri-browse to interact with the app. tauri-browse auto-detects the Xvfb display, so no DISPLAY setup is needed on the client side.

If your Tauri app uses beforeDevCommand in tauri.conf.json (the default), pnpm tauri dev handles both the frontend dev server and the Rust build together. Replace the last two commands with pnpm tauri dev in that case.

Quick start

# Launch your Tauri app
tauri-browse launch path/to/your-tauri-app

# Take a screenshot to see the current state
tauri-browse screenshot /tmp/screenshot.png

# Get interactive elements with refs
tauri-browse snapshot -i
# Output: @e1 button "Add Project", @e2 input "name", ...

# Interact using refs
tauri-browse click @e1
tauri-browse fill @e2 "my-project"
tauri-browse press Enter

# Re-snapshot after DOM changes (refs are invalidated)
tauri-browse snapshot -i

# Close when done
tauri-browse close

Usage

tauri-browse [options] <command> [args]

Global options

Option Description Default
--session <name> Session name default
--driver <url> WebDriver URL http://localhost:4444
--display <display> X display for screenshots auto-detected from Xvfb
--config <path> Explicit config file path
--json Default to JSON output for snapshots false
--full Default to full page screenshots false
--annotate Default to annotated screenshots false
--debug Verbose output false
--timeout <secs> Request timeout in seconds 10
--download-path <p> Default download directory

Boolean flags accept an optional true/false value (e.g. --json false to override a config default). Bare flags default to true.

Configuration files

tauri-browse supports layered configuration:

Location Scope
~/.tauri-browse/config.json User defaults (all projects)
./tauri-browse.json Project overrides (current directory)

Priority (lowest to highest): user config < project config < env vars < CLI flags.

Use --config <path> or TAURI_BROWSE_CONFIG env var to load a specific config file instead (skips user/project config loading).

Config keys use camelCase:

{
  "driver": "http://localhost:4444",
  "display": ":99",
  "session": "default",
  "json": false,
  "full": false,
  "annotate": false,
  "debug": false,
  "timeout": 10,
  "downloadPath": "/tmp/downloads"
}

Environment variables

All CLI options have corresponding environment variables:

Variable Description
TAURI_BROWSE_CONFIG Explicit config file path
TAURI_BROWSE_DRIVER WebDriver URL
TAURI_BROWSE_DISPLAY X display for screenshots
TAURI_BROWSE_SESSION Default session name
TAURI_BROWSE_JSON Default to JSON output (true/false)
TAURI_BROWSE_FULL Default to full screenshots (true/false)
TAURI_BROWSE_ANNOTATE Default to annotated screenshots (true/false)
TAURI_BROWSE_DEBUG Verbose output (true/false)
TAURI_BROWSE_TIMEOUT Request timeout in seconds
TAURI_BROWSE_DOWNLOAD_PATH Default download directory

Display auto-detection

tauri-browse automatically detects a running Xvfb process and uses its display for screenshots. The resolution order for display is:

  1. --display CLI flag
  2. TAURI_BROWSE_DISPLAY env var
  3. display in config file
  4. Running Xvfb process (auto-detected via pgrep)
  5. DISPLAY env var

This means you typically don't need to set DISPLAY at all -- just start Xvfb and tauri-browse finds it.

Commands

Navigation

tauri-browse launch <binary>        # Launch Tauri app via WebDriver
tauri-browse open <url>             # Navigate to URL (aliases: goto, navigate)
tauri-browse close                  # Close session
tauri-browse back                   # Navigate back in history
tauri-browse forward                # Navigate forward in history
tauri-browse reload                 # Reload current page

Snapshots

Get a text representation of interactive elements on the page, each assigned a ref (@e1, @e2, ...) for interaction. Optimized for LLM-based automation.

tauri-browse snapshot -i            # Interactive elements with refs
tauri-browse snapshot -i -C         # Include cursor-interactive elements
tauri-browse snapshot -i -s "#app"  # Scoped to CSS selector
tauri-browse snapshot -i --json     # Output as JSON

Screenshots

tauri-browse screenshot [path]          # Capture X display
tauri-browse screenshot --annotate      # Numbered badges on interactive elements
tauri-browse screenshot --full          # Full page (scroll + stitch)

The --annotate flag overlays numbered badges on interactive elements and prints a legend mapping [N] to @eN. Refs are cached, so you can interact with elements immediately after. Useful for multimodal AI models that need to reason about visual layout, unlabeled icon buttons, canvas elements, or visual state the text snapshot cannot capture.

Screenshots use ImageMagick's import command to capture the X display, because WebKitWebDriver's screenshot endpoint does not work reliably under Xvfb.

Interaction

All interaction commands accept either a ref (@e1) or a CSS selector.

tauri-browse click @e1              # Click element
tauri-browse dblclick @e1           # Double-click element
tauri-browse hover @e1              # Hover over element (triggers CSS :hover)
tauri-browse focus @e1              # Focus element
tauri-browse drag @e1 @e2           # Drag source element to destination
tauri-browse fill @e2 "text"        # Clear and type
tauri-browse type @e2 "more"        # Type without clearing
tauri-browse select @e3 "option"    # Select dropdown option
tauri-browse check @e4              # Check checkbox
tauri-browse uncheck @e4            # Uncheck checkbox (only if checked)
tauri-browse press Enter            # Press key (Enter, Tab, Escape, etc.)
tauri-browse scroll down 300        # Scroll (up/down/left/right)
tauri-browse scrollintoview @e1     # Scroll element into view
tauri-browse highlight @e1          # Highlight element visually

File handling

tauri-browse upload @e1 /path/to/file    # Upload file to input element
tauri-browse download @e1                # Click element and wait for download
tauri-browse download @e1 --path /tmp    # Download to specific directory
tauri-browse download @e1 --timeout 60000  # Custom timeout in ms

Semantic find

Find elements by semantic properties and perform an action. For role, both explicit role="..." attributes and implicit roles (e.g. <button> for role=button) are matched.

tauri-browse find text "Sign In" click
tauri-browse find label "Email" fill "user@test.com"
tauri-browse find role button click
tauri-browse find role button click --name "Submit"
tauri-browse find testid "submit-btn" click
tauri-browse find placeholder "Search" type "query"
tauri-browse find alt "Logo" click
tauri-browse find title "Close" click
tauri-browse find first ".card" click
tauri-browse find last ".card" click
tauri-browse find nth 3 ".card" click

Available actions for find: click, dblclick, hover, focus, fill, type, check, uncheck, select, highlight, scrollintoview, upload, download.

JavaScript evaluation

tauri-browse eval "return document.title"

# Complex JS via stdin (avoids shell quoting issues)
tauri-browse eval --stdin <<'EOF'
return Array.from(document.querySelectorAll("a")).map(a => a.href)
EOF

Get information

tauri-browse get text @e1           # Get element text
tauri-browse get html @e1           # Get element innerHTML
tauri-browse get html @e1 --outer   # Get element outerHTML
tauri-browse get value @e1          # Get input/textarea value
tauri-browse get attr @e1 href      # Get element attribute by name
tauri-browse get count ".card"      # Count elements matching selector
tauri-browse get box @e1            # Get bounding rectangle as JSON
tauri-browse get styles @e1         # Get common computed styles
tauri-browse get styles @e1 color font-size  # Get specific styles
tauri-browse get url                # Get current URL
tauri-browse get title              # Get page title

State checks

Check element state. Prints true or false, with exit code 1 for false (enables shell conditionals).

tauri-browse is visible @e1         # Check computed display/visibility/opacity + dimensions
tauri-browse is enabled @e1         # Check .disabled and aria-disabled
tauri-browse is checked @e1         # Check .checked and aria-checked

Wait

tauri-browse wait @e1               # Wait for element to appear
tauri-browse wait 2000              # Wait milliseconds
tauri-browse wait --url "/dashboard" # Wait for URL to contain pattern
tauri-browse wait --text "Welcome"  # Wait for text to appear on page
tauri-browse wait --load networkidle # Wait for network idle
tauri-browse wait --fn "document.readyState === 'complete'"

All wait commands accept an optional timeout as the last argument (in ms, default 10000).

Frames

tauri-browse frame @e1              # Switch to iframe element
tauri-browse frame main             # Switch back to main/top frame

Dialogs

tauri-browse dialog accept          # Accept alert/confirm dialog
tauri-browse dialog accept "input"  # Accept prompt with text
tauri-browse dialog dismiss         # Dismiss dialog
tauri-browse dialog text            # Get dialog text

Console / Errors

Console output is captured automatically on launch and open. Patches console.log/warn/error/info/debug, window.onerror, and unhandled promise rejections.

tauri-browse console                # Show all console entries
tauri-browse console --level warn   # Filter by level
tauri-browse console --clear        # Show entries and clear buffer
tauri-browse errors                 # Show JS errors
tauri-browse errors --clear         # Show and clear errors

Diff

Compare page states for verification and testing.

# Compare current snapshot to last taken snapshot
tauri-browse diff snapshot

# Compare against a saved baseline file
tauri-browse diff snapshot --baseline before.txt

# Visual pixel diff against a baseline image
tauri-browse diff screenshot --baseline before.png

# Diff two URLs (text comparison)
tauri-browse diff url http://localhost:1420 http://localhost:1421

# Diff two URLs (screenshot comparison)
tauri-browse diff url http://localhost:1420 http://localhost:1421 --screenshot

# Scope diff to a specific element
tauri-browse diff url <url1> <url2> --selector "#main"

Session management

Named sessions allow parallel automation of multiple Tauri instances.

# Use named sessions
tauri-browse --session app1 launch ./my-app
tauri-browse --session app2 launch ./my-app

# List active sessions
tauri-browse session list

# Each session has independent refs and state
tauri-browse --session app1 snapshot -i
tauri-browse --session app2 snapshot -i

State persistence

Save and restore cookies and localStorage across sessions.

tauri-browse state save auth.json   # Save current state
tauri-browse state load auth.json   # Restore state
tauri-browse state list             # List saved state files
tauri-browse state show auth.json   # Pretty-print state file contents
tauri-browse state rename old new   # Rename state file
tauri-browse state clean            # Remove empty/corrupt state files
tauri-browse state clear [name]     # Clear saved state

Command chaining

Commands can be chained with && in a single shell invocation. The session persists between commands, so chaining is safe and efficient.

# Navigate, wait, and screenshot in one call
tauri-browse open https://example.com && tauri-browse wait --load networkidle && tauri-browse screenshot page.png

# Chain multiple interactions
tauri-browse fill @e1 "user@example.com" && tauri-browse fill @e2 "password" && tauri-browse click @e3

Use && when you don't need to read the output of intermediate commands. Run commands separately when you need to parse output (e.g. snapshot to discover refs, then interact).

Ref lifecycle

Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:

  • Clicking links or buttons that navigate
  • Form submissions
  • Dynamic content loading (dropdowns, modals)
tauri-browse click @e5          # Navigates to new page
tauri-browse snapshot -i        # MUST re-snapshot
tauri-browse click @e1          # Use new refs

How it works

tauri-browse communicates with tauri-driver over the WebDriver HTTP protocol. tauri-driver wraps WebKitWebDriver, which controls the WebKitGTK webview inside your Tauri app.

tauri-browse  --(HTTP/JSON)-->  tauri-driver  --(WebDriver)-->  WebKitWebDriver  -->  Tauri Webview

Since this drives the actual Tauri webview (not a regular browser), all Tauri IPC commands work -- you can invoke Rust commands, access sidecars, use the full Tauri API from JavaScript, etc.

License

MIT

About

WebDriver CLI for Tauri apps via tauri-driver

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors