Features deferred from V1 to keep scope tight. Revisit after V1 ships and we have user feedback.
Vibium's architecture follows the classic robotics control loop:
| Layer | Component | Purpose |
|---|---|---|
| Sense | Retina | Chrome extension that observes everything |
| Think | Cortex | Memory + navigation planning |
| Act | Clicker | Browser automation via BiDi |
V1 ships Act (Clicker). V2 adds Sense and Think.
What: SQLite-backed datastore that builds an "app map" of the application.
Why deferred: Complex infrastructure that may be YAGNI. Agents using Claude Code have conversation context — unclear if persistent navigation graphs add value over just replaying actions.
Components:
- SQLite database with schema for pages, actions, sessions
- sqlite-vec integration for embeddings (via CGO or pure Go alternative)
- REST API for data ingestion (JSONL)
- Graph builder and Dijkstra pathfinding
- MCP server with tools: page_info, find_element, find_path, search, history
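The graph builder and `find_path` are the algorithmic core. A minimal sketch in TypeScript, assuming pages as nodes and recorded actions as weighted edges (all type names are hypothetical; Cortex's real schema is undecided):

```ts
type PageId = string;
interface Action { from: PageId; to: PageId; cost: number; label: string }

// find_path core: Dijkstra over recorded actions, returning the cheapest
// action sequence from `start` to `goal`, or null if none was ever recorded.
function findPath(actions: Action[], start: PageId, goal: PageId): Action[] | null {
  const edges = new Map<PageId, Action[]>();
  for (const a of actions) {
    if (!edges.has(a.from)) edges.set(a.from, []);
    edges.get(a.from)!.push(a);
  }
  const dist = new Map<PageId, number>([[start, 0]]);
  const prev = new Map<PageId, Action>();
  const done = new Set<PageId>();
  for (;;) {
    // Linear scan for the closest unvisited page; a heap only matters once
    // app maps grow past a few thousand pages.
    let u: PageId | undefined;
    for (const [p, d] of dist)
      if (!done.has(p) && (u === undefined || d < dist.get(u)!)) u = p;
    if (u === undefined) return null; // goal unreachable from start
    if (u === goal) break;
    done.add(u);
    for (const a of edges.get(u) ?? []) {
      const alt = dist.get(u)! + a.cost;
      if (alt < (dist.get(a.to) ?? Infinity)) {
        dist.set(a.to, alt);
        prev.set(a.to, a);
      }
    }
  }
  // Walk predecessor actions back from the goal to rebuild the sequence.
  const path: Action[] = [];
  for (let at = goal; at !== start; at = path[0].from) path.unshift(prev.get(at)!);
  return path;
}
```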
When to build: When users report that agents are:
- Repeatedly rediscovering the same flows
- Losing context across sessions
- Unable to plan multi-step navigation
Estimated effort: 2-3 weeks
What: Chrome extension that passively records all browser activity regardless of what's driving it.
Why deferred: Requires Cortex to send data to. Also, MCP screenshot tool may provide enough observability for V1 use cases.
Components:
- Chrome Manifest V3 extension
- Content script with click/keypress/navigation listeners
- DOM snapshot capture
- Screenshot capture via background script
- JSONL formatting and Cortex sender
- Popup UI for recording control
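A minimal sketch of the content-script half, assuming a hypothetical `RetinaEvent` shape; a real build would add keypress/navigation listeners, DOM snapshots, and buffering in the background script:

```ts
// Event shape is illustrative, not a spec.
interface RetinaEvent {
  ts: number;        // epoch ms
  kind: "click" | "keypress" | "navigation";
  url: string;
  selector?: string; // best-effort CSS path to the event target
}

// Crude selector builder: tag plus id/classes. A real version would need
// stability checks (ids that change per render, utility-class noise, etc.).
function cssPath(el: Element): string {
  const id = el.id ? `#${el.id}` : "";
  const cls = typeof el.className === "string" && el.className.trim()
    ? "." + el.className.trim().split(/\s+/).join(".") : "";
  return el.tagName.toLowerCase() + id + cls;
}

document.addEventListener("click", (e) => {
  const event: RetinaEvent = {
    ts: Date.now(),
    kind: "click",
    url: location.href,
    selector: e.target instanceof Element ? cssPath(e.target) : undefined,
  };
  // The background script buffers events and POSTs them as JSONL to Cortex.
  chrome.runtime.sendMessage(event);
}, { capture: true }); // capture phase, so pages that stopPropagation() can't hide clicks
```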
When to build: When users need to:
- Record human sessions for replay
- Debug what happened during agent runs
- Train models on interaction data
Estimated effort: 1-2 weeks
Status: shipped 2025-12-31 (Python client).

```bash
pip install vibium
```

What: Maven/Gradle dependency with idiomatic Java API.
Why deferred: Java ecosystem moves slowly. Enterprise users will want stability we can't guarantee in V1.
API:
```java
import com.vibium.Browser;
import com.vibium.Vibe;

Vibe vibe = Browser.launch();
vibe.go("https://example.com");
var el = vibe.find("a");
el.click();
vibe.quit();
```

When to build: When enterprise users request it, likely after V1 is proven stable.
Estimated effort: 1-2 weeks
What: Built-in screen recording of browser sessions.
Why deferred: Adds FFmpeg dependency complexity. Screenshots may be sufficient for debugging.
Implementation:
- Capture screenshots at a fixed interval (e.g., 10 fps)
- Encode to MP4/WebM via FFmpeg
- Start/stop via `vibium.recording.start` / `vibium.recording.stop` BiDi commands
- JS API: `vibe.startRecording()`, `vibe.stopRecording()`
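A minimal sketch of the capture loop, assuming a hypothetical `shot()` function that returns one PNG frame (the real screenshot API may differ) and `ffmpeg` on PATH:

```ts
import { spawn } from "node:child_process";

async function record(shot: () => Promise<Buffer>, outFile: string, seconds: number) {
  const ffmpeg = spawn("ffmpeg", [
    "-y",
    "-f", "image2pipe",    // frames arrive as an image stream on stdin
    "-framerate", "10",    // matches the 10 fps capture interval
    "-c:v", "png",         // input frames are PNG
    "-i", "-",
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p", // widest player compatibility
    outFile,
  ]);
  const end = Date.now() + seconds * 1000;
  while (Date.now() < end) {
    ffmpeg.stdin.write(await shot());
    await new Promise((r) => setTimeout(r, 100)); // ~10 fps, ignoring capture time
  }
  ffmpeg.stdin.end(); // EOF tells FFmpeg to finalize the container
  await new Promise((r) => ffmpeg.on("close", r));
}
```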
When to build: When users need video artifacts for:
- Test failure debugging
- Demo generation
- Compliance/audit trails
Estimated effort: 1 week
What: Natural language element finding and actions.
```js
await vibe.do("click the login button");
await vibe.check("verify the dashboard loaded");
const el = await vibe.find("the blue submit button");
```

Why deferred: This is the hardest problem. Requires:
- Vision model integration (which model? where does it run?)
- Latency management (vision calls are slow)
- Cost management (vision calls are expensive)
- Fallback strategies when AI fails
Open questions:
- Local model (Qwen-VL) vs API (Claude vision)?
- Screenshot → model → coordinates, or DOM → model → selector?
- How to handle ambiguity ("the button" when there are 5)?
- Caching/memoization of element locations?
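To make those questions concrete, here is one possible shape of the screenshot → model → coordinates pipeline with caching and a DOM fallback. Every interface below is hypothetical and the confidence threshold is a placeholder; nothing here implies a model choice:

```ts
interface VisionModel {
  // Returns a click point for an instruction, with a confidence score.
  locate(png: Buffer, instruction: string): Promise<{ x: number; y: number; confidence: number }>;
}
interface Driver {
  screenshot(): Promise<Buffer>;
  clickAt(x: number, y: number): Promise<void>;
  clickByText(text: string): Promise<boolean>; // cheap DOM-based fallback
}

const cache = new Map<string, { x: number; y: number }>(); // memoized locations

async function doAction(drv: Driver, model: VisionModel, instruction: string) {
  const hit = cache.get(instruction);        // caching cuts latency and cost,
  if (hit) return drv.clickAt(hit.x, hit.y); // but risks stale coordinates

  const loc = await model.locate(await drv.screenshot(), instruction); // slow + expensive
  if (loc.confidence >= 0.5) {
    cache.set(instruction, loc);
    return drv.clickAt(loc.x, loc.y);
  }
  // Fallback when the model is unsure: try a DOM-text match before failing loudly.
  if (!(await drv.clickByText(instruction))) {
    throw new Error(`could not resolve: "${instruction}"`);
  }
}
```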
When to build: After V1, with dedicated research spike. This could be a V2 headline feature or a separate product.
Estimated effort: 3-6 weeks (high uncertainty)
What: Web-based visualization of the app map.
Why deferred: Depends on Cortex existing. Also unclear if visualization adds value vs just MCP queries.
Features:
- Graph view of pages and flows
- Test result display
- Live execution viewer
- Embedded chat for test generation
Prototype: https://vibium-cortex.lovable.app/?dataset=view-action-sample
When to build: After Cortex, if users struggle to understand app maps via MCP alone.
Estimated effort: 2-3 weeks
What: Capture and inspect network requests/responses.
Why deferred: BiDi network module is complex. Most agent use cases don't need request inspection.
Features:
- Enable/disable network capture
- Log all requests/responses
- HAR export
- Request interception (mock responses)
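For scale, the raw protocol side looks like this. `network.beforeRequestSent` and `network.responseCompleted` are standard WebDriver BiDi events; the endpoint URL is a placeholder, and a Vibium wrapper would hide all of this plumbing:

```ts
import WebSocket from "ws";

const ws = new WebSocket("ws://localhost:9222/session"); // placeholder endpoint
let id = 0;
const send = (method: string, params: object) =>
  ws.send(JSON.stringify({ id: ++id, method, params })); // BiDi command envelope

ws.on("open", () => {
  send("session.subscribe", {
    events: ["network.beforeRequestSent", "network.responseCompleted"],
  });
});

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.method === "network.beforeRequestSent") {
    console.log("→", msg.params.request.method, msg.params.request.url);
  } else if (msg.method === "network.responseCompleted") {
    console.log("←", msg.params.response.status, msg.params.request.url);
  }
});
```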
When to build: When users need to debug API calls or mock backends.
Estimated effort: 1-2 weeks
What: Support browsers beyond Chrome.
Why deferred: Chrome covers 90%+ of use cases. BiDi implementations vary across browsers.
When to build: When users explicitly need Firefox (privacy testing) or Edge (enterprise).
Estimated effort: 1 week per browser
What: Official Docker images and Fly.io deployment guides.
Why deferred: Local-first is V1 priority. Cloud adds operational complexity.
Deliverables:
- Dockerfile.clicker
- docker-compose.yml for full stack
- Fly.io fly.toml and deployment guide
- GPU machine setup for local models
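A rough sketch of what Dockerfile.clicker could look like; the base image, package list, and the `clicker` binary path and flags are all assumptions:

```dockerfile
FROM debian:bookworm-slim

# Chromium needs fonts and certs even when running headless.
RUN apt-get update && apt-get install -y --no-install-recommends \
      chromium fonts-liberation ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# `clicker` is assumed to be a prebuilt static binary.
COPY clicker /usr/local/bin/clicker
EXPOSE 9222
ENTRYPOINT ["clicker", "--headless"]
```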
When to build: When users want to run agents in CI or production.
Estimated effort: 1 week
Based on likely user demand:
- Python client — ✅ shipped
- Video recording — Debugging value, moderate effort
- Network tracing — DevTools parity
- Cortex — If agents need persistent memory
- Retina — If recording human sessions matters
- AI locators — High value but high uncertainty
- Java client — Enterprise demand
- Cortex UI — Nice to have
- Firefox/Edge — Edge cases
After V1 ships, track what users actually ask for:
- GitHub issues
- Discord/community feedback
- Usage analytics (opt-in)
- Direct user interviews
Build what's requested, not what we assume is needed.