AI-powered visual assertions for E2E tests. Send screenshots — or short video recordings — to Claude, GPT, or Gemini and get structured, typed results.
# Install the library (includes OpenAI SDK by default)
npm install visual-ai-assertions
# Optional: install additional provider SDKs
npm install @anthropic-ai/sdk # for Claude
npm install @google/genai # for Gemini
# Zod is a peer dependency
npm install zod
This library uses sharp for image processing. Sharp downloads native binaries automatically for most supported platforms.
If installation fails in CI, Docker, or a minimal Linux image:
- See the sharp installation guide
- On Alpine Linux, install vips-dev with apk add --no-cache vips-dev.
- On minimal Docker images, use --platform=linux/amd64 or install the required build tools.
import { test, expect } from "@playwright/test";
import { visualAI } from "visual-ai-assertions";
const ai = visualAI();
// Provider auto-inferred from ANTHROPIC_API_KEY env var
test("login page looks correct", async ({ page }) => {
await page.goto("https://myapp.com/login");
const screenshot = await page.screenshot();
const result = await ai.check(screenshot, [
"A login form is visible with email and password fields",
"A 'Sign In' button is present and visually enabled",
"The company logo appears in the header",
"No error messages are displayed",
]);
// Simple pass/fail
expect(result.pass).toBe(true);
// Or inspect individual statements
for (const stmt of result.statements) {
expect(stmt.pass, `Failed: ${stmt.statement} — ${stmt.reasoning}`).toBe(true);
}
});
import { visualAI } from "visual-ai-assertions";
const ai = visualAI({ model: "gpt-5-mini" });
// Provider inferred from model prefix
describe("Product Page", () => {
it("should display all required elements", async () => {
await browser.url("https://myapp.com/products/1");
const screenshot = await browser.saveScreenshot("./screenshot.png");
const result = await ai.elementsVisible(screenshot, [
"Product title",
"Price tag",
"Add to Cart button",
"Product image",
]);
expect(result.pass).toBe(true);
});
});
Create an AI visual analysis instance. The provider is inferred from the model name or from the API key environment variable.
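The inference rule described above can be pictured as a small lookup: a recognizable model prefix decides the provider, and otherwise the first API key variable found does. This is a minimal sketch for intuition only; the inferProvider name, the prefix list (taken from the model IDs listed in this README), and the environment-variable precedence order are assumptions, not the library's implementation.

```typescript
// Hypothetical sketch of provider inference; not the library's real code.
type ProviderName = "anthropic" | "openai" | "google";

// Assumed mapping from model-ID prefix to provider, based on the
// model IDs in this README (claude-*, gpt-*, gemini-*).
const PREFIXES: Array<[string, ProviderName]> = [
  ["claude-", "anthropic"],
  ["gpt-", "openai"],
  ["gemini-", "google"],
];

// Assumed precedence when no model is given: the first key found wins.
const ENV_KEYS: Array<[string, ProviderName]> = [
  ["ANTHROPIC_API_KEY", "anthropic"],
  ["OPENAI_API_KEY", "openai"],
  ["GOOGLE_API_KEY", "google"],
];

function inferProvider(
  model: string | undefined,
  env: Record<string, string | undefined>,
): ProviderName {
  // 1. An explicit model with a known prefix decides the provider.
  if (model) {
    for (const [prefix, provider] of PREFIXES) {
      if (model.startsWith(prefix)) return provider;
    }
  }
  // 2. Otherwise fall back to whichever API key variable is set.
  for (const [key, provider] of ENV_KEYS) {
    if (env[key]) return provider;
  }
  throw new Error("Cannot infer provider: set a model or an API key env var");
}

// Usage: inferProvider(undefined, process.env)
```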
import { visualAI, Provider, Model } from "visual-ai-assertions";
// Minimal: provider inferred from ANTHROPIC_API_KEY env var
const ai = visualAI();
// Explicit configuration
const configured = visualAI({
model: "claude-sonnet-4-6", // optional, sensible defaults per provider
apiKey: "sk-...", // optional, defaults to provider env var
debug: true, // optional, logs error diagnostics to stderr (use debugPrompt/debugResponse for prompts and responses)
maxTokens: 4096, // optional, default 4096
reasoningEffort: "high", // optional, "low" | "medium" | "high" | "xhigh"
trackUsage: false, // optional, defaults to false; logs usage stats to stderr
});
// Use constants for IDE autocomplete
const withConstants = visualAI({
model: Model.Anthropic.SONNET_4_6,
});
Runs a visual assertion against a screenshot or short video. Returns pass: true only if ALL statements are true. For video inputs, a statement passes when it is true at any sampled frame.
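The two pass rules stated above compose: each statement passes if it holds at any sampled frame (a screenshot counts as a single frame), and the overall result passes only if every statement passes. The sketch below illustrates that composition over hypothetical per-frame verdicts. In the library the model makes these judgments, and the explicit "throughout" mode only stands in for wording the model interprets, so aggregate and StatementFrames are illustrative names, not library APIs.

```typescript
// Hypothetical per-frame verdicts for one statement. In the real library the
// model makes these judgments; this only illustrates the documented rule.
interface StatementFrames {
  statement: string;
  frames: boolean[]; // one verdict per sampled frame (a single entry for an image)
  mode?: "any" | "throughout"; // "throughout" stands in for wording like "... throughout the clip"
}

interface Aggregated {
  pass: boolean;
  statements: { statement: string; pass: boolean }[];
}

function aggregate(inputs: StatementFrames[]): Aggregated {
  const statements = inputs.map(({ statement, frames, mode = "any" }) => ({
    statement,
    // Default: true at ANY sampled frame. "throughout": true at EVERY frame.
    pass:
      mode === "throughout"
        ? frames.length > 0 && frames.every(Boolean)
        : frames.some(Boolean),
  }));
  // The overall result passes only if ALL statements pass.
  return { pass: statements.every((s) => s.pass), statements };
}
```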
// Single statement
const single = await ai.check(screenshot, "The login button is visible");
// Multiple statements
const multiple = await ai.check(screenshot, [
"The login button is visible",
"No error messages are displayed",
]);
// With instructions
const withInstructions = await ai.check(screenshot, ["The form is submitted"], {
instructions: ["Ignore loading spinners that appear briefly"],
});
// Video input: the statement passes if it is true at any sampled frame of the clip
const fromVideo = await ai.check("./recording.webm", [
'A success toast with text "Saved" briefly appears',
]);
console.log(fromVideo.statements[0].timestampSeconds); // e.g. 3.5
Returns: CheckResult
{
pass: boolean; // true only if ALL statements pass
reasoning: string; // overall summary
issues: Issue[]; // structured findings
statements: StatementResult[]; // per-statement breakdown
usage?: {
inputTokens: number;
outputTokens: number;
estimatedCost?: number; // USD
durationSeconds?: number; // API call duration
};
}
Free-form analysis of an image or video. Returns structured issues with priority and category. Video inputs are sampled into a frame timeline; the result includes frameReferences indicating which frames the model relied on.
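Because each issue carries a priority and a category, a common follow-up is to fail only on serious findings and group the rest for a report. A minimal sketch using a local copy of the Issue shape documented later in this README; the atLeast and byCategory helpers are hypothetical, not part of the library.

```typescript
// Local copy of the documented Issue shape.
type Priority = "critical" | "major" | "minor";
type Category =
  | "accessibility" | "missing-element" | "layout" | "content"
  | "styling" | "functionality" | "performance" | "other";

interface Issue {
  priority: Priority;
  category: Category;
  description: string;
  suggestion: string;
}

// Lower rank = more severe.
const RANK: Record<Priority, number> = { critical: 0, major: 1, minor: 2 };

// Issues at or above the given severity, most severe first.
function atLeast(issues: Issue[], threshold: Priority): Issue[] {
  return issues
    .filter((i) => RANK[i.priority] <= RANK[threshold])
    .sort((a, b) => RANK[a.priority] - RANK[b.priority]);
}

// Group issues by category for reporting.
function byCategory(issues: Issue[]): Map<Category, Issue[]> {
  const groups = new Map<Category, Issue[]>();
  for (const issue of issues) {
    const list = groups.get(issue.category) ?? [];
    list.push(issue);
    groups.set(issue.category, list);
  }
  return groups;
}
```

For example, atLeast(result.issues, "major") on an AskResult keeps only critical and major findings, which a test can then assert to be empty.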
const result = await ai.ask(screenshot, "Analyze this page for UI issues");
// Filter by priority
const critical = result.issues.filter((i) => i.priority === "critical");
// With instructions
const a11y = await ai.ask(screenshot, "Check for accessibility issues", {
instructions: ["Ignore contrast on decorative elements"],
});
Returns: AskResult
{
summary: string; // high-level analysis
issues: Issue[]; // categorized findings
usage?: {
inputTokens: number;
outputTokens: number;
estimatedCost?: number;
durationSeconds?: number;
};
}
Compare two images and get structured differences.
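CompareResult.pass is documented (in the return shape further down) as true when no critical or major changes are reported. A minimal sketch of that rule over a local copy of ChangeEntry, useful if a test wants a stricter or looser gate than the default. Whether the library computes pass locally or takes it from the model is not stated, and comparePasses with its failOn parameter is a hypothetical helper.

```typescript
// Local copy of the documented ChangeEntry shape.
interface ChangeEntry {
  description: string;
  severity: "critical" | "major" | "minor";
}

// Documented default: pass unless some change is critical or major.
// failOn lets a test tighten (add "minor") or loosen (only "critical") the gate.
function comparePasses(
  changes: ChangeEntry[],
  failOn: ReadonlyArray<ChangeEntry["severity"]> = ["critical", "major"],
): boolean {
  return !changes.some((c) => failOn.includes(c.severity));
}
```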
import { writeFileSync } from "node:fs";
// Basic comparison
const basic = await ai.compare(before, after);
// gemini-3-flash-preview includes an annotated diff image by default.
// Pass { diffImage: false } to opt out.
// With custom prompt and instructions
const focused = await ai.compare(before, after, {
prompt: "Focus on header layout changes",
instructions: ["Ignore date/time differences"],
});
// With an AI-generated diff image (supported by gemini-3-flash-preview and gemini-3.5-flash;
// only gemini-3-flash-preview enables it automatically, so pass diffImage: true explicitly for gemini-3.5-flash)
const withDiff = await ai.compare(before, after, {
diffImage: true,
});
if (withDiff.diffImage) {
writeFileSync("diff.png", withDiff.diffImage.data);
}
Returns: CompareResult
{
pass: boolean; // true if no critical/major changes
reasoning: string; // overall summary
changes: ChangeEntry[]; // list of visual differences
diffImage?: { // present when diffImage: true is passed, or by default on gemini-3-flash-preview
data: Buffer; // PNG image data
width: number;
height: number;
mimeType: "image/png";
};
usage?: UsageInfo;
}
Where ChangeEntry is:
{
description: string; // what changed
severity: "critical" | "major" | "minor";
}
Type-safe methods for common visual QA checks. All return CheckResult. Use the Accessibility, Layout, and Content constants for IDE autocomplete.
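Since every helper returns a CheckResult, one way to get readable failures is to list only the failed statements with their reasoning, as the Playwright example does inline. A minimal sketch with a local subset of the result shape; describeFailures is a hypothetical helper, and formatCheckResult, described further below, is the library's built-in alternative.

```typescript
// Local subset of the documented CheckResult shape.
interface StatementResult {
  statement: string;
  pass: boolean;
  reasoning: string;
}
interface CheckResultLike {
  pass: boolean;
  statements: StatementResult[];
}

// One line per failed statement, or an empty string when everything passed.
function describeFailures(result: CheckResultLike): string {
  return result.statements
    .filter((s) => !s.pass)
    .map((s) => `FAILED: ${s.statement} (${s.reasoning})`)
    .join("\n");
}
```

In a Playwright test this pairs with a custom assertion message, e.g. expect(result.pass, describeFailures(result)).toBe(true).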
import { Accessibility, Layout, Content } from "visual-ai-assertions";
// Check that UI elements are visible
await ai.elementsVisible(screenshot, ["Submit button", "Nav bar", "Footer"]);
// Check that UI elements are hidden
await ai.elementsHidden(screenshot, ["Loading spinner", "Error modal"]);
// Accessibility checks (contrast, readability, interactive visibility, color blindness, color-alone meaning)
await ai.accessibility(screenshot);
await ai.accessibility(screenshot, {
checks: [Accessibility.CONTRAST, Accessibility.COLOR_BLINDNESS, Accessibility.COLOR_ALONE],
});
// Layout checks (overlap, overflow, alignment)
await ai.layout(screenshot);
await ai.layout(screenshot, {
checks: [Layout.OVERLAP, Layout.OVERFLOW],
instructions: ["Sticky headers may overlap content — ignore if < 10px"],
});
// Page load verification
await ai.pageLoad(screenshot);
await ai.pageLoad(screenshot, { expectLoaded: false }); // expect loading state
// Content checks (placeholder text, errors, broken images)
await ai.content(screenshot);
await ai.content(screenshot, {
checks: [Content.PLACEHOLDER_TEXT, Content.ERROR_MESSAGES],
});
Every issue includes:
{
priority: "critical" | "major" | "minor";
category: "accessibility" |
"missing-element" |
"layout" |
"content" |
"styling" |
"functionality" |
"performance" |
"other";
description: string; // what the issue is
suggestion: string; // how to fix it
}
Image inputs accept multiple formats:
// Buffer (from Playwright screenshot)
const screenshot = await page.screenshot();
await ai.check(screenshot, "...");
// File path
await ai.check("./screenshots/page.png", "...");
// Base64 string
await ai.check(base64String, "...");
// URL
await ai.check("https://example.com/screenshot.png", "...");
Oversized images are automatically resized to provider limits.
ai.check() and ai.ask() also accept short video recordings (.mp4, .webm, .mov, .mkv), which is useful for asserting on transient UI such as toast messages. Accepted input forms are a file path, a data:video/...;base64,... URL, a raw base64 string, a Buffer, or a Uint8Array. HTTP/HTTPS URLs are not supported for video inputs; fetch the bytes yourself first.
// Playwright recording on disk
const result = await ai.check("./trace/video/recording.webm", [
'A success toast with text "Saved" briefly appears',
]);
// Result includes frame metadata + per-statement timestamps
console.log(result.frames);
// { count: 4, timestampsSeconds: [0.5, 1.5, 2.5, 3.5], durationSeconds: 4.0 }
console.log(result.statements[0].timestampSeconds); // 3.5
// Override sampling — defaults are 1 fps, max 10 frames, max 10 s of video
await ai.check("./long-clip.mp4", ["Loader disappears"], {
video: { fps: 2, maxFrames: 20, maxDurationSeconds: 15 },
});
maxFrames is hard-capped at 60 to keep memory bounded. Frames are downscaled so the longer edge fits within 1568 px before being sent to the provider.
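To estimate how many frames a clip produces and at what size, the sketch below applies the documented limits: the 1 fps / 10 frames / 10 s defaults, the hard cap of 60 frames, and the 1568 px long-edge limit. It places one frame at the midpoint of each 1/fps interval, which matches the example output above, and simply keeps the first frames when a cap applies. How the library actually spreads frames under a cap is not documented, so sampleTimestamps and fitLongEdge are hypothetical approximations, not its API.

```typescript
// Local mirror of the documented sampling options (hypothetical helper types).
interface SamplingOptions {
  fps?: number;
  maxFrames?: number;
  maxDurationSeconds?: number;
}

const HARD_MAX_FRAMES = 60; // documented hard cap
const MAX_LONG_EDGE_PX = 1568; // documented downscale limit

// Approximate sample timestamps (seconds) for a clip of the given duration.
function sampleTimestamps(durationSeconds: number, opts: SamplingOptions = {}): number[] {
  const fps = opts.fps ?? 1; // documented default
  const maxFrames = Math.min(opts.maxFrames ?? 10, HARD_MAX_FRAMES);
  const windowSeconds = Math.min(durationSeconds, opts.maxDurationSeconds ?? 10);
  const count = Math.min(Math.floor(windowSeconds * fps), maxFrames);
  // Assumption: one frame at the midpoint of each 1/fps interval.
  return Array.from({ length: count }, (_, i) => (i + 0.5) / fps);
}

// Scale so the longer edge fits within 1568 px; never upscale.
function fitLongEdge(width: number, height: number): { width: number; height: number } {
  const scale = Math.min(1, MAX_LONG_EDGE_PX / Math.max(width, height));
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

With the defaults, a 4 s clip yields timestamps 0.5, 1.5, 2.5, 3.5, and a 1920x1080 frame is downscaled to 1568x882.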
How it works: the library samples frames with ffmpeg and sends them to the provider as an ordered timeline. A statement passes when it is true at any sampled frame, unless its wording specifies otherwise (e.g. "throughout"). Template helpers (accessibility, layout, pageLoad, content, elementsVisible, elementsHidden) are image-only — pass video to check() or ask() instead.
ffmpeg setup. Video support works out of the box: fluent-ffmpeg, @ffmpeg-installer/ffmpeg, and @ffprobe-installer/ffprobe ship as regular dependencies and bundle platform-specific ffmpeg/ffprobe binaries, so a normal npm install gives you everything you need. On platforms where the prebuilt binary is unavailable (or if you have pruned dependencies), check() and ask() throw VisualAIVideoError when called with video input; import it from visual-ai-assertions if you want to narrow it with instanceof.
import {
formatCheckResult,
formatCompareResult,
assertVisualResult,
assertVisualCompareResult,
} from "visual-ai-assertions";
// Pretty-print results to console
const result = await ai.check(screenshot, ["Login form is visible"]);
console.log(formatCheckResult(result, "login-page"));
// Throw VisualAIAssertionError on failure (includes full result on error)
assertVisualResult(result, "login-page");
// Same for compare results
const diff = await ai.compare(before, after);
console.log(formatCompareResult(diff));
assertVisualCompareResult(diff, "regression-check");
All errors extend VisualAIError, and every concrete error includes an error.code string for programmatic handling:
import { isVisualAIKnownError } from "visual-ai-assertions";
try {
const result = await ai.check(screenshot, "Page is loaded");
} catch (error) {
if (isVisualAIKnownError(error)) {
switch (error.code) {
case "AUTH_FAILED":
// Invalid or missing API key
break;
case "RATE_LIMITED":
// Rate limited — error.retryAfter has seconds to wait
break;
case "IMAGE_INVALID":
// Invalid image: corrupt, unsupported format, etc.
break;
case "VIDEO_INVALID":
// Invalid video: missing ffmpeg deps, oversized clip, decode failure, etc.
break;
case "RESPONSE_PARSE_FAILED":
// AI returned unparseable response — error.rawResponse has raw text
break;
case "CONFIG_INVALID":
// Provider SDK not installed or invalid config
break;
case "ASSERTION_FAILED":
// assertVisualResult threw — error.result has the full failed result
break;
case "PROVIDER_ERROR":
case "VISUAL_AI_ERROR":
// Other provider API failures or generic library errors
break;
}
}
}
The VisualAIKnownError union and isVisualAIKnownError() helper are useful when you want switch (error.code) to narrow to subclass-specific fields such as retryAfter, statusCode, or rawResponse. Class-based instanceof checks continue to work too.
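Because RATE_LIMITED errors expose retryAfter in seconds, a retry wrapper can honor it and fall back to exponential backoff when it is missing. A minimal sketch that duck-types on error.code instead of importing the library, so it runs standalone; withRetry, retryDelayMs, isRateLimited, the attempt limit, and the backoff constants are all hypothetical choices, not library APIs.

```typescript
// Minimal structural view of a rate-limit error (duck-typed on code/retryAfter).
interface RateLimitedLike {
  code: "RATE_LIMITED";
  retryAfter?: number; // seconds, per the error-code docs
}

function isRateLimited(error: unknown): error is RateLimitedLike {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { code?: unknown }).code === "RATE_LIMITED"
  );
}

// Delay before retry number `attempt` (0-based): honor retryAfter when present,
// otherwise use exponential backoff starting at baseMs.
function retryDelayMs(error: RateLimitedLike, attempt: number, baseMs = 1000): number {
  if (typeof error.retryAfter === "number" && error.retryAfter >= 0) {
    return error.retryAfter * 1000;
  }
  return baseMs * 2 ** attempt;
}

// Retry only rate-limited failures; rethrow anything else immediately.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!isRateLimited(error) || attempt + 1 >= maxAttempts) throw error;
      const delay = retryDelayMs(error, attempt);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: const result = await withRetry(() => ai.check(screenshot, "Page is loaded"));
```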
| Provider | Environment Variable |
|---|---|
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| Google | GOOGLE_API_KEY |
| Variable | Description |
|---|---|
| VISUAL_AI_MODEL | Default model when model is not set in config. Overrides the provider's default model. |
| VISUAL_AI_DEBUG | Enable error diagnostic logging to stderr. Does not enable prompt/response logging. Use "true" or "1". |
| VISUAL_AI_DEBUG_PROMPT | Enable prompt-only debug logging to stderr. Use "true" or "1". |
| VISUAL_AI_DEBUG_RESPONSE | Enable response-only debug logging to stderr. Use "true" or "1". |
| VISUAL_AI_DEBUG_FRAMES | Persist sampled video frames to disk for offline inspection. Use "true" or "1". Frames are written to ./visual-ai-debug-frames/<timestamp>-<id>/ (override the base path with VISUAL_AI_DEBUG_FRAMES_DIR). Has no effect on image-only inputs. |
| VISUAL_AI_DEBUG_FRAMES_DIR | Override the base directory for VISUAL_AI_DEBUG_FRAMES. Each call still gets its own timestamped subdirectory inside it. |
| VISUAL_AI_TRACK_USAGE | Enable usage tracking (token counts and cost) to stderr. Use "true" or "1". |
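The boolean variables above are documented as accepting "true" or "1". If your own test setup needs to mirror that switch (for example, to attach extra artifacts only when VISUAL_AI_DEBUG_FRAMES is on), a minimal sketch follows. Treating every other value as off, matching case-sensitively, and treating an empty VISUAL_AI_DEBUG_FRAMES_DIR as unset are assumptions; envFlag and debugFramesBaseDir are hypothetical helpers.

```typescript
type Env = Record<string, string | undefined>;

// True only for the documented values "true" or "1".
// Assumption: anything else, including unset, counts as off.
function envFlag(name: string, env: Env): boolean {
  const value = env[name];
  return value === "true" || value === "1";
}

// Base directory VISUAL_AI_DEBUG_FRAMES would write under, per the table above,
// or undefined when frame persistence is off.
function debugFramesBaseDir(env: Env): string | undefined {
  if (!envFlag("VISUAL_AI_DEBUG_FRAMES", env)) return undefined;
  // Assumption: an empty override is treated as unset.
  return env.VISUAL_AI_DEBUG_FRAMES_DIR || "./visual-ai-debug-frames";
}

// Usage: envFlag("VISUAL_AI_DEBUG", process.env)
```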
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | env var | API key for the provider |
| model | string | provider default | Model to use |
| debug | boolean | false | Enable error diagnostic logging to stderr |
| debugPrompt | boolean | false | Log prompts to stderr |
| debugResponse | boolean | false | Log responses to stderr |
| maxTokens | number | 4096 | Max tokens for AI response |
| reasoningEffort | string | undefined | "low", "medium", "high", or "xhigh"; controls how deeply the model reasons |
| trackUsage | boolean | false | Log token usage and estimated cost to stderr |
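trackUsage logs usage per call to stderr; to total token counts and estimated cost across a whole test run instead, you can sum the optional usage field on each result when it is present. A minimal sketch with a local copy of the documented usage shape; totalUsage and UsageTotals are hypothetical names.

```typescript
// Local copy of the documented usage shape.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  estimatedCost?: number; // USD
  durationSeconds?: number;
}

interface UsageTotals {
  calls: number;
  inputTokens: number;
  outputTokens: number;
  estimatedCost: number; // sums only the calls that reported a cost
  durationSeconds: number;
}

function totalUsage(results: Array<{ usage?: Usage }>): UsageTotals {
  const totals: UsageTotals = {
    calls: 0, inputTokens: 0, outputTokens: 0, estimatedCost: 0, durationSeconds: 0,
  };
  for (const { usage } of results) {
    if (!usage) continue; // skip results without usage data
    totals.calls += 1;
    totals.inputTokens += usage.inputTokens;
    totals.outputTokens += usage.outputTokens;
    totals.estimatedCost += usage.estimatedCost ?? 0;
    totals.durationSeconds += usage.durationSeconds ?? 0;
  }
  return totals;
}
```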
import type {
AskResult,
CheckResult,
CompareResult,
Frame,
MediaInput,
SupportedMimeType,
SupportedVideoMimeType,
VideoFramesMetadata,
VideoSamplingOptions,
VisualAIConfig,
VisualAIErrorCode,
} from "visual-ai-assertions";
SupportedMimeType is the exported image MIME union:
type SupportedMimeType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";
Default models:
| Provider | Default Model |
|---|---|
| Anthropic | claude-sonnet-4-6 |
| OpenAI | gpt-5-mini |
| Google | gemini-3-flash-preview |
Control how deeply the model reasons before responding. Higher effort produces more thorough analysis but uses more tokens and takes longer.
const ai = visualAI({
reasoningEffort: "high", // "low" | "medium" | "high" | "xhigh"
});
When omitted, each provider uses its default behavior. The "xhigh" level enables maximum reasoning depth.
| Provider | Native Parameter | "xhigh" maps to |
|---|---|---|
| Anthropic Opus 4.7 | thinking.type: "adaptive" + output_config.effort | effort: "xhigh" |
| Anthropic (other) | thinking.type: "adaptive" + output_config.effort | effort: "max" |
| OpenAI | reasoning.effort (Responses API) | effort: "xhigh" |
| Google | thinkingConfig.thinkingBudget (1024 / 8192 / 24576) | 24576 (max budget) |
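The table above can be read as a lookup from the requested effort to each provider's native setting. A sketch encoding it: only the "xhigh" column and the list of Gemini budgets come from the table. Pairing 1024 / 8192 / 24576 with low / medium / high, and passing low, medium, and high through unchanged for Anthropic and OpenAI, are assumptions; toNative and its types are hypothetical names.

```typescript
type Effort = "low" | "medium" | "high" | "xhigh";
type Target = "anthropic-opus-4-7" | "anthropic-other" | "openai" | "google";

type NativeEffort =
  | { kind: "effort"; value: string } // Anthropic output_config.effort / OpenAI reasoning.effort
  | { kind: "thinkingBudget"; value: number }; // Gemini thinkingConfig.thinkingBudget

// Assumption: the listed budgets correspond to low / medium / high in order;
// the table maps xhigh to the max budget, 24576.
const GEMINI_BUDGETS: Record<Effort, number> = {
  low: 1024,
  medium: 8192,
  high: 24576,
  xhigh: 24576,
};

function toNative(target: Target, effort: Effort): NativeEffort {
  if (target === "google") {
    return { kind: "thinkingBudget", value: GEMINI_BUDGETS[effort] };
  }
  if (effort === "xhigh" && target === "anthropic-other") {
    // Per the table, Anthropic models other than Opus 4.7 map xhigh to "max".
    return { kind: "effort", value: "max" };
  }
  // Assumption: every other level passes through under the same name.
  return { kind: "effort", value: effort };
}
```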
All listed models support image/vision input. Pass any model ID to the model config option.
| Model | Model ID | Input $/MTok | Output $/MTok | Notes |
|---|---|---|---|---|
| Claude Opus 4.7 | claude-opus-4-7 | $5 | $25 | Most capable; supports xhigh effort tier |
| Claude Opus 4.6 | claude-opus-4-6 | $5 | $25 | Previous flagship, 128K max output |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | $3 | $15 | Default, best value |
| Claude Haiku 4.5 | claude-haiku-4-5 | $1 | $5 | Fastest, budget-friendly |

| Model | Model ID | Input $/MTok | Output $/MTok | Notes |
|---|---|---|---|---|
| GPT-5.5 | gpt-5.5 | $5 | $30 | Newest flagship, 1M context |
| GPT-5.4 Pro | gpt-5.4-pro | $30 | $180 | Most capable, extended context |
| GPT-5.4 | gpt-5.4 | $2.50 | $15 | Best vision quality |
| GPT-5.2 | gpt-5.2 | $1.75 | $14 | Balanced quality and cost |
| GPT-5.4 mini | gpt-5.4-mini | $0.75 | $4.50 | Fast and affordable |
| GPT-5.4 nano | gpt-5.4-nano | $0.20 | $1.25 | Cheapest OpenAI option |
| GPT-5 mini | gpt-5-mini | $0.25 | $2 | Default, fast and cheap |

| Model | Model ID | Input $/MTok | Output $/MTok | Notes |
|---|---|---|---|---|
| Gemini 3.5 Flash | gemini-3.5-flash | $1.50 | $9 | Strongest agentic & coding model |
| Gemini 3.1 Pro | gemini-3.1-pro-preview | $2 | $12 | Preview, most advanced reasoning |
| Gemini 3.1 Flash Lite | gemini-3.1-flash-lite-preview | $0.25 | $1.50 | Preview, lightweight and cheap |
| Gemini 3 Flash | gemini-3-flash-preview | $0.50 | $3 | Default, fast and capable |
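Cost scales linearly with tokens at the listed per-million-token (MTok) rates: cost = (inputTokens × input rate + outputTokens × output rate) / 1,000,000. A sketch using a few rows copied from the tables above; the library's own estimatedCost may be computed differently, so treat this as a ballpark figure. PRICES and estimateCostUsd are hypothetical names.

```typescript
// USD per million tokens, copied from a few rows of the tables above.
const PRICES: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4-6": { input: 3, output: 15 },
  "claude-haiku-4-5": { input: 1, output: 5 },
  "gpt-5-mini": { input: 0.25, output: 2 },
  "gemini-3-flash-preview": { input: 0.5, output: 3 },
};

function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No price listed for ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```

For instance, a Claude Sonnet 4.6 call that uses 10,000 input tokens and 1,000 output tokens comes to (30,000 + 15,000) / 1,000,000 = $0.045.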
MIT