7 changes: 7 additions & 0 deletions .changeset/openrouter-video-adapter.md
@@ -0,0 +1,7 @@
---
'@tanstack/ai-openrouter': minor
---

Add `openRouterVideo`, a video generation adapter for OpenRouter's dedicated async API (`POST /api/v1/videos`) — Seedance, Veo 3.1, Wan, Kling, and Sora 2 Pro through one API key. Follows the jobs/polling architecture (`generateVideo()` → `getVideoJobStatus()`), with per-model `size` / `duration` / provider-option types generated from OpenRouter's `GET /api/v1/videos/models` metadata and validated before submit. Image-conditioned prompts map `metadata.role` onto the wire: `start_frame` / `end_frame` → `frame_images[]` (`first_frame` / `last_frame`), `reference` / `character` → `input_references[]`; frame roles are validated against each model's `supported_frame_images`. Completed videos are downloaded server-side and returned as `data:` URLs (OpenRouter's download URLs require the API key), and the gateway-reported cost is surfaced as `usage.cost`.

Image adapter fixes from the #624 review: requested `size` is now validated (the `WIDTHxHEIGHT` union previously used a Unicode `×`, so every size except `1024x1024` silently dropped its aspect ratio; unsupported sizes now throw with the supported list), `numberOfImages > 1` throws instead of silently returning one image (verified live: the gateway ignores all count keys in `image_config`), and `image_config.strength` (0.0–1.0 image-to-image influence) is exposed via `modelOptions.strength`.
85 changes: 85 additions & 0 deletions docs/adapters/openrouter.md
@@ -219,6 +219,91 @@ fields are simply absent and the stream completes normally. Both
`openRouterText` and `openRouterResponsesText` populate cost when OpenRouter
returns it.

## Image Generation

`openRouterImage` routes image generation through OpenRouter's
chat-completions surface (`modalities: ['image']`). Multimodal prompts are
supported — text and image parts are forwarded in order for
image-conditioned generation:

```typescript
import { generateImage } from "@tanstack/ai";
import { openRouterImage } from "@tanstack/ai-openrouter";

const result = await generateImage({
  adapter: openRouterImage("google/gemini-2.5-flash-image"),
  prompt: "A watercolor lighthouse at dusk",
  size: "1344x768", // mapped to image_config.aspect_ratio ('16:9')
  modelOptions: {
    image_size: "2K", // resolution (Gemini models)
    strength: 0.35, // image-to-image influence, i2i-capable models only
  },
});
```

Notes:

- The pathway returns **exactly one image per request** — `numberOfImages > 1`
throws instead of silently under-delivering. Make multiple requests if you
need multiple candidates.
- `size` must be one of the ten supported `WIDTHxHEIGHT` values (it is
converted to `image_config.aspect_ratio`); anything else throws with the
supported list.
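
Because each request yields a single image, collecting several candidates means fanning out requests yourself. The following is a minimal sketch, not part of `@tanstack/ai`: the `generateCandidates` helper and its signature are hypothetical, and `generateOne` is assumed to be a caller-supplied function that wraps one `generateImage` call.

```typescript
// Hypothetical helper: run `count` single-image requests in parallel and
// separate the successes from the failures. `generateOne` is supplied by the
// caller and would typically wrap one generateImage({ ... }) call.
async function generateCandidates<T>(
  count: number,
  generateOne: (index: number) => Promise<T>,
): Promise<{ results: T[]; errors: unknown[] }> {
  const outcomes = await Promise.all(
    Array.from({ length: count }, (_, index) =>
      generateOne(index).then(
        (value) => ({ ok: true as const, value }),
        (error: unknown) => ({ ok: false as const, error }),
      ),
    ),
  );

  const results: T[] = [];
  const errors: unknown[] = [];
  for (const outcome of outcomes) {
    if (outcome.ok) results.push(outcome.value);
    else errors.push(outcome.error);
  }
  return { results, errors };
}
```

Collecting failures instead of rejecting on the first one keeps the candidates that did succeed, which matters when each request is billed separately.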

## Video Generation (Experimental)

`openRouterVideo` targets OpenRouter's dedicated **async video API**
(`POST /api/v1/videos`), which serves Seedance, Veo 3.1, Wan, Kling, and
Sora 2 Pro through a single OpenRouter API key. It follows the jobs/polling
architecture shared by all TanStack AI video adapters:

```typescript
// Server: create the job, then poll
import { generateVideo, getVideoJobStatus } from "@tanstack/ai";
import { openRouterVideo } from "@tanstack/ai-openrouter";

const adapter = openRouterVideo("bytedance/seedance-2.0");

const { jobId } = await generateVideo({
  adapter,
  prompt: [
    { type: "text", content: "Animate this product shot, slow push-in" },
    {
      type: "image",
      source: { type: "url", value: "https://your-cdn.com/product.png" },
      metadata: { role: "start_frame" },
    },
  ],
  size: "1280x720",
  duration: 8,
});

let status = await getVideoJobStatus({ adapter, jobId });
while (status.status !== "completed" && status.status !== "failed") {
  await new Promise((r) => setTimeout(r, 5000));
  status = await getVideoJobStatus({ adapter, jobId });
}
// status.url is a data: URL (OpenRouter download URLs require the API key,
// so the adapter downloads server-side); status.usage?.cost is the real
// billed cost reported by the gateway.
```
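
The loop above polls until the job settles, with no upper bound on how long that takes. A bounded variant might look like the sketch below. The `pollUntilDone` helper, its option names, and the default interval and timeout are assumptions for illustration; `fetchStatus` would wrap `getVideoJobStatus({ adapter, jobId })`.

```typescript
// Minimal status shape, mirroring the fields used in the loop above.
// Only "completed" and "failed" are treated as terminal; no other status
// strings are assumed.
interface JobStatus {
  status: string;
  url?: string;
}

// Hypothetical helper: poll until the job is terminal or the deadline passes.
async function pollUntilDone<S extends JobStatus>(
  fetchStatus: () => Promise<S>,
  options: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<S> {
  const intervalMs = options.intervalMs ?? 5000; // assumed default
  const timeoutMs = options.timeoutMs ?? 10 * 60 * 1000; // assumed default
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const status = await fetchStatus();
    if (status.status === "completed" || status.status === "failed") {
      return status;
    }
    // Give up if the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`video job still '${status.status}' after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A call such as `pollUntilDone(() => getVideoJobStatus({ adapter, jobId }))` returns the terminal status, so the caller still has to check for `"failed"` before reading `url`.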

```tsx
// Client: track the job with the useGenerateVideo hook
import { useGenerateVideo, fetchServerSentEvents } from "@tanstack/ai-react";

const { generate, result, videoStatus, isLoading } = useGenerateVideo({
  connection: fetchServerSentEvents("/api/generate/video"),
});
// result?.url renders directly: <video src={result.url} controls />
```

Sizes, durations, and per-model options (`resolution`, `aspectRatio`,
`generateAudio`, `seed`, …) are typed and validated per model from
OpenRouter's video model metadata. See
[Video Generation](../media/video-generation.md) for the full lifecycle,
streaming mode, and the image-to-video role-mapping table.

## Next Steps

- [Getting Started](../getting-started/quick-start) - Learn the basics
7 changes: 4 additions & 3 deletions docs/config.json
@@ -249,13 +249,13 @@
"label": "Image Generation",
"to": "media/image-generation",
"addedAt": "2026-04-15",
"updatedAt": "2026-06-08"
"updatedAt": "2026-06-10"
},
{
"label": "Video Generation",
"to": "media/video-generation",
"addedAt": "2026-04-15",
"updatedAt": "2026-06-08"
"updatedAt": "2026-06-10"
},
{
"label": "Generation Hooks",
@@ -440,7 +440,8 @@
{
"label": "OpenRouter Adapter",
"to": "adapters/openrouter",
"addedAt": "2026-04-15"
"addedAt": "2026-04-15",
"updatedAt": "2026-06-10"
},
{
"label": "OpenAI-Compatible",
2 changes: 1 addition & 1 deletion docs/media/image-generation.md
@@ -287,7 +287,7 @@ await generateImage({
| **Gemini** | Native models (`gemini-*-flash-image`, "nano-banana", etc.) → prompt parts map 1:1 onto multimodal `contents`, preserving interleaved order. Up to ~14 input images (provider limit, not enforced by the SDK).<br>Imagen models → throws (text-to-image only). |
| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (e.g. nano-banana edit gets `image_urls`, Fooocus masks get `mask_image_url`). Defaults for unknown endpoints: 1 input → `image_url`; multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override with `modelOptions` for endpoint-specific fields. |
| **Grok** | grok-imagine models → xAI's `/v1/images/edits` (up to 3 source images, addressed by xAI in request order; prompt sent verbatim). `role: 'mask'` / `'control'` throw (no Imagine API equivalent). `grok-2-image-1212` throws (text-to-image only). |
| **OpenRouter** | Prompt parts map 1:1 onto multimodal `image_url` / `text` content parts, preserving interleaved order, and are forwarded to the underlying image model. |
| **OpenRouter** | Prompt parts map 1:1 onto multimodal `image_url` / `text` content parts, preserving interleaved order, and are forwarded to the underlying image model. `modelOptions.strength` (0.0–1.0) controls image-to-image influence on models that document it (e.g. Recraft). One image per request — `numberOfImages > 1` throws (the gateway ignores count keys). |
| **Anthropic** | n/a — no image generation API. |

Adapters that don't support image-conditioned generation throw a clear
69 changes: 59 additions & 10 deletions docs/media/video-generation.md
@@ -2,15 +2,19 @@
title: Video Generation
id: video-generation
order: 6
description: "Generate video from text prompts with OpenAI Sora using TanStack AI's experimental generateVideo() jobs/polling API."
description: "Generate video from text prompts with OpenAI Sora, fal.ai, or OpenRouter (Seedance, Veo, Wan) using TanStack AI's experimental generateVideo() jobs/polling API."
keywords:
- tanstack ai
- video generation
- sora
- openrouter
- seedance
- veo
- generateVideo
- jobs api
- experimental
- text-to-video
- image-to-video
---

# Video Generation (Experimental)
@@ -36,6 +40,8 @@ TanStack AI provides experimental support for video generation through dedicated

Currently supported:
- **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
- **fal.ai**: Kling, MiniMax, Hunyuan, and other fal-hosted video endpoints
- **OpenRouter**: Seedance, Veo 3.1, Wan, Kling, Sora 2 Pro and others via the dedicated async video API (`POST /api/v1/videos`)

## Basic Usage

@@ -415,12 +421,12 @@ for the per-provider table.
Each `ImagePart` can carry an optional `metadata.role` hint that the
adapter uses to route the input to the provider-specific field:

| Role | Maps to |
| --------------- | ------------------------------------------------------------- |
| `'start_frame'` | fal `start_image_url` (positional default for the first input) |
| `'end_frame'` | fal `end_image_url` (Veo `lastFrame` planned — no Veo adapter yet) |
| `'reference'` | fal `reference_image_urls` (Veo `referenceImages` planned) |
| `'character'` | Same as `'reference'` — character consistency images |
| Role | Maps to |
| --------------- | --------------------------------------------------------------------------------------------------------- |
| `'start_frame'` | fal `start_image_url`; OpenRouter `frame_images[]` with `frame_type: 'first_frame'` (positional default for the first input) |
| `'end_frame'` | fal `end_image_url`; OpenRouter `frame_images[]` with `frame_type: 'last_frame'` (Veo `lastFrame` planned — no Veo adapter yet) |
| `'reference'` | fal `reference_image_urls`; OpenRouter `input_references[]` (Veo `referenceImages` planned) |
| `'character'` | Same as `'reference'` — character consistency images |
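
For the OpenRouter column, the routing can be pictured as a small partition step. This is a sketch under stated assumptions, not the adapter's code: the `RoledImage` input type, the `image_url` field on each entry, and the `routeImages` name are hypothetical. Only the role names and the `frame_images[]` / `frame_type` / `input_references[]` field names come from the table, and the sketch assumes an image without a role is treated as the start frame.

```typescript
type Role = "start_frame" | "end_frame" | "reference" | "character";

// Hypothetical input shape: an already-resolved image URL plus an optional role.
interface RoledImage {
  url: string;
  role?: Role;
}

// Hypothetical output shape; the `image_url` entry field is an assumption.
interface RoutedImages {
  frame_images: { frame_type: "first_frame" | "last_frame"; image_url: string }[];
  input_references: { image_url: string }[];
}

function routeImages(images: RoledImage[]): RoutedImages {
  const routed: RoutedImages = { frame_images: [], input_references: [] };
  for (const image of images) {
    // Assumption: an image without a role is treated as the start frame.
    const role = image.role ?? "start_frame";
    if (role === "reference" || role === "character") {
      routed.input_references.push({ image_url: image.url });
      continue;
    }
    const frameType = role === "start_frame" ? "first_frame" : "last_frame";
    // Reject a second image for the same frame slot instead of overwriting it.
    if (routed.frame_images.some((frame) => frame.frame_type === frameType)) {
      throw new Error(`at most one '${role}' image is allowed`);
    }
    routed.frame_images.push({ frame_type: frameType, image_url: image.url });
  }
  return routed;
}
```

Throwing on a duplicate frame role mirrors the fail-fast approach the adapters take elsewhere, rather than silently keeping only one of the two images.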

```typescript
import { falVideo } from '@tanstack/ai-fal'
@@ -445,7 +451,8 @@ await generateVideo({
| ------------ | -------------------------------------------------------------------------------------------------------- |
| **OpenAI** | Sora-2 / Sora-2-Pro → the image part goes to `input_reference`; flattened text is the prompt. Single image only — throws if more than one. |
| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types — e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions` — the media-conditioning fields are typed optional there (even when the endpoint requires them) since they usually arrive as prompt parts. |
| **Gemini** | Veo adapter not yet implemented — image prompt parts will be supported when Veo lands. |
| **OpenRouter** | `role: 'start_frame'` / `'end_frame'` → `frame_images[]` with `frame_type: 'first_frame'` / `'last_frame'`; `role: 'reference'` / `'character'` → `input_references[]`; an unroled image defaults to the start frame. At most one start and one end frame; frame roles are validated against the model's `supported_frame_images` metadata (e.g. Hailuo only takes a first frame). When both frame images and references are present, OpenRouter treats the request as image-to-video and references take lower priority. URL image sources pass through verbatim and `data` sources become data URIs — OpenRouter does not fetch URLs behind redirects or bot checks, so use directly accessible URLs. |
| **Gemini** | Veo adapter not yet implemented — image prompt parts will be supported when Veo lands (Veo models are available today through `openRouterVideo`). |

Adapters whose underlying API can't accept image inputs throw a clear
runtime error so calls fail fast.
@@ -488,6 +495,45 @@ const { jobId } = await generateVideo({
})
```

### OpenRouter Model Options

OpenRouter's [video generation API](https://openrouter.ai/docs/guides/overview/multimodal/video-generation)
runs Seedance, Veo, Wan, Kling, Sora 2 Pro and others behind one async jobs
API. `size`, `duration`, and the per-model options below are typed **and
validated per model** from OpenRouter's published model capabilities (a size
or duration the model doesn't support throws before the request is sent):

```typescript
import { generateVideo } from '@tanstack/ai'
import { openRouterVideo } from '@tanstack/ai-openrouter'

const { jobId } = await generateVideo({
  adapter: openRouterVideo('bytedance/seedance-2.0'),
  prompt: 'A beautiful sunset over the ocean',
  size: '1280x720', // per-model union from OpenRouter's model metadata
  duration: 8, // validated against the model's supported durations
  modelOptions: {
    resolution: '720p', // alternative to size: resolution + aspectRatio
    aspectRatio: '16:9',
    generateAudio: true, // omitted from the type for models that can't generate audio
    seed: 42, // omitted from the type for models that don't accept a seed
    callbackUrl: 'https://your-app.com/webhooks/openrouter-video',
    provider: { options: { bytedance: { watermark: false } } }, // passthrough
  },
})
```
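
The pre-submit validation described above can be pictured with the sketch below. It is illustrative only: the `VideoModelCapabilities` shape, the `assertSupported` name, and the sample values in the usage note are hypothetical, since the real metadata shape returned by `GET /api/v1/videos/models` is not shown here.

```typescript
// Hypothetical capability record for one model.
interface VideoModelCapabilities {
  sizes: readonly string[];
  durations: readonly number[];
}

// Throw before any network call if the request asks for something the
// model's capability record does not list.
function assertSupported(
  model: string,
  caps: VideoModelCapabilities,
  request: { size?: string; duration?: number },
): void {
  if (request.size !== undefined && caps.sizes.indexOf(request.size) === -1) {
    throw new Error(
      `${model}: unsupported size '${request.size}'. Supported sizes: ${caps.sizes.join(", ")}.`,
    );
  }
  if (request.duration !== undefined && caps.durations.indexOf(request.duration) === -1) {
    throw new Error(
      `${model}: unsupported duration ${request.duration}. Supported durations: ${caps.durations.join(", ")}.`,
    );
  }
}
```

With made-up capabilities `{ sizes: ["1280x720"], durations: [4, 8] }`, a request for `duration: 6` throws locally and lists the supported durations, so the mistake surfaces before a job is created.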

Two OpenRouter-specific behaviors to know about:

- **The completed video arrives as a `data:` URL.** OpenRouter's download
URLs require your API key in an `Authorization` header, so the adapter
downloads the content server-side and returns a base64 data URL that can
be handed straight to a `<video>` tag. Videos over ~10 MiB log a warning —
prefer re-uploading to your own storage/CDN over passing large data URLs
around.
- **Cost is reported on completion.** The gateway reports the real billed
cost for the job; it's surfaced as `usage.cost` on the completed result.
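
To re-upload a completed video instead of passing the data URL around, the base64 payload first has to be turned back into bytes. A minimal sketch, assuming the URL has the usual `data:<mime>;base64,<payload>` form (the `decodeDataUrl` helper is hypothetical, and the upload step itself is left to your storage client):

```typescript
// Hypothetical helper: split a base64 data: URL into its MIME type and bytes.
function decodeDataUrl(dataUrl: string): { mimeType: string; bytes: Uint8Array } {
  const comma = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || comma === -1) {
    throw new Error("expected a data: URL");
  }
  const header = dataUrl.slice("data:".length, comma); // e.g. "video/mp4;base64"
  const suffix = ";base64";
  if (!header.endsWith(suffix)) {
    throw new Error("expected a base64-encoded data: URL");
  }
  const mimeType = header.slice(0, header.length - suffix.length);

  // atob yields one character per byte; copy the char codes into a byte array.
  const binary = atob(dataUrl.slice(comma + 1));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mimeType, bytes };
}
```

The returned `bytes` can be handed to whatever upload API your storage provider exposes, with `mimeType` as the content type.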

## Response Types

> **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }` — it does not return `jobId` or `expiresAt`.
@@ -586,9 +632,12 @@ Check the [OpenAI documentation](https://platform.openai.com/docs) for current l

## Environment Variables

The video adapter uses the same environment variable as other OpenAI adapters:
The video adapters use the same environment variables as the other adapters
from their packages:

- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENAI_API_KEY`: Your OpenAI API key (`openaiVideo`)
- `OPENROUTER_API_KEY`: Your OpenRouter API key (`openRouterVideo`)
- `FAL_KEY`: Your fal.ai API key (`falVideo`)

## Explicit API Keys

2 changes: 1 addition & 1 deletion packages/ai-openrouter/package.json
@@ -52,7 +52,7 @@
"model-router"
],
"dependencies": {
"@openrouter/sdk": "0.12.35",
"@openrouter/sdk": "0.12.79",
"@tanstack/ai-utils": "workspace:*"
},
"devDependencies": {
42 changes: 38 additions & 4 deletions packages/ai-openrouter/src/adapters/image.ts
@@ -44,6 +44,25 @@ const SIZE_TO_ASPECT_RATIO: Record<string, string> = {
'1536x672': '21:9',
}

/**
* Resolve a requested size to the aspect ratio OpenRouter's chat-completions
* image pathway understands (`image_config.aspect_ratio`). The pathway has
* no free-form size field, so a size outside the mapping table cannot be
* expressed — throw rather than silently generating at the default 1:1.
* Accepts the multiplication sign ('×') as a separator for tolerance.
*/
function sizeToAspectRatio(size: string | undefined): string | undefined {
  if (!size) return undefined
  const normalized = size.replace('×', 'x')
  const aspectRatio = SIZE_TO_ASPECT_RATIO[normalized]
  if (!aspectRatio) {
    throw new Error(
      `openrouter: unsupported image size '${size}'. Supported sizes: ${Object.keys(SIZE_TO_ASPECT_RATIO).join(', ')}.`,
    )
  }
  return aspectRatio
}

/**
* Convert a TanStack ImagePart into the URL string accepted by OpenRouter's
* `image_url` content parts: public URLs pass through, data sources become
@@ -89,8 +108,16 @@ export class OpenRouterImageAdapter<
}

const { model, numberOfImages, size, modelOptions, logger } = options
// Use provided aspect_ratio or derive from size
const aspectRatio = size ? SIZE_TO_ASPECT_RATIO[size] : undefined
// OpenRouter's chat-completions image pathway returns exactly one image
// per request and ignores any count key in image_config (verified
// against the live API), so reject multi-image requests instead of
// silently under-delivering.
if (numberOfImages !== undefined && numberOfImages > 1) {
throw new Error(
`openrouter: the chat-completions image pathway generates one image per request (numberOfImages: ${numberOfImages}). Make multiple requests instead.`,
)
}
const aspectRatio = sizeToAspectRatio(size)

// Image-conditioned generation: map the prompt parts 1:1 onto
// chat-completions content parts, preserving the interleaved order —
@@ -135,9 +162,11 @@ export class OpenRouterImageAdapter<
],
modalities: ['image'],
stream: false,
// OpenRouter filters out invalid config per provider specifications
// The SDK serializes this record verbatim as `image_config`, so keys
// must match the HTTP API's documented snake_case fields — miskeyed
// entries are silently ignored by the gateway (verified live:
// `aspect_ratio` changes output dimensions, `aspectRatio` does not).
imageConfig: {
...(numberOfImages ? { numberOfImages } : {}),
...(aspectRatio
? {
aspect_ratio: aspectRatio,
@@ -148,6 +177,11 @@ export class OpenRouterImageAdapter<
image_size: modelOptions.image_size,
}
: {}),
...(modelOptions?.strength !== undefined
? {
strength: modelOptions.strength,
}
: {}),
},
},
})