42 changes: 42 additions & 0 deletions .changeset/gemini-veo-video-adapter.md
@@ -0,0 +1,42 @@
---
'@tanstack/ai': minor
'@tanstack/ai-gemini': minor
---

Add a Google Veo video adapter (`geminiVideo` / `createGeminiVideo`) and the
per-model typed-duration video contract it is built on (#534, #634).

**`@tanstack/ai`** (additive, non-breaking): `VideoAdapter` /
`BaseVideoAdapter` gain a `TModelDurationByName` generic (defaulting to
`Record<string, number>`, preserving today's `duration?: number` typing for
adapters without a map) plus two introspection methods with safe defaults:

- `availableDurations()`: a `DurationOptions` tagged union
  (`discrete | range | mixed | none`) describing the durations the current
  model accepts. Default: `{ kind: 'none' }`.
- `snapDuration(seconds)`: coerces raw seconds to the closest valid duration
  (`snapToDurationOption` is exported for adapter authors). Default:
  `undefined`.

`generateVideo({ duration })` is now typed per model via
`VideoDurationForAdapter<TAdapter>`.
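
As a rough illustration of the introspection contract above, the sketch below models a tagged union and a snapping helper. Only the `kind` tags, the `values` array on `discrete`, and `{ kind: 'none' }` come from this changeset. The `min` / `max` fields, the tie-breaking rule, and the names `DurationOptionsSketch`, `closest`, and `snapSketch` are assumptions for illustration, not the exported `snapToDurationOption`.

```typescript
// Hypothetical shapes: `min` / `max` on 'range' and 'mixed' are assumed here.
type DurationOptionsSketch =
  | { kind: 'discrete'; values: readonly number[] }
  | { kind: 'range'; min: number; max: number }
  | { kind: 'mixed'; values: readonly number[]; min: number; max: number }
  | { kind: 'none' }

// Nearest candidate; ties go to the lower value, which is consistent with
// the changeset's example of 7 snapping to 6 for [4, 6, 8].
function closest(
  values: readonly number[],
  seconds: number,
): number | undefined {
  let best: number | undefined
  for (const v of values) {
    if (best === undefined) {
      best = v
      continue
    }
    const dv = Math.abs(v - seconds)
    const db = Math.abs(best - seconds)
    if (dv < db || (dv === db && v < best)) best = v
  }
  return best
}

function snapSketch(
  options: DurationOptionsSketch,
  seconds: number,
): number | undefined {
  switch (options.kind) {
    case 'none':
      // No declared durations: nothing to snap to.
      return undefined
    case 'discrete':
      return closest(options.values, seconds)
    case 'range':
      // Clamp into the assumed inclusive [min, max] interval.
      return Math.min(options.max, Math.max(options.min, seconds))
    case 'mixed':
      // Assumed semantics: values inside the range are already valid;
      // otherwise pick the nearest discrete value or range bound.
      if (seconds >= options.min && seconds <= options.max) return seconds
      return closest([...options.values, options.min, options.max], seconds)
  }
}
```

Under these assumptions, `snapSketch({ kind: 'discrete', values: [4, 6, 8] }, 7)` yields `6`, and a `{ kind: 'none' }` option always yields `undefined`.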

**`@tanstack/ai-gemini`**: new Veo adapter over the long-running
`:predictLongRunning` operation, supporting `veo-3.1-generate-preview`,
`veo-3.1-fast-generate-preview`, `veo-3.0-generate-001`,
`veo-3.0-fast-generate-001`, and `veo-2.0-generate-001`:

- `geminiVideo('veo-3.0-generate-001')` → `duration?: 4 | 6 | 8`
  (Veo 2: `5 | 6 | 8`); `adapter.snapDuration(7)` → `6`.
- Multimodal prompts: the first un-roled / `'start_frame'` image part
  becomes the input image, `'end_frame'` → `lastFrame`, `'reference'` /
  `'character'` → `referenceImages`.
- `size` takes Veo aspect ratios (`'16:9' | '9:16'`); everything else from
  the SDK's `GenerateVideosConfig` (e.g. `resolution`, `generateAudio`,
  `negativePrompt`) is available through `modelOptions`.
- Responsible-AI filtering is surfaced as a failed job with the filter
  reasons.
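
The role routing for multimodal prompts in the list above can be pictured roughly as follows. This is a sketch only: the `ImagePartSketch` shape, the `source` field, and the `routeImageParts` helper are hypothetical names, and the real prompt-part type in `@tanstack/ai` may differ. The error on a second starting image follows the behavior described in the docs change in this PR.

```typescript
type ImageRole = 'start_frame' | 'end_frame' | 'reference' | 'character'

// Hypothetical prompt-part shape, for illustration only.
interface ImagePartSketch {
  type: 'image'
  source: string // URL or base64 payload (assumed)
  role?: ImageRole
}

// Field names mirror the Veo fields named in the changeset.
interface VeoImageInputs {
  image?: string
  lastFrame?: string
  referenceImages: string[]
}

function routeImageParts(parts: readonly ImagePartSketch[]): VeoImageInputs {
  const out: VeoImageInputs = { referenceImages: [] }
  for (const part of parts) {
    switch (part.role) {
      case undefined:
      case 'start_frame':
        // Un-roled parts default to the starting image; only one is allowed.
        if (out.image !== undefined) {
          throw new Error('Veo accepts a single starting image')
        }
        out.image = part.source
        break
      case 'end_frame':
        out.lastFrame = part.source
        break
      case 'reference':
      case 'character':
        out.referenceImages.push(part.source)
        break
    }
  }
  return out
}
```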

Note: Veo result URLs are served by the Gemini Files API and require the
Google API key to download (`x-goog-api-key` header or `key` query
parameter).
80 changes: 73 additions & 7 deletions docs/media/video-generation.md
@@ -2,11 +2,13 @@
title: Video Generation
id: video-generation
order: 6
description: "Generate video from text prompts with OpenAI Sora using TanStack AI's experimental generateVideo() jobs/polling API."
description: "Generate video from text prompts with OpenAI Sora or Google Veo using TanStack AI's experimental generateVideo() jobs/polling API."
keywords:
- tanstack ai
- video generation
- sora
- veo
- gemini
- generateVideo
- jobs api
- experimental
@@ -36,6 +38,7 @@ TanStack AI provides experimental support for video generation through dedicated

Currently supported:
- **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)

## Basic Usage

@@ -417,9 +420,9 @@ adapter uses to route the input to the provider-specific field:

| Role | Maps to |
| --------------- | ------------------------------------------------------------- |
| `'start_frame'` | fal `start_image_url` (positional default for the first input) |
| `'end_frame'` | fal `end_image_url` (Veo `lastFrame` planned; no Veo adapter yet) |
| `'reference'` | fal `reference_image_urls` (Veo `referenceImages` planned) |
| `'start_frame'` | fal `start_image_url`, Veo input `image` (positional default for the first input) |
| `'end_frame'` | fal `end_image_url`, Veo `lastFrame` |
| `'reference'` | fal `reference_image_urls`, Veo `referenceImages` |
| `'character'` | Same as `'reference'`: character consistency images |

```typescript
@@ -445,7 +448,7 @@ await generateVideo({
| ------------ | -------------------------------------------------------------------------------------------------------- |
| **OpenAI** | Sora-2 / Sora-2-Pro → the image part goes to `input_reference`; flattened text is the prompt. Single image only; throws if more than one. |
| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types, e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions`; the media-conditioning fields are typed optional there (even when the endpoint requires them) since they usually arrive as prompt parts. |
| **Gemini** | Veo adapter not yet implemented; image prompt parts will be supported when Veo lands. |
| **Gemini** | Veo → the first un-roled / `'start_frame'` image becomes the input image; `'end_frame'` → `lastFrame`; `'reference'` / `'character'` → `referenceImages` (asset references, Veo 3.1). Throws on multiple starting images. |

Adapters whose underlying API can't accept image inputs throw a clear
runtime error so calls fail fast.
@@ -488,6 +491,67 @@ const { jobId } = await generateVideo({
})
```

### Google Veo (Gemini) Model Options

Veo runs on Google's long-running operations API. The adapter starts the
operation, and `getVideoJobStatus` polls it until the video is ready:

```typescript
import { generateVideo } from '@tanstack/ai'
import { geminiVideo } from '@tanstack/ai-gemini'

const adapter = geminiVideo('veo-3.1-generate-preview')

const { jobId } = await generateVideo({
  adapter,
  prompt: 'A close-up of a luthier carving a guitar neck',
  size: '16:9', // aspect ratio: '16:9' or '9:16'
  duration: 8, // typed per model (see below)
  modelOptions: {
    resolution: '1080p', // '720p' (default), '1080p', '4k' (Veo 3.1 only)
    negativePrompt: 'cartoon, low quality',
    generateAudio: true, // Veo 3+ generates synchronized audio
  },
})
```
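
Because the operation is long-running, callers typically wait in a loop. The sketch below shows one way to do that, assuming each status call returns a snapshot of the merged `{ status, progress?, url?, error? }` object. The status strings, the `error` type, and the `pollUntilDone` helper are assumptions for illustration and are not part of the library. The status call is injected as a callback so the sketch does not guess at the `getVideoJobStatus` signature.

```typescript
// Assumed status snapshot; the concrete status strings are hypothetical.
interface JobStatusSketch {
  status: 'pending' | 'processing' | 'completed' | 'failed'
  progress?: number
  url?: string
  error?: string
}

async function pollUntilDone(
  getStatus: () => Promise<JobStatusSketch>,
  intervalMs = 5000,
  maxAttempts = 120,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getStatus()
    // Finished: hand back the result URL.
    if (job.status === 'completed' && job.url !== undefined) return job.url
    // Failed jobs (for example, Responsible-AI filtering) surface as errors.
    if (job.status === 'failed') {
      throw new Error(job.error ?? 'Video generation failed')
    }
    // Still running: wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error('Timed out waiting for the video job')
}
```

To use it, wrap your own `getVideoJobStatus` call in the `getStatus` callback. The interval and attempt cap are arbitrary defaults, so tune them to the generation times you actually see.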

#### Typed durations

Each Veo model accepts a fixed set of durations, enforced at compile time on
the `duration` option:

| Model | `duration` values (seconds) |
|-------|------------------------------|
| `veo-3.1-generate-preview` | `4`, `6`, `8` |
| `veo-3.1-fast-generate-preview` | `4`, `6`, `8` |
| `veo-3.0-generate-001` | `4`, `6`, `8` |
| `veo-3.0-fast-generate-001` | `4`, `6`, `8` |
| `veo-2.0-generate-001` | `5`, `6`, `8` |

If you have raw seconds (for example from a UI slider), coerce them with
`snapDuration`, or inspect the full set with `availableDurations`:

```typescript
const adapter = geminiVideo('veo-3.0-generate-001')

adapter.availableDurations() // { kind: 'discrete', values: [4, 6, 8] }
adapter.snapDuration(7) // 6 (closest valid duration)

await generateVideo({
  adapter,
  prompt: 'A timelapse of a city skyline at dusk',
  duration: adapter.snapDuration(7),
})
```

Adapters that haven't declared a per-model duration map keep the plain
`duration?: number` typing, return `{ kind: 'none' }` from
`availableDurations()`, and return `undefined` from `snapDuration()`.
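
To make the typing fallback concrete, here is a minimal type-level sketch of how a per-model duration map can narrow `duration` while an unmapped model keeps plain `number`. The names `VeoDurationsSketch`, `DurationFor`, `SketchOptions`, and `makeOptions` are hypothetical; the library's actual `TModelDurationByName` and `VideoDurationForAdapter` definitions may be structured differently.

```typescript
// Hypothetical duration map; the values mirror the table above.
interface VeoDurationsSketch {
  'veo-3.0-generate-001': 4 | 6 | 8
  'veo-2.0-generate-001': 5 | 6 | 8
}

// Use the map entry when the model is a known key, otherwise plain number.
type DurationFor<TMap, TModel extends string> = TModel extends keyof TMap
  ? TMap[TModel]
  : number

interface SketchOptions<TMap, TModel extends string> {
  model: TModel
  duration?: DurationFor<TMap, TModel>
}

// Identity helper so the compiler infers TModel from the `model` literal.
function makeOptions<TModel extends string>(
  options: SketchOptions<VeoDurationsSketch, TModel>,
): SketchOptions<VeoDurationsSketch, TModel> {
  return options
}

// Mapped model: `duration` is narrowed to 5 | 6 | 8.
const veo2 = makeOptions({ model: 'veo-2.0-generate-001', duration: 5 })

// Unmapped model: `duration` stays a plain number.
const other = makeOptions({ model: 'some-other-model', duration: 7 })

// By contrast, passing `duration: 7` with 'veo-3.0-generate-001' should be
// rejected at compile time under this sketch.
```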

> **Note:** The video URL returned for Veo jobs is served by the Gemini
> Files API and requires your API key to download (send it as an
> `x-goog-api-key` header or `key` query parameter).
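
A download along the lines of the note above might look like the following sketch. It uses the standard `fetch` API with the `x-goog-api-key` header; the helper names `buildVeoDownloadRequest` and `downloadVeoVideo` are illustrative, not part of `@tanstack/ai-gemini`.

```typescript
// Builds the fetch arguments for an authenticated download.
// Kept separate from the network call so it can be inspected on its own.
function buildVeoDownloadRequest(
  videoUrl: string,
  apiKey: string,
): { url: string; init: { headers: Record<string, string> } } {
  return { url: videoUrl, init: { headers: { 'x-goog-api-key': apiKey } } }
}

// Fetches the video bytes; throws on any non-2xx response.
async function downloadVeoVideo(
  videoUrl: string,
  apiKey: string,
): Promise<ArrayBuffer> {
  const { url, init } = buildVeoDownloadRequest(videoUrl, apiKey)
  const response = await fetch(url, init)
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`)
  }
  return response.arrayBuffer()
}
```

Since the request carries your API key, a download like this is best done server-side, with the resulting bytes proxied or re-hosted for the browser.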

## Response Types

> **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }`; it does not return `jobId` or `expiresAt`.
@@ -586,9 +650,11 @@ Check the [OpenAI documentation](https://platform.openai.com/docs) for current l

## Environment Variables

The video adapter uses the same environment variable as other OpenAI adapters:
The video adapters use the same environment variables as the other adapters
for their provider:

- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENAI_API_KEY`: Your OpenAI API key (Sora)
- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (Veo)

## Explicit API Keys
