
Commit 4e025dd

feat(video): upgrade HappyHorse models from 1.0 to 1.1 (video edit is not yet updated and remains on 1.0)
- Update the default models in the bl-api pipeline from happyhorse-1.0 to happyhorse-1.1
- Replace happyhorse-1.0-t2v/i2v/r2v references in commands with their 1.1 versions
1 parent 1ffcbdd commit 4e025dd

19 files changed

Lines changed: 49 additions & 49 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 - **Text chat** — Qwen3.7-max: major gains in agentic coding, frontend coding, and vibe coding
 - **Multimodal (Omni)** — Full omni-modal support across text + image + audio + video
 - **Image generation & editing** — Qwen-Image 2.0: pro text rendering, photorealism, strong semantic adherence, multi-image composition
-- **Video generation & editing**HappyHorse-1.0 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
+- **Video generation & editing**happyhorse-1.1 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
 - **Speech synthesis & recognition** — CosyVoice streaming TTS, voice cloning from 5–20s samples; FunAudio-ASR covers 30 languages including 7 Chinese dialects and 20+ Mandarin accents
 - **Image & video understanding** — Qwen-VL: long-form video analysis, chart/document parsing, visual reasoning, multilingual OCR

@@ -54,7 +54,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from a single natural-language sentence, with **zero manual editing**. This showcase demonstrates how an AI Agent can compose a multi-step creative pipeline by orchestrating three primitives:

 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that interprets the user's intent and drives the workflow
-- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.0**, Aliyun Model Studio's text-/image-/reference-to-video generation model
+- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.1**, Aliyun Model Studio's text-/image-/reference-to-video generation model
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene decomposition, storyboarding, shot continuity, and final stitching

 ### The single prompt
@@ -67,7 +67,7 @@ A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from

 1. **Qwen Code** parses the request, plans the narrative beats, and decides which tools to call.
 2. The **spark-video Skill** breaks the story into shots, writes per-shot prompts, and enforces visual continuity (characters, lighting, palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel.
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel.
 4. The skill stitches all clips back together into a single 16:9 / ~2-min deliverable.

 No timeline scrubbing. No frame-by-frame editing. Just one sentence → one video.
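
Step 3 of the showcase says `bl video generate` dispatches each shot to HappyHorse 1.1 in parallel. The sketch below shows one way such a fan-out could be bounded so a long storyboard does not fire every request at once. It is an illustration under assumptions, not the spark-video Skill's code: `Shot`, `generateShot`, and `mapWithConcurrency` are hypothetical names, and `generateShot` returns a placeholder filename instead of invoking the CLI.

```typescript
// Hypothetical per-shot record; the real storyboard format is not shown in this commit.
interface Shot {
  index: number;
  prompt: string;
}

// Runs `task` over `items` with at most `limit` calls in flight and
// returns the results in input order (not completion order).
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims the next index synchronously (safe on the single-threaded
  // event loop), awaits its task, then loops until no items remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Placeholder for one `bl video generate` call per shot (hypothetical; no network I/O here).
async function generateShot(shot: Shot): Promise<string> {
  return `shot-${shot.index}.mp4`;
}

const shots: Shot[] = [
  { index: 0, prompt: "Wide establishing shot of a coastal town at dawn" },
  { index: 1, prompt: "Close-up of the protagonist opening a weathered door" },
  { index: 2, prompt: "Tracking shot along the harbor as the sun rises" },
];

mapWithConcurrency(shots, 2, generateShot).then((clips) => {
  // Clips come back in storyboard order, ready for stitching.
  console.log(clips);
});
```

Results are kept in input order because step 4 stitches the clips back together in storyboard sequence. The limit of 2 is arbitrary; a real value would depend on whatever rate limits apply to the video API.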

README.zh.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
 - **文本对话** — Qwen3.7-max:Agentic coding、前端编程、Vibe coding 等能力显著增强
 - **全模态对话** — 文本 + 图像 + 音频 + 视频全模态支持
 - **图像生成与编辑** — Qwen-Image 2.0:专业文字渲染、真实质感、强语义遵循、多图合成
-- **视频生成与编辑**HappyHorse-1.0 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
+- **视频生成与编辑**happyhorse-1.1 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
 - **语音合成与识别** — CosyVoice 实时流式合成,5-20s 样本即可克隆;FunAudio-ASR 覆盖 30 种语种,含汉语七大方言与 20+ 口音官话
 - **图像与视频理解** — Qwen-VL:长视频解析、复杂图表与文档识别、视觉推理、多语种 OCR

@@ -54,7 +54,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
 一部完整的 **2 分钟、16:9 电影感短片** —— 由一句自然语言端到端生成,**全程零手动剪辑**。这个示例展示了 AI Agent 如何把三个基础能力编排成一条多步创作流水线:

 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** —— Agentic coding 模型,解析用户意图、驱动整个工作流
-- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.0**,百炼的文生/图生/参考生视频模型
+- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.1**,百炼的文生/图生/参考生视频模型
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** —— 负责场景拆分、分镜设计、镜头连贯性和最终拼接

 ### 唯一的提示词
@@ -65,7 +65,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_

 1. **Qwen Code** 解析需求、规划叙事节奏,决定要调用哪些工具。
 2. **spark-video Skill** 把故事拆成镜头、为每个镜头写提示词,并保证视觉连贯性(角色、光线、色调、镜头语言)。
-3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.0**。
+3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.1**。
 4. Skill 把所有片段拼成最终的 16:9 / 约 2 分钟成片。

 没有时间线拖拽,没有逐帧剪辑。一句话 → 一部短片。

packages/cli/README.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 - **Text chat** — Qwen3.7-max: major gains in agentic coding, frontend coding, and vibe coding
 - **Multimodal (Omni)** — Full omni-modal support across text + image + audio + video
 - **Image generation & editing** — Qwen-Image 2.0: pro text rendering, photorealism, strong semantic adherence, multi-image composition
-- **Video generation & editing**HappyHorse-1.0 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
+- **Video generation & editing**happyhorse-1.1 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
 - **Speech synthesis & recognition** — CosyVoice streaming TTS, voice cloning from 5–20s samples; FunAudio-ASR covers 30 languages including 7 Chinese dialects and 20+ Mandarin accents
 - **Image & video understanding** — Qwen-VL: long-form video analysis, chart/document parsing, visual reasoning, multilingual OCR

@@ -54,7 +54,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from a single natural-language sentence, with **zero manual editing**. This showcase demonstrates how an AI Agent can compose a multi-step creative pipeline by orchestrating three primitives:

 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that interprets the user's intent and drives the workflow
-- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.0**, Aliyun Model Studio's text-/image-/reference-to-video generation model
+- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.1**, Aliyun Model Studio's text-/image-/reference-to-video generation model
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene decomposition, storyboarding, shot continuity, and final stitching

 ### The single prompt
@@ -67,7 +67,7 @@ A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from

 1. **Qwen Code** parses the request, plans the narrative beats, and decides which tools to call.
 2. The **spark-video Skill** breaks the story into shots, writes per-shot prompts, and enforces visual continuity (characters, lighting, palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel.
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel.
 4. The skill stitches all clips back together into a single 16:9 / ~2-min deliverable.

 No timeline scrubbing. No frame-by-frame editing. Just one sentence → one video.

packages/cli/README.zh.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
 - **文本对话** — Qwen3.7-max:Agentic coding、前端编程、Vibe coding 等能力显著增强
 - **全模态对话** — 文本 + 图像 + 音频 + 视频全模态支持
 - **图像生成与编辑** — Qwen-Image 2.0:专业文字渲染、真实质感、强语义遵循、多图合成
-- **视频生成与编辑**HappyHorse-1.0 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
+- **视频生成与编辑**happyhorse-1.1 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
 - **语音合成与识别** — CosyVoice 实时流式合成,5-20s 样本即可克隆;FunAudio-ASR 覆盖 30 种语种,含汉语七大方言与 20+ 口音官话
 - **图像与视频理解** — Qwen-VL:长视频解析、复杂图表与文档识别、视觉推理、多语种 OCR

@@ -54,7 +54,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
 一部完整的 **2 分钟、16:9 电影感短片** —— 由一句自然语言端到端生成,**全程零手动剪辑**。这个示例展示了 AI Agent 如何把三个基础能力编排成一条多步创作流水线:

 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** —— Agentic coding 模型,解析用户意图、驱动整个工作流
-- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.0**,百炼的文生/图生/参考生视频模型
+- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.1**,百炼的文生/图生/参考生视频模型
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** —— 负责场景拆分、分镜设计、镜头连贯性和最终拼接

 ### 唯一的提示词
@@ -65,7 +65,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_

 1. **Qwen Code** 解析需求、规划叙事节奏,决定要调用哪些工具。
 2. **spark-video Skill** 把故事拆成镜头、为每个镜头写提示词,并保证视觉连贯性(角色、光线、色调、镜头语言)。
-3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.0**。
+3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.1**。
 4. Skill 把所有片段拼成最终的 16:9 / 约 2 分钟成片。

 没有时间线拖拽,没有逐帧剪辑。一句话 → 一部短片。

packages/cli/src/commands/video/generate.ts

Lines changed: 4 additions & 4 deletions
@@ -31,12 +31,12 @@ import {
 export default defineCommand({
   name: "video generate",
   description:
-    "Generate a video from text or image (happyhorse-1.0-t2v / happyhorse-1.0-i2v / wan2.6-t2v)",
+    "Generate a video from text or image (happyhorse-1.1-t2v / happyhorse-1.1-i2v / wan2.6-t2v)",
   usage: "bl video generate --prompt <text> [--image <url>] [flags]",
   options: [
     {
       flag: "--model <model>",
-      description: "Model ID (default: happyhorse-1.0-t2v, or happyhorse-1.0-i2v with --image)",
+      description: "Model ID (default: happyhorse-1.1-t2v, or happyhorse-1.1-i2v with --image)",
     },
     { flag: "--prompt <text>", description: "Video description", required: true },
     { flag: "--image <url>", description: "Input image URL for image-to-video generation" },
@@ -98,7 +98,7 @@ export default defineCommand({
     const model =
       (flags.model as string) ||
       config.defaultVideoModel ||
-      ((flags.image as string) ? "happyhorse-1.0-i2v" : "happyhorse-1.0-t2v");
+      ((flags.image as string) ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v");
     const format = detectOutputFormat(config.output);

     const imageUrl = flags.image as string | undefined;
@@ -118,7 +118,7 @@ export default defineCommand({
       input: {
         prompt: prompt!,
         negative_prompt: (flags.negativePrompt as string) || undefined,
-        // i2v models (happyhorse-1.0-i2v) require input.media with type 'first_frame'
+        // i2v models (happyhorse-1.1-i2v) require input.media with type 'first_frame'
         ...(resolvedImageUrl
           ? { media: [{ type: "first_frame" as const, url: resolvedImageUrl }] }
           : {}),
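
The `generate.ts` hunks encode a precedence rule (`--model`, then `config.defaultVideoModel`, then `happyhorse-1.1-i2v` when `--image` is set or `happyhorse-1.1-t2v` otherwise) and attach the image as `input.media` with type `first_frame`. The standalone sketch below restates that logic with simplified types so it can be checked in isolation; `GenerateFlags`, `CliConfig`, `resolveVideoModel`, and `buildVideoInput` are hypothetical names, not exports of the CLI.

```typescript
// Simplified stand-ins for the parsed flags and loaded config (hypothetical shapes).
interface GenerateFlags {
  prompt: string;
  model?: string;
  image?: string;
  negativePrompt?: string;
}

interface CliConfig {
  defaultVideoModel?: string;
}

interface FirstFrameMedia {
  type: "first_frame";
  url: string;
}

interface VideoInput {
  prompt: string;
  negative_prompt?: string;
  media?: FirstFrameMedia[];
}

// Precedence from the diff: explicit --model, then the configured default,
// then an i2v/t2v default chosen by whether an image was supplied.
// `||` (not `??`) means an empty string also falls through, as in the original.
function resolveVideoModel(flags: GenerateFlags, config: CliConfig): string {
  return (
    flags.model ||
    config.defaultVideoModel ||
    (flags.image ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v")
  );
}

// i2v models expect the image as input.media with type "first_frame";
// text-only requests omit `media` entirely rather than sending an empty array.
function buildVideoInput(flags: GenerateFlags, resolvedImageUrl?: string): VideoInput {
  return {
    prompt: flags.prompt,
    negative_prompt: flags.negativePrompt || undefined,
    ...(resolvedImageUrl ? { media: [{ type: "first_frame" as const, url: resolvedImageUrl }] } : {}),
  };
}
```

One consequence of this order, visible in the diff itself: `config.defaultVideoModel` is checked before `--image`, so a configured text-to-video default would also be sent for image-to-video requests unless `--model` is passed explicitly.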

packages/cli/src/commands/video/ref.ts

Lines changed: 3 additions & 3 deletions
@@ -30,10 +30,10 @@ import {
 export default defineCommand({
   name: "video ref",
   description:
-    "Reference-to-video generation (happyhorse-1.0-r2v / wan2.6-r2v): multi-subject, multi-shot with voice",
+    "Reference-to-video generation (happyhorse-1.1-r2v / wan2.6-r2v): multi-subject, multi-shot with voice",
   usage: "bl video ref --prompt <text> --image <url>... [--ref-video <url>...] [flags]",
   options: [
-    { flag: "--model <model>", description: "Model ID (default: happyhorse-1.0-r2v)" },
+    { flag: "--model <model>", description: "Model ID (default: happyhorse-1.1-r2v)" },
     {
       flag: "--prompt <text>",
       description: "Video description with reference markers (image1, video1, etc.)",
@@ -126,7 +126,7 @@ export default defineCommand({
     const imageVoices = (flags.imageVoice as string[] | undefined) || [];
     const videoVoices = (flags.videoVoice as string[] | undefined) || [];

-    const model = (flags.model as string) || "happyhorse-1.0-r2v";
+    const model = (flags.model as string) || "happyhorse-1.1-r2v";
     const format = detectOutputFormat(config.output);

     // --- Resolve file URLs (auto-upload local files) ---
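
`bl video ref` defaults to `happyhorse-1.1-r2v` and expects prompts that cite inputs with markers such as `image1` and `video1`. A pre-flight check like the one below could catch a marker that points past the supplied references before a request is sent. This is a hypothetical helper (`findDanglingMarkers` is not part of the CLI), and it assumes markers are 1-based and numbered separately for images and videos, as the `image1, video1` examples suggest.

```typescript
// Hypothetical pre-flight check for `bl video ref` prompts.
// Assumes markers are 1-based and counted separately per kind, so with two
// --image inputs and one --ref-video input, image1, image2 and video1 are valid.
function findDanglingMarkers(prompt: string, imageCount: number, videoCount: number): string[] {
  const dangling = new Set<string>(); // de-duplicates repeated markers
  const markerPattern = /\b(image|video)(\d+)\b/g;
  let match: RegExpExecArray | null;
  while ((match = markerPattern.exec(prompt)) !== null) {
    const index = Number(match[2]);
    const available = match[1] === "image" ? imageCount : videoCount;
    if (index < 1 || index > available) {
      dangling.add(match[0]); // e.g. "image3" when only two images were supplied
    }
  }
  return Array.from(dangling);
}
```

For example, `findDanglingMarkers("image1 hands a letter to image2", 1, 0)` returns `["image2"]`, since only one image was supplied.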

packages/cli/src/pipeline/steps/bl-api.ts

Lines changed: 1 addition & 1 deletion
@@ -391,7 +391,7 @@ export async function videoGenerate(
     });
   }

-  const model = input.model || (input.image ? "happyhorse-1.0-i2v" : "happyhorse-1.0-t2v");
+  const model = input.model || (input.image ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v");

   let resolvedImageUrl: string | undefined;
   if (input.image) {
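
This commit updates the same literal model IDs in `generate.ts`, `ref.ts`, and `bl-api.ts`. One possible refactor, sketched here as an assumption rather than anything the repository does, is to keep the defaults in a single table so the next version bump touches one place. Because the commit message notes that video edit stays on 1.0, the table is keyed per task instead of sharing one global version string. `DEFAULT_VIDEO_MODELS`, `defaultGenerateModel`, and `resolvePipelineModel` are hypothetical names.

```typescript
// Hypothetical shared table (in practice it would be exported from one module and
// imported by generate.ts, ref.ts, and the bl-api pipeline step).
// Video edit is deliberately absent: per the commit message it remains on 1.0.
const DEFAULT_VIDEO_MODELS = {
  t2v: "happyhorse-1.1-t2v",
  i2v: "happyhorse-1.1-i2v",
  r2v: "happyhorse-1.1-r2v",
} as const;

type VideoTask = keyof typeof DEFAULT_VIDEO_MODELS;

// Same image-based fallback the diff uses in both generate.ts and bl-api.ts.
function defaultGenerateModel(hasImage: boolean): string {
  const task: VideoTask = hasImage ? "i2v" : "t2v";
  return DEFAULT_VIDEO_MODELS[task];
}

// Mirrors `input.model || (input.image ? ... : ...)` from bl-api.ts.
function resolvePipelineModel(input: { model?: string; image?: string }): string {
  return input.model || defaultGenerateModel(Boolean(input.image));
}
```

The e2e tests pass explicit IDs with `--model`; those arguably should stay literal so a test fails loudly if a default drifts.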

packages/cli/tests/e2e/video-download.e2e.test.ts

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
         "video",
         "generate",
         "--model",
-        "happyhorse-1.0-t2v",
+        "happyhorse-1.1-t2v",
         "--duration",
         "3",
         "--prompt",

packages/cli/tests/e2e/video-generate-i2v.e2e.test.ts

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
         "video",
         "generate",
         "--model",
-        "happyhorse-1.0-i2v",
+        "happyhorse-1.1-i2v",
         "--image",
         "https://example.com/placeholder.png",
         "--non-interactive",
@@ -53,7 +53,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
         "generate",
         "--dry-run",
         "--model",
-        "happyhorse-1.0-t2v",
+        "happyhorse-1.1-t2v",
         "--prompt",
         "干跑无图",
         "--non-interactive",
@@ -68,7 +68,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
       expect(data.request?.input?.media).toBeUndefined();
     });

-    test("【happyhorse-1.0-i2v】图片生成视频", async () => {
+    test("【happyhorse-1.1-i2v】图片生成视频", async () => {
       const outDir = makeE2eOutputDir(e2eLabelFromMetaUrl(import.meta.url));
       const png = join(outDir, "e2e-gen.png");
       const gen = await runCli([
@@ -95,7 +95,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
         "video",
         "generate",
         "--model",
-        "happyhorse-1.0-i2v",
+        "happyhorse-1.1-i2v",
         "--image",
         imagePath,
         "--prompt",

packages/cli/tests/e2e/video-generate-t2v.e2e.test.ts

Lines changed: 5 additions & 5 deletions
@@ -37,7 +37,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
         "video",
         "generate",
         "--model",
-        "happyhorse-1.0-t2v",
+        "happyhorse-1.1-t2v",
         "--non-interactive",
       ]);
       expect(exitCode).toBe(0);
@@ -51,7 +51,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
         "generate",
         "--dry-run",
         "--model",
-        "happyhorse-1.0-t2v",
+        "happyhorse-1.1-t2v",
         "--prompt",
         "干跑校验",
         "--non-interactive",
@@ -62,18 +62,18 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
       const data = parseStdoutJson<{ request?: { model?: string; input?: { prompt?: string } } }>(
         stdout,
       );
-      expect(data.request?.model).toBe("happyhorse-1.0-t2v");
+      expect(data.request?.model).toBe("happyhorse-1.1-t2v");
       expect(data.request?.input?.prompt).toBe("干跑校验");
     });

-    test("【happyhorse-1.0-t2v】文本生成视频", async () => {
+    test("【happyhorse-1.1-t2v】文本生成视频", async () => {
       const outDir = makeE2eOutputDir(e2eLabelFromMetaUrl(import.meta.url));
       const { stdout, stderr, exitCode } = await runCli([
         ...cliTimeoutPrefix(),
         "video",
         "generate",
         "--model",
-        "happyhorse-1.0-t2v",
+        "happyhorse-1.1-t2v",
         "--prompt",
         "夕阳下海面波光,远景静态镜头",
         "--download",
