Commit 19c4f5f

Merge pull request #70 from modelstudioai/feat/switch-model
feat(video): upgrade happyhorse model from 1.0 to 1.1; video-edit has not been updated and is still on 1.0.
2 parents 1ffcbdd + af524e5 commit 19c4f5f

23 files changed

Lines changed: 66 additions & 52 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -6,6 +6,13 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and
 
 [中文版](CHANGELOG.zh.md) · [README](README.md) · [Contributing](CONTRIBUTING.md)
 
+## [1.4.1] - 2026-06-22
+
+### Changed
+
+- Video generation now defaults to the upgraded HappyHorse 1.1 model for better quality. The 1.0 models are still available via `--model`.
+- `bl update` now keeps the agent skill in sync across all your agent apps (Claude Code, Cursor, etc.), and refreshes it even when the CLI is already up to date.
+
 ## [1.4.0] - 2026-06-17
 
 ### Added

CHANGELOG.zh.md

Lines changed: 7 additions & 0 deletions
@@ -6,6 +6,13 @@
 
 [English](CHANGELOG.md) · [README](README.zh.md) · [Contributing](CONTRIBUTING.zh.md)
 
+## [1.4.1] - 2026-06-22
+
+### Changed
+
+- Video generation now defaults to the upgraded HappyHorse 1.1 model for better visual quality. To use the 1.0 models, specify them via `--model`.
+- `bl update` now syncs the agent skill to all agent apps (Claude Code, Cursor, etc.) and refreshes the skill even when the CLI is already on the latest version.
+
 ## [1.4.0] - 2026-06-17
 
 ### Added

README.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 - **Text chat** — Qwen3.7-max: major gains in agentic coding, frontend coding, and vibe coding
 - **Multimodal (Omni)** — Full omni-modal support across text + image + audio + video
 - **Image generation & editing** — Qwen-Image 2.0: pro text rendering, photorealism, strong semantic adherence, multi-image composition
-- **Video generation & editing** — HappyHorse-1.0 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
+- **Video generation & editing** — happyhorse-1.1 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
 - **Speech synthesis & recognition** — CosyVoice streaming TTS, voice cloning from 5–20s samples; FunAudio-ASR covers 30 languages including 7 Chinese dialects and 20+ Mandarin accents
 - **Image & video understanding** — Qwen-VL: long-form video analysis, chart/document parsing, visual reasoning, multilingual OCR
 
@@ -54,7 +54,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from a single natural-language sentence, with **zero manual editing**. This showcase demonstrates how an AI Agent can compose a multi-step creative pipeline by orchestrating three primitives:
 
 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that interprets the user's intent and drives the workflow
-- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.0**, Aliyun Model Studio's text-/image-/reference-to-video generation model
+- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.1**, Aliyun Model Studio's text-/image-/reference-to-video generation model
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene decomposition, storyboarding, shot continuity, and final stitching
 
 ### The single prompt
@@ -67,7 +67,7 @@ A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from
 
 1. **Qwen Code** parses the request, plans the narrative beats, and decides which tools to call.
 2. The **spark-video Skill** breaks the story into shots, writes per-shot prompts, and enforces visual continuity (characters, lighting, palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel.
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel.
 4. The skill stitches all clips back together into a single 16:9 / ~2-min deliverable.
 
 No timeline scrubbing. No frame-by-frame editing. Just one sentence → one video.

README.zh.md

Lines changed: 3 additions & 3 deletions
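@@ -27,7 +27,7 @@ _Built for AI Agents; every command can be invoked as a structured tool._
 - **Text chat** — Qwen3.7-max: significantly stronger agentic coding, frontend coding, and vibe coding
 - **Omni-modal chat** — Full omni-modal support across text + image + audio + video
 - **Image generation & editing** — Qwen-Image 2.0: professional text rendering, realistic texture, strong semantic adherence, multi-image composition
-- **Video generation & editing** — HappyHorse-1.0 series: text-to-video / image-to-video / reference-to-video (up to 9 reference images) / natural-language video editing
+- **Video generation & editing** — happyhorse-1.1 series: text-to-video / image-to-video / reference-to-video (up to 9 reference images) / natural-language video editing
 - **Speech synthesis & recognition** — CosyVoice real-time streaming synthesis, voice cloning from 5-20s samples; FunAudio-ASR covers 30 languages, including the seven major Chinese dialect groups and 20+ accented varieties of Mandarin
 - **Image & video understanding** — Qwen-VL: long-video analysis, complex chart and document recognition, visual reasoning, multilingual OCR
 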
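@@ -54,7 +54,7 @@ _Built for AI Agents; every command can be invoked as a structured tool._
 A complete **2-minute, 16:9 cinematic short film** — generated end-to-end from a single natural-language sentence, with **zero manual editing throughout**. This example shows how an AI Agent can orchestrate three basic capabilities into a multi-step creative pipeline:
 
 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that parses user intent and drives the whole workflow
-- **[Aliyun Model Studio (Bailian) CLI](https://github.com/modelstudioai/cli/)** — invokes **HappyHorse 1.0**, Bailian's text-/image-/reference-to-video model
+- **[Aliyun Model Studio (Bailian) CLI](https://github.com/modelstudioai/cli/)** — invokes **HappyHorse 1.1**, Bailian's text-/image-/reference-to-video model
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene breakdown, storyboard design, shot continuity, and final stitching
 
 ### The only prompt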
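@@ -65,7 +65,7 @@ _Built for AI Agents; every command can be invoked as a structured tool._
 
 1. **Qwen Code** parses the request, plans the narrative pacing, and decides which tools to call.
 2. The **spark-video Skill** breaks the story into shots, writes a prompt for each shot, and maintains visual continuity (characters, lighting, color palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel
 4. The Skill stitches all clips into the final 16:9 / ~2-minute film.
 
 No timeline dragging, no frame-by-frame editing. One sentence → one short film.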

packages/cli/README.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 - **Text chat** — Qwen3.7-max: major gains in agentic coding, frontend coding, and vibe coding
 - **Multimodal (Omni)** — Full omni-modal support across text + image + audio + video
 - **Image generation & editing** — Qwen-Image 2.0: pro text rendering, photorealism, strong semantic adherence, multi-image composition
-- **Video generation & editing** — HappyHorse-1.0 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
+- **Video generation & editing** — happyhorse-1.1 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
 - **Speech synthesis & recognition** — CosyVoice streaming TTS, voice cloning from 5–20s samples; FunAudio-ASR covers 30 languages including 7 Chinese dialects and 20+ Mandarin accents
 - **Image & video understanding** — Qwen-VL: long-form video analysis, chart/document parsing, visual reasoning, multilingual OCR
 
@@ -54,7 +54,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
 A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from a single natural-language sentence, with **zero manual editing**. This showcase demonstrates how an AI Agent can compose a multi-step creative pipeline by orchestrating three primitives:
 
 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that interprets the user's intent and drives the workflow
-- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.0**, Aliyun Model Studio's text-/image-/reference-to-video generation model
+- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.1**, Aliyun Model Studio's text-/image-/reference-to-video generation model
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene decomposition, storyboarding, shot continuity, and final stitching
 
 ### The single prompt
@@ -67,7 +67,7 @@ A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from
 
 1. **Qwen Code** parses the request, plans the narrative beats, and decides which tools to call.
 2. The **spark-video Skill** breaks the story into shots, writes per-shot prompts, and enforces visual continuity (characters, lighting, palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel.
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel.
 4. The skill stitches all clips back together into a single 16:9 / ~2-min deliverable.
 
 No timeline scrubbing. No frame-by-frame editing. Just one sentence → one video.

packages/cli/README.zh.md

Lines changed: 3 additions & 3 deletions
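@@ -27,7 +27,7 @@ _Built for AI Agents; every command can be invoked as a structured tool._
 - **Text chat** — Qwen3.7-max: significantly stronger agentic coding, frontend coding, and vibe coding
 - **Omni-modal chat** — Full omni-modal support across text + image + audio + video
 - **Image generation & editing** — Qwen-Image 2.0: professional text rendering, realistic texture, strong semantic adherence, multi-image composition
-- **Video generation & editing** — HappyHorse-1.0 series: text-to-video / image-to-video / reference-to-video (up to 9 reference images) / natural-language video editing
+- **Video generation & editing** — happyhorse-1.1 series: text-to-video / image-to-video / reference-to-video (up to 9 reference images) / natural-language video editing
 - **Speech synthesis & recognition** — CosyVoice real-time streaming synthesis, voice cloning from 5-20s samples; FunAudio-ASR covers 30 languages, including the seven major Chinese dialect groups and 20+ accented varieties of Mandarin
 - **Image & video understanding** — Qwen-VL: long-video analysis, complex chart and document recognition, visual reasoning, multilingual OCR
 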
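@@ -54,7 +54,7 @@ _Built for AI Agents; every command can be invoked as a structured tool._
 A complete **2-minute, 16:9 cinematic short film** — generated end-to-end from a single natural-language sentence, with **zero manual editing throughout**. This example shows how an AI Agent can orchestrate three basic capabilities into a multi-step creative pipeline:
 
 - **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that parses user intent and drives the whole workflow
-- **[Aliyun Model Studio (Bailian) CLI](https://github.com/modelstudioai/cli/)** — invokes **HappyHorse 1.0**, Bailian's text-/image-/reference-to-video model
+- **[Aliyun Model Studio (Bailian) CLI](https://github.com/modelstudioai/cli/)** — invokes **HappyHorse 1.1**, Bailian's text-/image-/reference-to-video model
 - **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene breakdown, storyboard design, shot continuity, and final stitching
 
 ### The only prompt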
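@@ -65,7 +65,7 @@ _Built for AI Agents; every command can be invoked as a structured tool._
 
 1. **Qwen Code** parses the request, plans the narrative pacing, and decides which tools to call.
 2. The **spark-video Skill** breaks the story into shots, writes a prompt for each shot, and maintains visual continuity (characters, lighting, color palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel
 4. The Skill stitches all clips into the final 16:9 / ~2-minute film.
 
 No timeline dragging, no frame-by-frame editing. One sentence → one short film.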

packages/cli/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "bailian-cli",
-  "version": "1.4.0",
+  "version": "1.4.1",
   "description": "CLI for Aliyun Model Studio (DashScope) AI Platform.",
   "keywords": [
     "agent",

packages/cli/src/commands/video/generate.ts

Lines changed: 4 additions & 4 deletions
@@ -31,12 +31,12 @@ import {
 export default defineCommand({
   name: "video generate",
   description:
-    "Generate a video from text or image (happyhorse-1.0-t2v / happyhorse-1.0-i2v / wan2.6-t2v)",
+    "Generate a video from text or image (happyhorse-1.1-t2v / happyhorse-1.1-i2v / wan2.6-t2v)",
   usage: "bl video generate --prompt <text> [--image <url>] [flags]",
   options: [
     {
       flag: "--model <model>",
-      description: "Model ID (default: happyhorse-1.0-t2v, or happyhorse-1.0-i2v with --image)",
+      description: "Model ID (default: happyhorse-1.1-t2v, or happyhorse-1.1-i2v with --image)",
     },
     { flag: "--prompt <text>", description: "Video description", required: true },
     { flag: "--image <url>", description: "Input image URL for image-to-video generation" },
@@ -98,7 +98,7 @@ export default defineCommand({
     const model =
       (flags.model as string) ||
       config.defaultVideoModel ||
-      ((flags.image as string) ? "happyhorse-1.0-i2v" : "happyhorse-1.0-t2v");
+      ((flags.image as string) ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v");
     const format = detectOutputFormat(config.output);
 
     const imageUrl = flags.image as string | undefined;
@@ -118,7 +118,7 @@ export default defineCommand({
         input: {
           prompt: prompt!,
           negative_prompt: (flags.negativePrompt as string) || undefined,
-          // i2v models (happyhorse-1.0-i2v) require input.media with type 'first_frame'
+          // i2v models (happyhorse-1.1-i2v) require input.media with type 'first_frame'
           ...(resolvedImageUrl
             ? { media: [{ type: "first_frame" as const, url: resolvedImageUrl }] }
             : {}),
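
The two `generate.ts` hunks above can be read together as one small piece of logic: pick a model, then attach the image as a first frame only when one was supplied. The following is a minimal sketch of that behavior, not the CLI's actual code. The `Flags`, `Config`, and `VideoInput` shapes and the function names `resolveVideoModel` and `buildInput` are hypothetical and exist only for illustration; the model IDs and the `first_frame` media type come from the diff.

```typescript
// Hypothetical, simplified shapes; the real CLI types are not shown in this diff.
interface Flags {
  model?: string;
  image?: string;
  negativePrompt?: string;
}
interface Config {
  defaultVideoModel?: string;
}
interface Media {
  type: "first_frame";
  url: string;
}
interface VideoInput {
  prompt: string;
  negative_prompt?: string;
  media?: Media[];
}

// Precedence as in the hunk above: explicit --model, then the configured
// default, then a built-in default chosen by whether --image was supplied.
function resolveVideoModel(flags: Flags, config: Config): string {
  return (
    flags.model ||
    config.defaultVideoModel ||
    (flags.image ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v")
  );
}

// Attach the image as a first frame only when a resolved URL is available;
// otherwise the media key is omitted entirely.
function buildInput(prompt: string, flags: Flags, resolvedImageUrl?: string): VideoInput {
  return {
    prompt,
    negative_prompt: flags.negativePrompt || undefined,
    ...(resolvedImageUrl
      ? { media: [{ type: "first_frame" as const, url: resolvedImageUrl }] }
      : {}),
  };
}
```

Under this precedence, a configured `defaultVideoModel` takes priority over the image-based choice, so passing `--image` switches to the i2v model only when neither `--model` nor a configured default is set.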

packages/cli/src/commands/video/ref.ts

Lines changed: 3 additions & 3 deletions
@@ -30,10 +30,10 @@ import {
 export default defineCommand({
   name: "video ref",
   description:
-    "Reference-to-video generation (happyhorse-1.0-r2v / wan2.6-r2v): multi-subject, multi-shot with voice",
+    "Reference-to-video generation (happyhorse-1.1-r2v / wan2.6-r2v): multi-subject, multi-shot with voice",
   usage: "bl video ref --prompt <text> --image <url>... [--ref-video <url>...] [flags]",
   options: [
-    { flag: "--model <model>", description: "Model ID (default: happyhorse-1.0-r2v)" },
+    { flag: "--model <model>", description: "Model ID (default: happyhorse-1.1-r2v)" },
     {
       flag: "--prompt <text>",
       description: "Video description with reference markers (image1, video1, etc.)",
@@ -126,7 +126,7 @@ export default defineCommand({
     const imageVoices = (flags.imageVoice as string[] | undefined) || [];
     const videoVoices = (flags.videoVoice as string[] | undefined) || [];
 
-    const model = (flags.model as string) || "happyhorse-1.1-r2v".replace("1.1", "1.0");
+    const model = (flags.model as string) || "happyhorse-1.1-r2v";
     const format = detectOutputFormat(config.output);
 
     // --- Resolve file URLs (auto-upload local files) ---
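
The `--prompt` option of `bl video ref` refers to its inputs through markers such as `image1` and `video1`. One way to catch a prompt that names a reference that was never supplied is sketched below. This is an illustration only: `findUnmatchedMarkers` is a hypothetical helper that does not appear in the diff, and it assumes markers are 1-indexed and map to the Nth `--image` or `--ref-video` value, which the diff does not confirm.

```typescript
// Hypothetical helper; not part of the CLI shown in this diff.
// Assumption: markers are 1-indexed and refer to the Nth --image / --ref-video.
function findUnmatchedMarkers(
  prompt: string,
  imageCount: number,
  videoCount: number,
): string[] {
  const unmatched: string[] = [];
  const pattern = /\b(image|video)(\d+)\b/g;
  let match: RegExpExecArray | null;
  // Walk every marker in the prompt and compare its index with the number
  // of references of that kind that were actually supplied.
  while ((match = pattern.exec(prompt)) !== null) {
    const kind = match[1];
    const index = Number(match[2]);
    const available = kind === "image" ? imageCount : videoCount;
    if (index < 1 || index > available) {
      unmatched.push(match[0]);
    }
  }
  return unmatched;
}
```

A caller could run this before submitting the request and report the returned markers to the user instead of sending a prompt that refers to missing inputs.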

packages/cli/src/pipeline/steps/bl-api.ts

Lines changed: 1 addition & 1 deletion
@@ -391,7 +391,7 @@ export async function videoGenerate(
     });
   }
 
-  const model = input.model || (input.image ? "happyhorse-1.0-i2v" : "happyhorse-1.0-t2v");
+  const model = input.model || (input.image ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v");
 
   let resolvedImageUrl: string | undefined;
   if (input.image) {
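
This commit edits the same model ID strings in three source files (`generate.ts`, `ref.ts`, and `bl-api.ts`). A possible way to make a future version bump a one-line change is to keep the defaults in a single table, sketched below. This is a suggestion under stated assumptions rather than the repository's structure: `DEFAULT_VIDEO_MODELS`, `defaultGenerateModel`, and `resolvePipelineModel` are hypothetical names, and only the model IDs and the fallback expression are taken from the diff.

```typescript
// Hypothetical shared module; names are illustrative only.
const DEFAULT_VIDEO_MODELS = {
  t2v: "happyhorse-1.1-t2v",
  i2v: "happyhorse-1.1-i2v",
  r2v: "happyhorse-1.1-r2v",
} as const;

// Built-in default for text-to-video or image-to-video generation.
function defaultGenerateModel(hasImage: boolean): string {
  return hasImage ? DEFAULT_VIDEO_MODELS.i2v : DEFAULT_VIDEO_MODELS.t2v;
}

// Mirrors the pipeline expression in the hunk above: an explicit model wins,
// otherwise the built-in default is chosen by the presence of an image.
function resolvePipelineModel(input: { model?: string; image?: string }): string {
  return input.model || defaultGenerateModel(Boolean(input.image));
}
```

Unlike the CLI command, the pipeline expression shown in the hunk does not consult a configured default model, and this sketch keeps that difference.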
