Merged
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,13 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and

[中文版](CHANGELOG.zh.md) · [README](README.md) · [Contributing](CONTRIBUTING.md)

## [1.4.1] - 2026-06-22

### Changed

- Video generation now defaults to the upgraded HappyHorse 1.1 model for better quality. The 1.0 models are still available via `--model`.
- `bl update` now keeps the agent skill in sync across all your agent apps (Claude Code, Cursor, etc.), and refreshes it even when the CLI is already up to date.
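
A minimal sketch of the default selection described above, assuming the precedence used by the fallback chain in `packages/cli/src/commands/video/generate.ts` from this PR: an explicit `--model` wins, then a configured `defaultVideoModel`, then an image-dependent default. The helper name `resolveVideoModel` and the two interfaces are hypothetical and exist only for illustration.

```typescript
// Hypothetical shapes for the parsed flags and the CLI config.
interface VideoFlags {
  model?: string; // value of --model, if given
  image?: string; // value of --image, if given
}

interface CliConfig {
  defaultVideoModel?: string;
}

// Hypothetical helper: returns the model ID a request would use.
function resolveVideoModel(flags: VideoFlags, config: CliConfig): string {
  return (
    flags.model ||
    config.defaultVideoModel ||
    // With no explicit choice, the presence of an image selects i2v over t2v.
    (flags.image ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v")
  );
}
```

Under this sketch, passing `--model happyhorse-1.0-t2v` still selects the 1.0 model, which matches the changelog note that the 1.0 models stay available.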

## [1.4.0] - 2026-06-17

### Added
7 changes: 7 additions & 0 deletions CHANGELOG.zh.md
@@ -6,6 +6,13 @@

[English](CHANGELOG.md) · [README](README.zh.md) · [参与贡献](CONTRIBUTING.zh.md)

## [1.4.1] - 2026-06-22

### 变更

- 视频生成默认升级到 HappyHorse 1.1 模型,画面质量更佳。如需使用 1.0 模型,可通过 `--model` 指定。
- `bl update` 现在会把 agent skill 同步更新到所有 agent 应用(Claude Code、Cursor 等),即使 CLI 已是最新版本也会刷新 skill。

## [1.4.0] - 2026-06-17

### 新增
6 changes: 3 additions & 3 deletions README.md
@@ -27,7 +27,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
- **Text chat** — Qwen3.7-max: major gains in agentic coding, frontend coding, and vibe coding
- **Multimodal (Omni)** — Full omni-modal support across text + image + audio + video
- **Image generation & editing** — Qwen-Image 2.0: pro text rendering, photorealism, strong semantic adherence, multi-image composition
-- **Video generation & editing** — HappyHorse-1.0 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
+- **Video generation & editing** — happyhorse-1.1 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
- **Speech synthesis & recognition** — CosyVoice streaming TTS, voice cloning from 5–20s samples; FunAudio-ASR covers 30 languages including 7 Chinese dialects and 20+ Mandarin accents
- **Image & video understanding** — Qwen-VL: long-form video analysis, chart/document parsing, visual reasoning, multilingual OCR

@@ -54,7 +54,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from a single natural-language sentence, with **zero manual editing**. This showcase demonstrates how an AI Agent can compose a multi-step creative pipeline by orchestrating three primitives:

- **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that interprets the user's intent and drives the workflow
-- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.0**, Aliyun Model Studio's text-/image-/reference-to-video generation model
+- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.1**, Aliyun Model Studio's text-/image-/reference-to-video generation model
- **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene decomposition, storyboarding, shot continuity, and final stitching

### The single prompt
@@ -67,7 +67,7 @@ A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from

1. **Qwen Code** parses the request, plans the narrative beats, and decides which tools to call.
2. The **spark-video Skill** breaks the story into shots, writes per-shot prompts, and enforces visual continuity (characters, lighting, palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel.
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel.
4. The skill stitches all clips back together into a single 16:9 / ~2-min deliverable.

No timeline scrubbing. No frame-by-frame editing. Just one sentence → one video.
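
The parallel dispatch in step 3 could look roughly like the sketch below. It is not taken from the repository: `Shot`, `Clip`, `generateShot`, and `generateAllShots` are hypothetical names, and `generateShot` is a stub standing in for whatever actually invokes `bl video generate` for one shot.

```typescript
// Hypothetical per-shot input produced by the storyboard step.
interface Shot {
  id: number;
  prompt: string;
}

// Hypothetical per-shot result.
interface Clip {
  shotId: number;
  url: string;
}

// Stub: a real pipeline would call `bl video generate` here and wait for the result.
async function generateShot(shot: Shot): Promise<Clip> {
  return { shotId: shot.id, url: `clip-${shot.id}.mp4` };
}

// Start every shot at once; Promise.all resolves with results in input order,
// so the clips can be stitched in storyboard order afterwards.
async function generateAllShots(shots: Shot[]): Promise<Clip[]> {
  return Promise.all(shots.map((shot) => generateShot(shot)));
}
```

`Promise.all` rejects as soon as any shot fails, so a pipeline that prefers partial results might use `Promise.allSettled` instead.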
6 changes: 3 additions & 3 deletions README.zh.md
@@ -27,7 +27,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
- **文本对话** — Qwen3.7-max:Agentic coding、前端编程、Vibe coding 等能力显著增强
- **全模态对话** — 文本 + 图像 + 音频 + 视频全模态支持
- **图像生成与编辑** — Qwen-Image 2.0:专业文字渲染、真实质感、强语义遵循、多图合成
-- **视频生成与编辑** — HappyHorse-1.0 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
+- **视频生成与编辑** — happyhorse-1.1 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
- **语音合成与识别** — CosyVoice 实时流式合成,5-20s 样本即可克隆;FunAudio-ASR 覆盖 30 种语种,含汉语七大方言与 20+ 口音官话
- **图像与视频理解** — Qwen-VL:长视频解析、复杂图表与文档识别、视觉推理、多语种 OCR

@@ -54,7 +54,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
一部完整的 **2 分钟、16:9 电影感短片** —— 由一句自然语言端到端生成,**全程零手动剪辑**。这个示例展示了 AI Agent 如何把三个基础能力编排成一条多步创作流水线:

- **[Qwen Code](https://github.com/QwenLM/qwen-code)** —— Agentic coding 模型,解析用户意图、驱动整个工作流
-- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.0**,百炼的文生/图生/参考生视频模型
+- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.1**,百炼的文生/图生/参考生视频模型
- **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** —— 负责场景拆分、分镜设计、镜头连贯性和最终拼接

### 唯一的提示词
@@ -65,7 +65,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_

1. **Qwen Code** 解析需求、规划叙事节奏,决定要调用哪些工具。
2. **spark-video Skill** 把故事拆成镜头、为每个镜头写提示词,并保证视觉连贯性(角色、光线、色调、镜头语言)。
-3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.0**。
+3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.1**。
4. Skill 把所有片段拼成最终的 16:9 / 约 2 分钟成片。

没有时间线拖拽,没有逐帧剪辑。一句话 → 一部短片。
6 changes: 3 additions & 3 deletions packages/cli/README.md
@@ -27,7 +27,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
- **Text chat** — Qwen3.7-max: major gains in agentic coding, frontend coding, and vibe coding
- **Multimodal (Omni)** — Full omni-modal support across text + image + audio + video
- **Image generation & editing** — Qwen-Image 2.0: pro text rendering, photorealism, strong semantic adherence, multi-image composition
-- **Video generation & editing** — HappyHorse-1.0 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
+- **Video generation & editing** — happyhorse-1.1 series: text-/image-/reference-to-video and natural-language video editing (up to 9-image reference)
- **Speech synthesis & recognition** — CosyVoice streaming TTS, voice cloning from 5–20s samples; FunAudio-ASR covers 30 languages including 7 Chinese dialects and 20+ Mandarin accents
- **Image & video understanding** — Qwen-VL: long-form video analysis, chart/document parsing, visual reasoning, multilingual OCR

@@ -54,7 +54,7 @@ Equip your AI Agent out-of-the-box with these capabilities, composable across co
A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from a single natural-language sentence, with **zero manual editing**. This showcase demonstrates how an AI Agent can compose a multi-step creative pipeline by orchestrating three primitives:

- **[Qwen Code](https://github.com/QwenLM/qwen-code)** — the agentic coding model that interprets the user's intent and drives the workflow
-- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.0**, Aliyun Model Studio's text-/image-/reference-to-video generation model
+- **[Aliyun Model Studio CLI](https://bailian.console.aliyun.com/cli?source_channel=cli_github&)** — invokes **HappyHorse 1.1**, Aliyun Model Studio's text-/image-/reference-to-video generation model
- **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** — handles scene decomposition, storyboarding, shot continuity, and final stitching

### The single prompt
@@ -67,7 +67,7 @@ A complete **2-minute, 16:9 cinematic short film** — produced end-to-end from

1. **Qwen Code** parses the request, plans the narrative beats, and decides which tools to call.
2. The **spark-video Skill** breaks the story into shots, writes per-shot prompts, and enforces visual continuity (characters, lighting, palette, lens language).
-3. **`bl video generate`** dispatches each shot to **HappyHorse 1.0** in parallel.
+3. **`bl video generate`** dispatches each shot to **HappyHorse 1.1** in parallel.
4. The skill stitches all clips back together into a single 16:9 / ~2-min deliverable.

No timeline scrubbing. No frame-by-frame editing. Just one sentence → one video.
6 changes: 3 additions & 3 deletions packages/cli/README.zh.md
@@ -27,7 +27,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
- **文本对话** — Qwen3.7-max:Agentic coding、前端编程、Vibe coding 等能力显著增强
- **全模态对话** — 文本 + 图像 + 音频 + 视频全模态支持
- **图像生成与编辑** — Qwen-Image 2.0:专业文字渲染、真实质感、强语义遵循、多图合成
-- **视频生成与编辑** — HappyHorse-1.0 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
+- **视频生成与编辑** — happyhorse-1.1 系列,支持文生 / 图生 / 参考生(最多 9 张图参考)/ 自然语言视频编辑
- **语音合成与识别** — CosyVoice 实时流式合成,5-20s 样本即可克隆;FunAudio-ASR 覆盖 30 种语种,含汉语七大方言与 20+ 口音官话
- **图像与视频理解** — Qwen-VL:长视频解析、复杂图表与文档识别、视觉推理、多语种 OCR

@@ -54,7 +54,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_
一部完整的 **2 分钟、16:9 电影感短片** —— 由一句自然语言端到端生成,**全程零手动剪辑**。这个示例展示了 AI Agent 如何把三个基础能力编排成一条多步创作流水线:

- **[Qwen Code](https://github.com/QwenLM/qwen-code)** —— Agentic coding 模型,解析用户意图、驱动整个工作流
-- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.0**,百炼的文生/图生/参考生视频模型
+- **[阿里云百炼 CLI](https://github.com/modelstudioai/cli/)** —— 调用 **HappyHorse 1.1**,百炼的文生/图生/参考生视频模型
- **[spark-video Skill](https://github.com/JohnKeating1997/spark-video)** —— 负责场景拆分、分镜设计、镜头连贯性和最终拼接

### 唯一的提示词
@@ -65,7 +65,7 @@ _专为 AI Agent 打造,每个命令均可作为结构化工具调用。_

1. **Qwen Code** 解析需求、规划叙事节奏,决定要调用哪些工具。
2. **spark-video Skill** 把故事拆成镜头、为每个镜头写提示词,并保证视觉连贯性(角色、光线、色调、镜头语言)。
-3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.0**。
+3. **`bl video generate`** 把每个镜头并行下发给 **HappyHorse 1.1**。
4. Skill 把所有片段拼成最终的 16:9 / 约 2 分钟成片。

没有时间线拖拽,没有逐帧剪辑。一句话 → 一部短片。
2 changes: 1 addition & 1 deletion packages/cli/package.json
@@ -1,6 +1,6 @@
{
"name": "bailian-cli",
-"version": "1.4.0",
+"version": "1.4.1",
"description": "CLI for Aliyun Model Studio (DashScope) AI Platform.",
"keywords": [
"agent",
8 changes: 4 additions & 4 deletions packages/cli/src/commands/video/generate.ts
@@ -31,12 +31,12 @@ import {
export default defineCommand({
name: "video generate",
description:
-"Generate a video from text or image (happyhorse-1.0-t2v / happyhorse-1.0-i2v / wan2.6-t2v)",
+"Generate a video from text or image (happyhorse-1.1-t2v / happyhorse-1.1-i2v / wan2.6-t2v)",
usage: "bl video generate --prompt <text> [--image <url>] [flags]",
options: [
{
flag: "--model <model>",
-description: "Model ID (default: happyhorse-1.0-t2v, or happyhorse-1.0-i2v with --image)",
+description: "Model ID (default: happyhorse-1.1-t2v, or happyhorse-1.1-i2v with --image)",
},
{ flag: "--prompt <text>", description: "Video description", required: true },
{ flag: "--image <url>", description: "Input image URL for image-to-video generation" },
@@ -98,7 +98,7 @@ export default defineCommand({
const model =
(flags.model as string) ||
config.defaultVideoModel ||
-((flags.image as string) ? "happyhorse-1.0-i2v" : "happyhorse-1.0-t2v");
+((flags.image as string) ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v");
const format = detectOutputFormat(config.output);

const imageUrl = flags.image as string | undefined;
@@ -118,7 +118,7 @@ export default defineCommand({
input: {
prompt: prompt!,
negative_prompt: (flags.negativePrompt as string) || undefined,
-// i2v models (happyhorse-1.0-i2v) require input.media with type 'first_frame'
+// i2v models (happyhorse-1.1-i2v) require input.media with type 'first_frame'
...(resolvedImageUrl
? { media: [{ type: "first_frame" as const, url: resolvedImageUrl }] }
: {}),
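
For readers following the hunk above, here is a self-contained sketch of the conditional `media` field it touches. The helper `buildVideoInput` and the two interfaces are hypothetical; only the `first_frame` media entry and the conditional spread come from the diff, and the full request shape is an assumption.

```typescript
// Assumed shape of one media entry, based on the diff above.
interface FirstFrameMedia {
  type: "first_frame";
  url: string;
}

// Assumed (partial) shape of the request input.
interface VideoInput {
  prompt: string;
  negative_prompt?: string;
  media?: FirstFrameMedia[];
}

// Hypothetical helper: attaches `media` only when an image URL was resolved,
// so text-to-video requests carry no `media` key at all.
function buildVideoInput(
  prompt: string,
  resolvedImageUrl?: string,
  negativePrompt?: string,
): VideoInput {
  return {
    prompt,
    ...(negativePrompt ? { negative_prompt: negativePrompt } : {}),
    ...(resolvedImageUrl
      ? { media: [{ type: "first_frame" as const, url: resolvedImageUrl }] }
      : {}),
  };
}
```

Spreading an empty object when no image is given keeps the key absent rather than set to `undefined`, which is consistent with the dry-run test in this PR that expects `request.input.media` to be undefined for text-only runs.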
6 changes: 3 additions & 3 deletions packages/cli/src/commands/video/ref.ts
@@ -30,10 +30,10 @@ import {
export default defineCommand({
name: "video ref",
description:
-"Reference-to-video generation (happyhorse-1.0-r2v / wan2.6-r2v): multi-subject, multi-shot with voice",
+"Reference-to-video generation (happyhorse-1.1-r2v / wan2.6-r2v): multi-subject, multi-shot with voice",
usage: "bl video ref --prompt <text> --image <url>... [--ref-video <url>...] [flags]",
options: [
-{ flag: "--model <model>", description: "Model ID (default: happyhorse-1.0-r2v)" },
+{ flag: "--model <model>", description: "Model ID (default: happyhorse-1.1-r2v)" },
{
flag: "--prompt <text>",
description: "Video description with reference markers (image1, video1, etc.)",
@@ -126,7 +126,7 @@ export default defineCommand({
const imageVoices = (flags.imageVoice as string[] | undefined) || [];
const videoVoices = (flags.videoVoice as string[] | undefined) || [];

-const model = (flags.model as string) || "happyhorse-1.0-r2v";
+const model = (flags.model as string) || "happyhorse-1.1-r2v";
const format = detectOutputFormat(config.output);

// --- Resolve file URLs (auto-upload local files) ---
2 changes: 1 addition & 1 deletion packages/cli/src/pipeline/steps/bl-api.ts
@@ -391,7 +391,7 @@ export async function videoGenerate(
});
}

-const model = input.model || (input.image ? "happyhorse-1.0-i2v" : "happyhorse-1.0-t2v");
+const model = input.model || (input.image ? "happyhorse-1.1-i2v" : "happyhorse-1.1-t2v");

let resolvedImageUrl: string | undefined;
if (input.image) {
2 changes: 1 addition & 1 deletion packages/cli/tests/e2e/video-download.e2e.test.ts
@@ -91,7 +91,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"video",
"generate",
"--model",
-"happyhorse-1.0-t2v",
+"happyhorse-1.1-t2v",
"--duration",
"3",
"--prompt",
8 changes: 4 additions & 4 deletions packages/cli/tests/e2e/video-generate-i2v.e2e.test.ts
@@ -37,7 +37,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"video",
"generate",
"--model",
-"happyhorse-1.0-i2v",
+"happyhorse-1.1-i2v",
"--image",
"https://example.com/placeholder.png",
"--non-interactive",
@@ -53,7 +53,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"generate",
"--dry-run",
"--model",
-"happyhorse-1.0-t2v",
+"happyhorse-1.1-t2v",
"--prompt",
"干跑无图",
"--non-interactive",
@@ -68,7 +68,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
expect(data.request?.input?.media).toBeUndefined();
});

-test("【happyhorse-1.0-i2v】图片生成视频", async () => {
+test("【happyhorse-1.1-i2v】图片生成视频", async () => {
const outDir = makeE2eOutputDir(e2eLabelFromMetaUrl(import.meta.url));
const png = join(outDir, "e2e-gen.png");
const gen = await runCli([
@@ -95,7 +95,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"video",
"generate",
"--model",
-"happyhorse-1.0-i2v",
+"happyhorse-1.1-i2v",
"--image",
imagePath,
"--prompt",
10 changes: 5 additions & 5 deletions packages/cli/tests/e2e/video-generate-t2v.e2e.test.ts
@@ -37,7 +37,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"video",
"generate",
"--model",
-"happyhorse-1.0-t2v",
+"happyhorse-1.1-t2v",
"--non-interactive",
]);
expect(exitCode).toBe(0);
@@ -51,7 +51,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"generate",
"--dry-run",
"--model",
-"happyhorse-1.0-t2v",
+"happyhorse-1.1-t2v",
"--prompt",
"干跑校验",
"--non-interactive",
@@ -62,18 +62,18 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
const data = parseStdoutJson<{ request?: { model?: string; input?: { prompt?: string } } }>(
stdout,
);
-expect(data.request?.model).toBe("happyhorse-1.0-t2v");
+expect(data.request?.model).toBe("happyhorse-1.1-t2v");
expect(data.request?.input?.prompt).toBe("干跑校验");
});

-test("【happyhorse-1.0-t2v】文本生成视频", async () => {
+test("【happyhorse-1.1-t2v】文本生成视频", async () => {
const outDir = makeE2eOutputDir(e2eLabelFromMetaUrl(import.meta.url));
const { stdout, stderr, exitCode } = await runCli([
...cliTimeoutPrefix(),
"video",
"generate",
"--model",
-"happyhorse-1.0-t2v",
+"happyhorse-1.1-t2v",
"--prompt",
"夕阳下海面波光,远景静态镜头",
"--download",
8 changes: 4 additions & 4 deletions packages/cli/tests/e2e/video-ref-r2v.e2e.test.ts
@@ -37,7 +37,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"video",
"ref",
"--model",
-"happyhorse-1.0-r2v",
+"happyhorse-1.1-r2v",
"--image",
"https://example.com/x.png",
"--non-interactive",
@@ -52,7 +52,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"video",
"ref",
"--model",
-"happyhorse-1.0-r2v",
+"happyhorse-1.1-r2v",
"--prompt",
"仅有描述无素材",
"--non-interactive",
@@ -61,7 +61,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
expect(stderr).toMatch(/--image|ref-video|At least one|required/i);
});

-test("【happyhorse-1.0-r2v】视频参考生成", async () => {
+test("【happyhorse-1.1-r2v】视频参考生成", async () => {
const outDir = makeE2eOutputDir(e2eLabelFromMetaUrl(import.meta.url));
const gen = await runCli([
"image",
@@ -88,7 +88,7 @@ describe.skipIf(!isBailianE2EVideoEnabled() || !isDashScopeE2EReady())(
"video",
"ref",
"--model",
-"happyhorse-1.0-r2v",
+"happyhorse-1.1-r2v",
"--prompt",
"图1在画面中心轻微晃动",
"--image",
2 changes: 1 addition & 1 deletion packages/cli/tests/stress/lib/fixtures.mjs
@@ -180,7 +180,7 @@ export async function ensurePrerequisites(ctx) {
"video",
"generate",
"--model",
-"happyhorse-1.0-t2v",
+"happyhorse-1.1-t2v",
"--prompt",
"压测前置短视频:海浪与静态远景,无明显人物。",
"--duration",
2 changes: 1 addition & 1 deletion packages/cli/tests/stress/lib/suite-fixtures.mjs
@@ -132,7 +132,7 @@ export async function generateCombinedFixtures({ suiteRoot, cliPackage }) {
"video",
"generate",
"--model",
-"happyhorse-1.0-t2v",
+"happyhorse-1.1-t2v",
"--prompt",
"压测前置短视频:海浪与静态远景,无明显人物。",
"--duration",