- AI Agent workflow: built on the Claude Agent SDK, orchestrating a Skill plus focused Subagents to complete the full pipeline automatically, from script creation to video synthesis.
- Image generation: Gemini, Volcano Ark (ByteDance), Grok, OpenAI, and custom providers. Character design images keep characters consistent; clue tracking keeps props and scenes coherent across shots.
- Video generation: Veo 3.1, Seedance, Grok, Sora 2, and custom providers, switchable globally or per project.
- Task queue: RPM rate limiting with independent image and video concurrency channels, lease-based scheduling, and checkpoint resume.
- Web UI: project management, asset preview, version rollback, and real-time SSE task tracking, with a built-in AI assistant.
```mermaid
graph TD
    A["📖 Upload Novel"] --> B["📝 AI Agent Generates Storyboard Script"]
    B --> C["👤 Generate Character Design Images"]
    B --> D["🔑 Generate Clue Design Images"]
    C --> E["🖼️ Generate Storyboard Images"]
    D --> E
    E --> F["🎬 Generate Video Clips"]
    F --> G["🎞️ FFmpeg Final Video Synthesis"]
    F --> H["📦 Export Jianying Draft"]
```
Default deployment (SQLite):

```bash
git clone https://github.com/ArcReel/ArcReel.git
cd ArcReel/deploy
cp .env.example .env
docker compose up -d
# Visit http://localhost:1241
```

Production deployment (PostgreSQL):

```bash
cd ArcReel/deploy/production
cp .env.example .env    # Set POSTGRES_PASSWORD
docker compose up -d
```

After first startup, log in with the default account: username `admin`, with the password set via `AUTH_PASSWORD` in `.env` (if unset, it is auto-generated on first launch and written back to `.env`). Then go to the Settings page (`/settings`) to complete configuration:
- ArcReel Agent — Configure Anthropic API Key (powers the AI assistant), supports custom Base URL and model
- AI Image/Video Generation — Configure at least one provider's API Key (Gemini / Volcano Ark / Grok / OpenAI), or add a custom provider
📖 For detailed steps, see the Full Getting Started Guide
- Complete Production Pipeline — Novel → Script → Character Design → Storyboard Images → Video Clips → Final Video, one-click orchestration
- Multi-Agent Architecture — Orchestrator Skill detects project state and automatically dispatches focused Subagents; each Subagent completes one task then returns a summary
- Multi-Provider Support — Image/video/text generation supports four built-in providers: Gemini, Volcano Ark, Grok, OpenAI, switchable globally or per project
- Custom Providers — Connect any OpenAI-compatible / Google-compatible API (e.g., Ollama, vLLM, third-party proxies), auto-discovers available models and assigns media types, with feature parity to built-in providers
- Two Content Modes — Narration mode splits segments by reading rhythm; drama/animation mode organizes by scene/dialogue structure
- Progressive Episode Planning — Human-AI collaboration for splitting long novels: peek probe → Agent suggests breakpoints → user confirms → physical split, produce on demand
- Style Reference Images — Upload style images; AI automatically analyzes and applies them uniformly to all image generation, ensuring visual consistency across the project
- Character Consistency — AI first generates character design images; all subsequent storyboards and videos reference that design
- Clue Tracking — Key props and scene elements marked as "clues" maintain visual coherence across shots
- Version History — Each regeneration automatically saves a historical version, supporting one-click rollback
- Multi-Provider Cost Tracking — All image/video/text generation included in cost calculation, billed per provider strategy, with separate statistics by currency
- Cost Estimation — Estimate project/episode/shot costs before generation, with three-level drill-down showing estimated vs. actual cost comparison
- Jianying Draft Export — Export Jianying draft ZIPs by episode, supporting Jianying 5.x / 6+ (Operation Guide)
- Project Import/Export — Package entire project as archive for easy backup and migration
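The cost tracking item above keeps statistics separate by currency. Here is a minimal sketch of that aggregation step, not ArcReel's implementation: the `UsageRecord` type, the `summarize_costs` helper, and the sample amounts are hypothetical, and the amounts are placeholders rather than real prices.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    """One billed generation call (hypothetical shape, for illustration only)."""
    provider: str
    media_type: str   # "image", "video", or "text"
    amount: Decimal   # cost already computed by the provider's billing strategy
    currency: str     # e.g. "USD" or "CNY"


def summarize_costs(records: list[UsageRecord]) -> dict[str, Decimal]:
    """Sum costs per currency; amounts in different currencies are never mixed."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for record in records:
        totals[record.currency] += record.amount
    return dict(totals)


# Placeholder amounts, not real prices.
sample = [
    UsageRecord("gemini", "image", Decimal("0.10"), "USD"),
    UsageRecord("openai", "video", Decimal("0.50"), "USD"),
    UsageRecord("volcano-ark", "video", Decimal("1.20"), "CNY"),
]
totals = summarize_costs(sample)
```

Using `Decimal` instead of `float` avoids rounding drift when many small charges are summed.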
ArcReel supports multiple built-in providers and custom providers through unified ImageBackend / VideoBackend / TextBackend protocols, switchable globally or per project:

Image providers:

| Provider | Available Models | Capabilities | Billing |
|---|---|---|---|
| Gemini (Google) | Nano Banana 2, Nano Banana Pro | Text-to-image, image-to-image (multi-reference) | Resolution lookup table (USD) |
| Volcano Ark (ByteDance) | Seedream 5.0, Seedream 5.0 Lite, Seedream 4.5, Seedream 4.0 | Text-to-image, image-to-image | Per image (CNY) |
| Grok (xAI) | Grok Imagine Image, Grok Imagine Image Pro | Text-to-image, image-to-image | Per image (USD) |
| OpenAI | GPT Image 1.5, GPT Image 1 Mini | Text-to-image, image-to-image (multi-reference) | Per image (USD) |

Video providers:

| Provider | Available Models | Capabilities | Billing |
|---|---|---|---|
| Gemini (Google) | Veo 3.1, Veo 3.1 Fast, Veo 3.1 Lite | Text-to-video, image-to-video, video extension, negative prompts | Resolution × duration lookup table (USD) |
| Volcano Ark (ByteDance) | Seedance 2.0, Seedance 2.0 Fast, Seedance 1.5 Pro | Text-to-video, image-to-video, video extension, audio generation, seed control, offline inference | Per token usage (CNY) |
| Grok (xAI) | Grok Imagine Video | Text-to-video, image-to-video | Per second (USD) |
| OpenAI | Sora 2, Sora 2 Pro | Text-to-video, image-to-video | Per second (USD) |

Text providers:

| Provider | Available Models | Capabilities | Billing |
|---|---|---|---|
| Gemini (Google) | Gemini 3.1 Flash, Gemini 3.1 Flash Lite, Gemini 3 Pro | Text generation, structured output, visual understanding | Per token usage (USD) |
| Volcano Ark (ByteDance) | Doubao Seed series | Text generation, structured output, visual understanding | Per token usage (CNY) |
| Grok (xAI) | Grok 4.20, Grok 4.1 Fast series | Text generation, structured output, visual understanding | Per token usage (USD) |
| OpenAI | GPT-5.4, GPT-5.4 Mini, GPT-5.4 Nano | Text generation, structured output, visual understanding | Per token usage (USD) |
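To illustrate how a unified backend protocol lets providers be swapped, here is a rough sketch using Python's `typing.Protocol`. Only the name `ImageBackend` comes from this README; the `generate` method, the `ImageRequest` fields, and `FakeImageBackend` are assumptions made for the example and may not match the real interface.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class ImageRequest:
    """Hypothetical request shape; field names are illustrative."""
    prompt: str
    reference_images: tuple[str, ...] = ()  # paths used for image-to-image
    aspect_ratio: str = "16:9"


@runtime_checkable
class ImageBackend(Protocol):
    """Structural interface that every image provider would satisfy (assumed)."""
    name: str

    def generate(self, request: ImageRequest) -> bytes: ...


class FakeImageBackend:
    """Stand-in provider that returns placeholder bytes instead of calling an API."""
    name = "fake"

    def generate(self, request: ImageRequest) -> bytes:
        return f"image for: {request.prompt}".encode()


def render_storyboard(backend: ImageBackend, prompt: str) -> bytes:
    # Callers depend only on the protocol, so providers can be swapped freely.
    return backend.generate(ImageRequest(prompt=prompt))


result = render_storyboard(FakeImageBackend(), "hero at dawn")
```

Because the protocol is structural, a custom provider would only need to expose the same attributes; it would not have to inherit from a shared base class.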
In addition to built-in providers, you can connect any OpenAI-compatible or Google-compatible API:
- Add a custom provider in the settings page with Base URL and API Key
- Automatically calls `/v1/models` to discover available models, inferring media type (image/video/text) from model names
- Feature parity with built-in providers: global/project-level switching, cost tracking, version management
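Inferring the media type from a model name could work roughly as follows. This is a sketch under stated assumptions: the keyword lists and the sample model identifiers are illustrative guesses, not the rules ArcReel actually applies.

```python
def infer_media_type(model_name: str) -> str:
    """Guess "image", "video", or "text" from a model identifier.

    The keyword lists are illustrative assumptions, not ArcReel's actual rules.
    """
    name = model_name.lower()
    video_hints = ("veo", "sora", "seedance", "video")
    image_hints = ("image", "seedream", "banana")
    # Check video hints first so names that match both lists resolve to video.
    if any(hint in name for hint in video_hints):
        return "video"
    if any(hint in name for hint in image_hints):
        return "image"
    # Anything unrecognized is treated as a text model.
    return "text"


# Hypothetical identifiers such as a /v1/models response might list.
discovered = ["veo-3.1-fast", "seedream-4.5", "gpt-image-1.5", "llama3:8b"]
assigned = {model: infer_media_type(model) for model in discovered}
```

A name-based heuristic like this can misclassify unusual model names, so a manual override per model would be a sensible complement.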
Provider selection priority: project-level settings > global default. When switching providers, common settings (resolution, aspect ratio, audio, etc.) carry over directly; provider-specific parameters are preserved.
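The priority rule and the carry-over behavior can be sketched as two small helpers. The settings layout used here (`*_provider` keys, a `common` section, and a `provider_params` section keyed by provider) is an assumption for illustration, not the project's actual schema.

```python
def resolve_provider(project_settings: dict, global_settings: dict, media: str) -> str:
    """Project-level choice wins; otherwise fall back to the global default."""
    key = f"{media}_provider"
    return project_settings.get(key) or global_settings[key]


def effective_params(settings: dict, provider: str) -> dict:
    """Merge shared settings with the parameters saved for one provider.

    Parameters of other providers stay stored untouched, so switching back
    later restores them.
    """
    merged = dict(settings.get("common", {}))
    merged.update(settings.get("provider_params", {}).get(provider, {}))
    return merged


# Hypothetical settings, for illustration only.
global_settings = {"video_provider": "gemini", "image_provider": "gemini"}
project_settings = {"video_provider": "volcano-ark"}
settings = {
    "common": {"resolution": "1080p", "aspect_ratio": "16:9"},
    "provider_params": {
        "gemini": {"negative_prompt": "blurry"},
        "volcano-ark": {"seed": 42},
    },
}

video_provider = resolve_provider(project_settings, global_settings, "video")
image_provider = resolve_provider(project_settings, global_settings, "image")
video_params = effective_params(settings, video_provider)
```

In this example the project overrides only the video provider, so image generation still uses the global default.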
Scan the QR code to join the Feishu (Lark) community group for help and latest updates:
ArcReel's AI assistant is built on the Claude Agent SDK, using an Orchestrator Skill + Focused Subagent multi-agent architecture:
```mermaid
flowchart TD
    User["User Conversation"] --> Main["Main Agent"]
    Main --> MW["manga-workflow<br/>Orchestrator Skill"]
    MW -->|"State Detection"| PJ["Read project.json<br/>+ File System"]
    MW -->|"dispatch"| SA1["analyze-characters-clues<br/>Global Character/Clue Extraction"]
    MW -->|"dispatch"| SA2["split-narration-segments<br/>Narration Mode Segment Splitting"]
    MW -->|"dispatch"| SA3["normalize-drama-script<br/>Drama Animation Normalization"]
    MW -->|"dispatch"| SA4["create-episode-script<br/>JSON Script Generation"]
    MW -->|"dispatch"| SA5["Asset Generation Subagent<br/>Characters/Clues/Storyboards/Video"]
    SA1 -->|"Summary"| Main
    SA4 -->|"Summary"| Main
    Main -->|"Show Results<br/>Await Confirmation"| User
```
Core Design Principles:
- Orchestrator Skill (manga-workflow) — Has state detection capability, automatically determines the current project phase (character design / episode planning / preprocessing / script generation / asset generation), dispatches the corresponding Subagent, supports entry from any phase and interruption/resume
- Focused Subagent — Each Subagent completes only one task then returns; large context such as the novel source text stays inside the Subagent, while the main Agent only receives a refined summary, protecting context space
- Skill vs. Subagent Boundary — Skills handle deterministic script execution (API calls, file generation); Subagents handle tasks requiring reasoning and analysis (character extraction, script normalization)
- Inter-Phase Confirmation — After each Subagent returns, the main Agent presents a results summary to the user and waits for confirmation before proceeding to the next phase
ArcReel supports calls from external AI Agent platforms such as OpenClaw, enabling natural language-driven video creation:
- Generate an API Key (with the `arc-` prefix) in ArcReel's settings page
- Load ArcReel's Skill definition in OpenClaw (visit `http://your-domain/skill.md` for automatic retrieval)
- Create projects, generate scripts, and produce videos through OpenClaw conversation
Technical implementation: API Key authentication (Bearer token) plus a synchronous Agent conversation endpoint (`POST /api/v1/agent/chat`), which internally connects to the SSE streaming assistant and collects the complete response.
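A call to the synchronous endpoint might be prepared like this. The endpoint path and Bearer scheme come from the description above; the JSON body shape (`{"message": ...}`), the helper name, and the example key are assumptions, so check the Skill definition for the real request schema. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, message: str) -> urllib.request.Request:
    """Prepare (but do not send) a call to the synchronous agent chat endpoint."""
    # The payload shape is assumed for illustration.
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/agent/chat",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_chat_request("http://your-domain", "arc-example-key", "Create a new project")
# To send it, pass `request` to urllib.request.urlopen() and read the response body.
```

Since the server collects the full streamed reply before responding, a client should allow a generous timeout when it sends the request.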
```mermaid
flowchart TB
    subgraph UI["Web UI (React 19)"]
        U1["Project Management"] ~~~ U2["Asset Preview"] ~~~ U3["AI Assistant"] ~~~ U4["Task Monitor"]
    end
    subgraph Server["FastAPI Server"]
        S1["REST API<br/>Route Dispatch"] ~~~ S2["Agent Runtime<br/>Claude Agent SDK"]
        S3["SSE Stream<br/>Real-time Status Push"] ~~~ S4["Auth<br/>JWT + API Key"]
    end
    subgraph Core["Core Library"]
        C1["VideoBackend Abstraction<br/>Gemini · Volcano Ark · Grok · OpenAI · Custom"] ~~~ C2["ImageBackend Abstraction<br/>Gemini · Volcano Ark · Grok · OpenAI · Custom"]
        C5["TextBackend Abstraction<br/>Gemini · Volcano Ark · Grok · OpenAI · Custom"] ~~~ C3["GenerationQueue<br/>RPM Limiting · Image/Video Channels"]
        C4["ProjectManager<br/>File System + Version Management"]
    end
    subgraph Data["Data Layer"]
        D1["SQLAlchemy 2.0 Async ORM"] ~~~ D2["SQLite / PostgreSQL"]
        D3["Alembic Migrations"] ~~~ D4["UsageTracker<br/>Multi-Provider Cost Tracking"]
    end
    UI --> Server --> Core --> Data
```
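The RPM limiting and separate image/video channels shown for the generation queue can be approximated with a sliding-window limiter and one semaphore per media type. This is a simplified sketch, not the actual `GenerationQueue`: the class names `RpmLimiter` and `GenerationChannels` are hypothetical, and lease-based scheduling and checkpoint resume are not modeled.

```python
import asyncio
import time
from collections import deque


class RpmLimiter:
    """Allow at most `rpm` acquisitions in any 60-second window."""

    def __init__(self, rpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.stamps: deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()
        if len(self.stamps) < self.rpm:
            self.stamps.append(now)
            return True
        return False


class GenerationChannels:
    """Separate concurrency caps so slow video jobs cannot starve image jobs."""

    def __init__(self, image_slots: int, video_slots: int):
        self.slots = {
            "image": asyncio.Semaphore(image_slots),
            "video": asyncio.Semaphore(video_slots),
        }

    async def run(self, media: str, job):
        # Wait for a free slot in this media type's channel, then run the job.
        async with self.slots[media]:
            return await job()
```

A worker would typically check `try_acquire()` for the target provider before submitting a job through `run()`, and retry later when the limiter refuses.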
| Layer | Technology |
|---|---|
| Frontend | React 19, TypeScript, Tailwind CSS 4, wouter, zustand, Framer Motion, Vite |
| Backend | FastAPI, Python 3.12+, uvicorn, Pydantic 2 |
| AI Agents | Claude Agent SDK (Skill + Subagent multi-agent architecture) |
| Image Generation | Gemini (google-genai), Volcano Ark (volcengine-python-sdk[ark]), Grok (xai-sdk), OpenAI (openai) |
| Video Generation | Gemini Veo 3.1 (google-genai), Volcano Ark Seedance 2.0/1.5 (volcengine-python-sdk[ark]), Grok (xai-sdk), OpenAI Sora 2 (openai) |
| Text Generation | Gemini (google-genai), Volcano Ark (volcengine-python-sdk[ark]), Grok (xai-sdk), OpenAI (openai), Instructor (structured output fallback) |
| Media Processing | FFmpeg, Pillow |
| ORM & Database | SQLAlchemy 2.0 (async), Alembic, aiosqlite, asyncpg — SQLite (default) / PostgreSQL (production) |
| Authentication | JWT (pyjwt), API Key (SHA-256 hash), Argon2 password hashing (pwdlib) |
| Deployment | Docker, Docker Compose (deploy/ default, deploy/production/ with PostgreSQL) |
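The table lists API Keys as stored via a SHA-256 hash. A minimal sketch of that pattern with the standard library follows; the key length and the use of `secrets.token_urlsafe` are assumptions, and only the `arc-` prefix and the SHA-256 choice come from this README.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext key, SHA-256 hex digest). Only the digest would be stored."""
    key = "arc-" + secrets.token_urlsafe(32)  # key format is assumed
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key, digest


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)


key, digest = generate_api_key()
```

A fast hash is reasonable here because the key is long and random, whereas user passwords go through Argon2 as the table notes.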
- 📖 Full Getting Started Guide — Step-by-step guide from scratch
- 📦 Jianying Draft Export Guide — Import video clips into Jianying desktop for secondary editing
- 💰 Google GenAI Cost Reference — Gemini image / Veo video generation cost reference
- 💰 Volcano Ark Cost Reference — Volcano Ark video / image / text model cost reference
Contributions, bug reports, and feature suggestions are welcome! Please see the Contributing Guide for local development setup, testing, and code standards.
If you find this project useful, please give it a ⭐ Star!

