luoxinlan322-sudo/openclaw-video-note-pipeline

OpenClaw Video Note Pipeline

Demo

Windows desktop tool for the OpenClaw video-processing pipeline. Window recording is only the front-end helper for capture and submission.

  • Record video for a selected window
  • Record system audio for the target process tree
  • Optional auto-stop detection
  • Final MP4 output after muxing
  • Can automatically upload the finished recording to Tencent Cloud, trigger OpenClaw, and generate validated course or meeting notes through typed skills

Tech Stack

| Area | Stack | Notes |
| --- | --- | --- |
| Desktop UI | Python, CustomTkinter | Windows desktop app UI and settings dialogs |
| Desktop packaging | PyInstaller, Inno Setup | Build packaged app and Windows installer |
| Video capture | Windows Graphics Capture, ffmpeg gdigrab | WGC first, fallback to gdigrab |
| Audio capture | WASAPI process loopback | Target-process-tree system audio capture |
| Native helpers | C++, Visual Studio Build Tools, Windows SDK | wgc_capture_helper and wasapi_capture_helper |
| Encoding / mux | ffmpeg, libx264 | Real-time encode and final mux |
| Auto-stop | Python, NumPy | Frame diff, audio activity, timing thresholds |
| Cloud API | FastAPI, python-dotenv | Receives uploads and dispatches ingest jobs |
| Speech-to-text | Whisper | Audio transcription for course and meeting note generation |
| Frame understanding | ffmpeg keyframe sampling, Tesseract OCR | Extracts visual text from video frames when useful |
| OpenClaw integration | OpenClaw skills, hook trigger, ingest script, quality validation | video-summary, ingest, feishu-doc-delete, note validation and post-processing |
| Feishu integration | Feishu Open Platform APIs | Message delivery and doc post-processing |
| Search enhancement | Tavily | Optional external search enrichment for notes |
| Deployment | PowerShell, Bash, systemd, SSH/SCP | Local deploy wrapper + remote installer |

Quick Start

Recommended order:

  1. Prepare the server base and chatbot first
  2. Install the desktop app locally
  3. Deploy server/ and server-addon/
  4. If using ssh_tunnel, open the local tunnel before testing cloud submit

1. Prepare the server base and chatbot

The one-click deployment in this repo assumes the following are already in place:

  • A cloud server reachable via SSH
  • OpenClaw base installed
  • Feishu channel configured
  • SSH access working from this Windows machine

The current deployment flow is primarily designed for cloud servers. Tencent Cloud, for example, already provides a relatively low-code base deployment path.

Base setup guides:

Confirm SSH first:

ssh <your-server-host-or-ssh-alias>

Then run the project deploy entry:

powershell -ExecutionPolicy Bypass -File .\server-addon\deploy\deploy_all.ps1

This step will:

  • Upload server/ and server-addon/
  • Run the standard remote installer
  • Use conservative overwrite semantics
  • Keep unrelated extra files on the remote host
  • Sync managed OpenClaw skills into ~/.openclaw/workspace/skills
  • Download the generated connection file back to local paths:
    • config/desktop-connection.json
    • server-addon/deploy/desktop-connection.json

Deployment details:

2. Install the desktop app locally

Initialize the local environment:

powershell -ExecutionPolicy Bypass -File .\scripts\bootstrap.ps1

Run in development mode:

powershell -ExecutionPolicy Bypass -File .\scripts\run_dev.ps1

Install locally like a normal desktop app:

powershell -ExecutionPolicy Bypass -File .\scripts\install_desktop.ps1

Optional: add a startup shortcut so the app launches at login.

powershell -ExecutionPolicy Bypass -File .\scripts\install_desktop.ps1 -AddStartup

Clean uninstall:

powershell -ExecutionPolicy Bypass -File .\scripts\uninstall_desktop.ps1

Build the packaged app:

powershell -ExecutionPolicy Bypass -File .\scripts\build.ps1

Build the Windows installer (setup.exe) via Inno Setup:

powershell -ExecutionPolicy Bypass -File .\scripts\build_installer.ps1

Installer output:

  • dist\installer\VideoAssistantDesktopSetup.exe

If you are an end user and do not want to build locally, download the prebuilt desktop installer from GitHub Releases.

3. Let the desktop app load cloud connection automatically

After deploy_all.ps1 finishes, start the desktop app.

The app auto-discovers desktop-connection.json in this order:

  1. config/desktop-connection.json
  2. desktop-connection.json
  3. server-addon/deploy/desktop-connection.json
  4. ~/Downloads/desktop-connection.json
  5. The packaged app executable directory

If a valid file is found and cloud settings are still incomplete, it is imported automatically.
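
The search order above can be sketched in a few lines. This is only an illustration, not the app's actual code: the function names `candidate_paths` and `find_connection_file` are hypothetical, and "valid" is assumed here to mean "parses as a JSON object", which the README does not define.

```python
import json
from pathlib import Path
from typing import Iterable, List, Optional

FILE_NAME = "desktop-connection.json"


def candidate_paths(project_root: Path, home: Path, exe_dir: Path) -> List[Path]:
    """Return the search locations in the order listed above."""
    return [
        project_root / "config" / FILE_NAME,
        project_root / FILE_NAME,
        project_root / "server-addon" / "deploy" / FILE_NAME,
        home / "Downloads" / FILE_NAME,
        exe_dir / FILE_NAME,
    ]


def find_connection_file(paths: Iterable[Path]) -> Optional[Path]:
    """Return the first path that exists and parses as a JSON object."""
    for path in paths:
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed: try the next location
        if isinstance(data, dict):  # assumption: a valid file is a JSON object
            return path
    return None
```

Skipping a malformed file instead of failing means a broken copy in config/ does not hide a good copy further down the list.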

If auto-discovery fails, open Cloud Settings and either:

  • Import desktop-connection.json
  • Or fill in the fields manually

4. Open an SSH tunnel when using ssh_tunnel

If the generated connection mode is ssh_tunnel, keep this running in a second terminal:

ssh -N -L 18000:127.0.0.1:8000 <your-server-host-or-ssh-alias>

If your generated local port is not 18000, replace it with the value from desktop-connection.json.
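
Before submitting through the tunnel, it can help to confirm that something is listening on the forwarded local port. The helper below is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical name and is not part of this project.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For the default tunnel shown above, `is_port_open("127.0.0.1", 18000)` returning False usually means the ssh -N -L terminal has exited or a different local port was generated. A True result only shows that the port accepts connections, not that the API behind it is healthy.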

Common Deployment Paths

  • ssh_tunnel: choose this when the desktop app can only reach the server through SSH port forwarding.
  • public_http: choose this when the server already has a directly reachable domain or public IP:port.
  • custom_url: choose this when your API is published behind a non-standard reverse-proxy path.
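
One way to picture the three modes is as three ways of deriving the API base URL. The sketch below is an assumption-heavy illustration: the keys `mode`, `local_port`, `host`, `port`, and `url` are hypothetical and may not match the real schema of desktop-connection.json, and plain http is assumed for the first two modes.

```python
def resolve_base_url(conn: dict) -> str:
    """Map a connection description to an API base URL (hypothetical schema)."""
    mode = conn.get("mode")
    if mode == "ssh_tunnel":
        # Traffic goes to the local end of the SSH port forward.
        return f"http://127.0.0.1:{conn.get('local_port', 18000)}"
    if mode == "public_http":
        # The server is directly reachable by domain or public IP and port.
        return f"http://{conn['host']}:{conn['port']}"
    if mode == "custom_url":
        # The full URL, including any reverse-proxy path, is given as-is.
        return conn["url"].rstrip("/")
    raise ValueError(f"unknown connection mode: {mode!r}")
```

Raising on an unknown mode, rather than guessing a default, keeps a typo in the connection file from silently sending uploads to the wrong place.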

Recommended Test Flow

  1. Start the app
  2. Click Refresh Window List
  3. Select the target window
  4. Keep video_backend: "auto" for the first test
  5. Click Start Recording
  6. Interact with the target window and play audio inside it
  7. Stop recording and inspect the generated MP4 under output/

Recommended manual checks:

  1. Cover the target window with another window and confirm the video is still correct
  2. Play sound from another app and confirm it does not leak into the recording
  3. Check that the UI log reports the actual backend used
  4. If cloud submit is enabled, confirm the app auto-loaded cloud settings
  5. If using ssh_tunnel, confirm the SSH tunnel terminal is still running
  6. Submit one retained file manually before relying on auto-submit

Video Backend Configuration

Set recording.video_backend in config/config.local.yaml.

  • auto: prefer WGC, fall back to gdigrab on failure
  • wgc: force the Windows Graphics Capture helper
  • gdigrab: force ffmpeg gdigrab
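
The three values above amount to an ordered list of backends to try. The following sketch shows that idea under stated assumptions: `backend_order`, `start_capture`, and the `starters` mapping are hypothetical names, and the real selection logic in core/video_capture.py may differ.

```python
from typing import Callable, Dict, List, Tuple


def backend_order(video_backend: str) -> List[str]:
    """Translate the config value into the backends to try, in order."""
    if video_backend == "auto":
        return ["wgc", "gdigrab"]
    if video_backend in ("wgc", "gdigrab"):
        return [video_backend]
    raise ValueError(f"unsupported video_backend: {video_backend!r}")


def start_capture(
    video_backend: str, starters: Dict[str, Callable[[], object]]
) -> Tuple[str, object]:
    """Start the first backend that works; return its name and handle."""
    errors = []
    for name in backend_order(video_backend):
        try:
            return name, starters[name]()
        except Exception as exc:  # any startup failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no video backend could start (" + "; ".join(errors) + ")")
```

Returning the backend name alongside the handle makes it easy to log which backend was actually used, which is one of the manual checks recommended earlier.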

Current Recording Architecture

Video

  • Prefer wgc_capture_helper with real-time output and encoding
  • Fallback path is ffmpeg gdigrab

Audio

  • Uses the WASAPI process loopback helper
  • Targets system audio from the target process tree only

Final output

  • Retains the final muxed mp4 by default
  • Can also retain the separate audio and video outputs for debugging or for choosing what to submit to the cloud
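
As a rough picture of the final mux step, the sketch below builds an ffmpeg command line that combines an already encoded video file with a WAV audio file. The README does not show the project's actual ffmpeg arguments, so the flag selection here (stream copy for video, AAC for audio, `-shortest`) is an assumption, and `build_mux_command` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def build_mux_command(
    video: Path, audio: Path, output: Path, ffmpeg: str = "ffmpeg"
) -> List[str]:
    """Build an ffmpeg argument list that muxes one video and one audio file."""
    return [
        ffmpeg,
        "-y",                # overwrite the output file if it already exists
        "-i", str(video),    # input 0: the encoded video
        "-i", str(audio),    # input 1: the captured WAV audio
        "-map", "0:v:0",     # take the first video stream from input 0
        "-map", "1:a:0",     # take the first audio stream from input 1
        "-c:v", "copy",      # assumed: video is already H.264, so no re-encode
        "-c:a", "aac",       # assumed: PCM audio is encoded to AAC for MP4
        "-shortest",         # stop at the end of the shorter stream
        str(output),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True, timeout=...)`, where a timeout in the spirit of recording.mux_timeout_seconds keeps a stuck mux from blocking the app indefinitely.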

Cloud Processing

The desktop app can submit recordings to the cloud API; the server then calls OpenClaw and finally sends results back to Feishu.

On the OpenClaw side, the flow is not a simple one-shot summary. It branches into different skill paths based on the course or meeting type the user selects.

  • course: generates course notes organized by section and knowledge point, with detailed explanations, code snippets, and external references
  • meeting: generates meeting minutes focused on topics, decisions, action items, and follow-up
  • Whisper transcription: converts audio into text first and uses it as the primary note source
  • Keyframes / OCR: extracts keyframes and reads on-screen text when the visual channel carries important information

The current pipeline also performs structural and quality checks, including section coverage, knowledge-point coverage, explanation depth, code-block presence, external reference enrichment, and whether the selected note template is satisfied.
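
To make the idea of type-specific checks concrete, here is a deliberately small sketch of a validator. Everything in it is an assumption for illustration: the notes are assumed to be Markdown, the rules and thresholds are invented, and `validate_note` is a hypothetical name rather than the pipeline's real validator.

```python
import re
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker


def validate_note(markdown: str, note_type: str) -> List[str]:
    """Return a list of problems; an empty list means the note passes."""
    problems: List[str] = []
    headings = re.findall(r"^#{1,3} +(.+)$", markdown, flags=re.MULTILINE)
    if note_type == "course":
        if len(headings) < 2:
            problems.append("course notes need at least two sections")
        if FENCE not in markdown:
            problems.append("course notes need at least one code block")
        if not re.search(r"https?://", markdown):
            problems.append("course notes need at least one external reference")
    elif note_type == "meeting":
        lowered = [h.lower() for h in headings]
        for required in ("decisions", "action items"):
            if not any(required in h for h in lowered):
                problems.append(f"meeting notes need a '{required}' section")
    else:
        raise ValueError(f"unknown note type: {note_type!r}")
    return problems
```

Returning a list of problems instead of a single boolean lets a gate step report exactly which template requirement failed.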

OpenClaw skills

  • ingest
    • Receives cloud-side jobs, triggers the OpenClaw processing chain, waits for results, and performs final gating
  • video-summary
    • Performs the actual note generation: audio extraction, Whisper transcription, keyframe / OCR enrichment, template-based note generation for course / meeting, and quality validation before output
  • feishu-doc-delete
    • Deletes Feishu cloud documents by link or token as a standalone skill callable from chat

Related docs:

Project Structure

video-assistant-desktop/
├─ src/app/                      # desktop application
│  ├─ main.py                    # app entrypoint
│  ├─ ui/main_window.py          # main window, dialogs, cloud settings UI
│  ├─ core/recorder.py           # recording orchestration, stop, mux, submit
│  ├─ core/video_capture.py      # video backend selection and process control
│  ├─ core/cloud_client.py       # cloud endpoint modes, submit, probing
│  ├─ core/feishu_client.py      # Feishu-side client logic
│  ├─ core/session.py            # recording session loop and state
│  ├─ core/auto_stop.py          # auto-stop rules
│  ├─ core/audio_meter.py        # audio activity sampling
│  ├─ core/window_tools.py       # window enumeration and frame sampling
│  └─ assets/                    # icons and static assets
├─ config/
│  └─ config.example.yaml        # desktop config template
├─ tools/
│  ├─ wgc_capture_helper/        # WGC native helper source and build output
│  └─ wasapi_capture_helper/     # WASAPI native helper source
├─ scripts/                      # local dev, build, install scripts
│  ├─ bootstrap.ps1              # bootstrap Python env and dependencies
│  ├─ run_dev.ps1                # run app in development mode
│  ├─ build.ps1                  # build packaged desktop app (PyInstaller)
│  ├─ build_installer.ps1        # build Inno Setup installer
│  ├─ build_wgc_helper.ps1       # compile WGC helper
│  ├─ build_wasapi_helper.ps1    # compile WASAPI helper
│  ├─ install_desktop.ps1        # create local shortcuts
│  ├─ uninstall_desktop.ps1      # clean local install traces
│  ├─ generate_app_icon.py       # generate app icon assets
│  └─ export_lock.ps1            # export dependency lock file
├─ installer/
│  ├─ VideoAssistantDesktop.iss  # Inno Setup installer definition
│  └─ ChineseSimplified.isl      # Chinese installer language file
├─ server/                       # cloud API service
│  ├─ app.py                     # upload API, job dispatch, ingest entry
│  ├─ requirements.txt           # server Python dependencies
│  └─ README.md                  # server service notes and debug flow
├─ server-addon/                 # OpenClaw add-ons and deployment bundle
│  ├─ openclaw/skills/           # skills synced into OpenClaw
│  │  ├─ video-summary/          # course and meeting note skill
│  │  ├─ ingest/                 # cloud trigger and result wait skill
│  │  └─ feishu-doc-delete/      # Feishu document deletion skill
│  └─ deploy/                    # standard remote deployment entry
│     ├─ deploy_all.ps1          # upload locally and invoke remote installer
│     ├─ install_video_assistant_openclaw.sh
│     │                          # standard remote installer
│     ├─ deploy-inputs.example   # local deployment input reference
│     └─ desktop-connection.json # desktop connection template
├─ docs/                         # supplementary docs
│  ├─ cloud-processing-api.md    # cloud API notes
│  ├─ testing-and-backends.md    # backend testing and recording notes
│  └─ feishu-app-permissions.md  # Feishu app type and permission checklist
├─ requirements/                 # requirement sets and lock file
├─ Install Video Assistant Desktop.cmd
│                                # one-click install entry
├─ Uninstall Video Assistant Desktop.cmd
│                                # one-click uninstall entry
└─ README.md                     # project overview

Requirements

  • Windows
  • Python 3.10 or 3.11
  • ffmpeg
  • Visual Studio Build Tools 2022
  • Windows 11 SDK 22621

bootstrap.ps1 handles the Python virtual environment and dependency installation.

Only Python 3.10 and 3.11 are currently supported. Python 3.13 has not yet been validated for packaging or runtime.

Output Files

Generated files are written under output/ by default.

During recording you may see intermediate files such as:

  • *.video.mp4
  • *.audio.wav
  • *.audio.stop
  • *.video.stop

The final deliverable is:

  • record_YYYYMMDD_HHMMSS.mp4
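
The naming pattern above maps directly onto a timestamp format string. The sketch below shows one way to derive the names; the helper names are hypothetical, and it is an assumption that the intermediate files share the final file's stem.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict


def final_output_path(output_dir: Path, started_at: datetime) -> Path:
    """Build output/record_YYYYMMDD_HHMMSS.mp4 from the recording start time."""
    return output_dir / f"record_{started_at:%Y%m%d_%H%M%S}.mp4"


def intermediate_paths(final_path: Path) -> Dict[str, Path]:
    """Derive the temporary file names (assumed to share the final stem)."""
    stem = final_path.with_suffix("")  # drop the trailing .mp4
    return {
        "video": Path(f"{stem}.video.mp4"),
        "audio": Path(f"{stem}.audio.wav"),
        "video_stop": Path(f"{stem}.video.stop"),
        "audio_stop": Path(f"{stem}.audio.stop"),
    }
```

For example, a recording started on 2 January 2024 at 03:04:05 would be named record_20240102_030405.mp4.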

Auto-stop

Auto-stop logic is still implemented in the Python layer.

It mainly uses:

  • Sampled frame difference
  • Audio activity
  • Timing thresholds
  • Optional end-signal conditions

Main config section:

  • auto_stop

Main recording timing config:

  • recording.check_interval_seconds
  • recording.max_duration_minutes
  • recording.mux_timeout_seconds
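
The combination of frame difference, audio activity, and timing thresholds can be illustrated with a small idle detector. This is a sketch, not the project's implementation: the project lists NumPy for this step, while the example uses plain Python sequences to stay dependency-free, and every name and default threshold below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


def frame_difference(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Mean absolute difference between two same-sized grayscale frames."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


@dataclass
class AutoStopDetector:
    frame_diff_threshold: float = 1.0    # below this the picture counts as static
    audio_threshold: float = 0.01        # below this the audio counts as silent
    idle_seconds: float = 60.0           # stop after this long with no activity
    max_duration_seconds: float = 7200.0  # hard cap on recording length
    _started_at: Optional[float] = field(default=None, init=False, repr=False)
    _last_activity: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, now: float, frame_diff: float, audio_level: float) -> bool:
        """Feed one sample; return True when recording should stop."""
        if self._started_at is None:
            self._started_at = self._last_activity = now
        active = (
            frame_diff >= self.frame_diff_threshold
            or audio_level >= self.audio_threshold
        )
        if active:
            self._last_activity = now
        idle = now - self._last_activity
        elapsed = now - self._started_at
        return idle >= self.idle_seconds or elapsed >= self.max_duration_seconds
```

In a loop of this kind, `update` would be called once per check interval (compare recording.check_interval_seconds) with a monotonic timestamp. Treating either video motion or audio activity as "still active" avoids stopping during a static slide with ongoing narration.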

Troubleshooting

No ffmpeg executable found

Ensure ffmpeg is installed and available in PATH, or follow the prompts from bootstrap.ps1 to finish local dependency setup.
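
A quick way to check what a Python process can see is `shutil.which`. The helper below is a sketch with a hypothetical name; the optional `extra_dirs` fallback is an assumption for illustration, not a description of where this project looks for ffmpeg.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_ffmpeg(extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Return the path to ffmpeg from PATH, or from extra_dirs as a fallback."""
    found = shutil.which("ffmpeg")
    if found:
        return found
    for directory in extra_dirs:
        for name in ("ffmpeg.exe", "ffmpeg"):
            candidate = Path(directory) / name
            if candidate.is_file():
                return str(candidate)
    return None
```

If `find_ffmpeg()` returns None in the same environment that launches the app, the PATH seen by that environment does not include ffmpeg, even if another terminal can run it.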

wasapi_capture_helper fails to start

Check first:

  1. Visual Studio Build Tools and Windows SDK are installed correctly
  2. scripts/build_wasapi_helper.ps1 completed successfully
  3. The target process actually produces loopback-capturable system audio

wgc_capture_helper fails to start

Check first:

  1. The current system supports Windows Graphics Capture
  2. scripts/build_wgc_helper.ps1 completed successfully
  3. If it still fails, fall back to video_backend: "gdigrab" to validate the rest of the recording pipeline

License

This project is licensed under the MIT License.

