Commit a215311

committed
Adapt runtime for newly released Qwen 3.5 family models
This update centers the pipeline on Qwen 3.5 local inference with VRAM-aware single-model selection, scene-aware frame processing, and a guided Windows setup/diagnostics UX with CI and tests.
1 parent 48c5aab commit a215311

19 files changed

Lines changed: 3154 additions & 486 deletions

.github/workflows/ci.yml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+name: CI
+
+on:
+  push:
+    branches: ["main"]
+  pull_request:
+
+jobs:
+  lint-and-test:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, windows-latest]
+        python-version: ["3.11"]
+
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install project and dev dependencies
+        run: python -m pip install -e ".[dev]"
+
+      - name: Lint with Ruff
+        run: python -m ruff check .
+
+      - name: Run tests with pytest
+        run: python -m pytest

PROJECT_STRUCTURE.md

Lines changed: 2 additions & 0 deletions
@@ -11,9 +11,11 @@ All source files are in the workspace root. Layering is preserved by filename pr
 /cli.py
 /core_config.py
 /service_media.py
+/service_ollama.py
 /service_transcribe.py
 /service_summarize.py
 /service_pipeline.py
+/adapter_gui.py
 /adapter_api.py
 /adapter_rss.py
 /adapter_storage.py

README.md

Lines changed: 96 additions & 55 deletions
@@ -1,83 +1,124 @@
-# Video RSS Aggregator
+# Video RSS Aggregator (Qwen 3.5 Vision, 4-bit)
 
-Intelligent video summarization and RSS feed generation powered by Qwen3 models on NVIDIA CUDA.
+This project has been rebuilt around Qwen 3.5 multimodal models and a strict local VRAM budget.
 
-- **ASR**: Qwen/Qwen3-ASR-1.7B (via `qwen-asr`)
-- **Summarization**: Qwen/Qwen3-8B-AWQ (via vLLM)
-- **Storage**: PostgreSQL
-- **API**: FastAPI
+- Inference backend: Ollama (Windows-native, no WSL required)
+- Default models: Qwen 3.5 4-bit (`q4_K_M`) tiers
+- Storage: SQLite (`.data/vra.db`)
+- API: FastAPI
+
+## Design Goals
+
+- Use Qwen 3.5 vision-capable small models for summarization quality.
+- Keep total app VRAM use within `8GB` by default.
+- Prefer 4-bit model variants for quality/efficiency balance.
+- Keep setup simple for Windows users (no WSL).
+- Use scene-aware frame extraction with timeline coverage for better visual context.
 
 ## Requirements
 
 - Python 3.11+
-- NVIDIA GPU with CUDA (Windows / Linux)
-- PostgreSQL 15+
-- ffmpeg on PATH
+- Windows 10/11
+- NVIDIA GPU with at least 8GB VRAM
+- Ollama installed on Windows: https://ollama.com/download/windows
+- `ffmpeg` and `ffprobe` on `PATH`
 
-## Quick Start
+## Quick Start (Windows)
 
 ```bash
-# Create environment
 python -m venv .venv
-.venv\Scripts\activate  # Windows
-# source .venv/bin/activate  # Linux
-
-# Install
+.venv\Scripts\activate
 pip install -e .
+```
 
-# Configure
-set DATABASE_URL=postgresql://user:pass@localhost:5432/video_rss
+Run bootstrap (auto-pulls configured models if missing):
 
-# Run
-python -m vra serve --bind 0.0.0.0:8080
+```bash
+python -m vra bootstrap
 ```
 
-Models are downloaded from Hugging Face automatically on first run.
+Start server:
+
+```bash
+python -m vra serve --bind 127.0.0.1:8080
+```
+
+Then open `http://127.0.0.1:8080/` for the guided installation + configuration GUI.
+The setup page includes one-click diagnostics for Python, FFmpeg/FFprobe, yt-dlp, and Ollama reachability.
+
+## 4-bit Model Defaults
+
+Default model priority:
 
-## Project Layout
+1. `qwen3.5:4b-q4_K_M`
+2. `qwen3.5:2b-q4_K_M`
+3. `qwen3.5:0.8b-q8_0` (safety floor when smaller than 2B is needed)
 
-The codebase is organized by layer and naming prefix in the project root:
+Each processing job selects one model up front based on configured VRAM budget,
+current runtime VRAM usage, and workload size (transcript + frames).
+The selected model is pinned for that job; there is no mid-processing model fallback.
 
-- `core_*.py`: core runtime config (`core_config.py`)
-- `service_*.py`: media preparation, transcription, summarization, and orchestration
-- `adapter_*.py`: FastAPI interface, RSS rendering, and database adapter
-- `cli.py`: CLI commands (`serve`, `verify`)
-- `vra.py`: module entry for `python -m vra`
+## Video Processing Intelligence
+
+- Scene-aware frame candidate extraction (`ffmpeg` scene score) to catch shot changes.
+- Uniform timeline sampling fallback/fill to keep temporal coverage when scene cuts are sparse.
+- Deduplication by frame content hash before final frame set is sent to the model.
+- Model preselection per job using VRAM headroom and estimated per-request overhead.
+- SQLite runs in WAL mode with tuned pragmas for better concurrent read/write stability.
+
+## Runtime Commands
+
+```bash
+python -m vra bootstrap
+python -m vra status
+python -m vra verify --source "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
+python -m vra benchmark --source "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
+```
+
+`benchmark` compares `scene_aware` vs `uniform_only` extraction on the same source,
+reports frame uniqueness metrics, and (by default) runs both through summarization to
+show latency and output-shape differences.
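The scene-aware selection with uniform fill and hash dedup described in this README could be sketched roughly as follows. This is an illustrative outline, not the project's actual implementation: the helper names, the blending heuristic, and the in-memory frame representation are all assumptions.

```python
import hashlib

def choose_frames(scene_ts, duration, max_frames, min_scene_frames=2):
    """Blend scene-cut timestamps with uniform timeline samples.

    scene_ts: timestamps (seconds) where ffmpeg's scene score fired;
    uniform samples fill in whenever scene cuts are sparse.
    """
    picks = list(scene_ts[:max_frames])
    if len(picks) < max(min_scene_frames, max_frames):
        step = duration / (max_frames + 1)
        uniform = [round(step * i, 2) for i in range(1, max_frames + 1)]
        for ts in uniform:
            if len(picks) >= max_frames:
                break
            # Skip uniform samples that land too close to a scene cut.
            if all(abs(ts - p) > step / 2 for p in picks):
                picks.append(ts)
    return sorted(picks)[:max_frames]

def dedup_by_hash(frames):
    """Drop frames whose raw bytes hash to content already kept."""
    seen, kept = set(), []
    for ts in sorted(frames):
        digest = hashlib.sha256(frames[ts]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(ts)
    return kept
```

With one scene cut in a 100 s video and `max_frames=5`, `choose_frames([12.0], 100.0, 5)` keeps the cut and pads the rest of the timeline with uniform samples, preserving temporal coverage.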
 
 ## Environment Variables
 
 | Variable | Default | Description |
 |---|---|---|
-| `DATABASE_URL` | *(required)* | PostgreSQL connection string |
-| `BIND_ADDRESS` | `0.0.0.0:8080` | HTTP server bind address |
-| `API_KEY` | *(none)* | Optional bearer token for auth |
-| `VRA_STORAGE_DIR` | `.data` | Local storage for downloads and audio |
-| `VRA_ASR_MODEL` | `Qwen/Qwen3-ASR-1.7B` | ASR model name or path |
-| `VRA_LLM_MODEL` | `Qwen/Qwen3-8B-AWQ` | Summarization model name or path |
-| `VRA_GPU_MEMORY_UTILIZATION` | `0.8` | vLLM GPU memory fraction |
-| `VRA_ASR_DEVICE` | `cuda:0` | PyTorch device for ASR |
-| `VRA_ASR_MAX_TOKENS` | `4096` | Max tokens for ASR output |
-| `VRA_LLM_MAX_TOKENS` | `2048` | Max tokens for summarization |
-| `VRA_RSS_TITLE` | `Video RSS Aggregator` | RSS feed title |
-| `VRA_RSS_LINK` | `http://localhost:8080/rss` | RSS feed self-link |
-| `VRA_RSS_DESCRIPTION` | `Video summaries` | RSS feed description |
+| `BIND_ADDRESS` | `127.0.0.1:8080` | API bind address |
+| `API_KEY` | *(none)* | Optional bearer/API-key auth |
+| `VRA_STORAGE_DIR` | `.data` | Download/frame/subtitle storage |
+| `VRA_DATABASE_PATH` | `.data/vra.db` | SQLite database path |
+| `VRA_OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama API base URL |
+| `VRA_MODEL_PRIMARY` | `qwen3.5:4b-q4_K_M` | First-choice model |
+| `VRA_MODEL_FALLBACK` | `qwen3.5:2b-q4_K_M` | Second-priority model |
+| `VRA_MODEL_MIN` | `qwen3.5:0.8b-q8_0` | Lowest-priority model |
+| `VRA_AUTO_PULL_MODELS` | `true` | Pull missing models automatically |
+| `VRA_VRAM_BUDGET_MB` | `8192` | Max VRAM budget across the app |
+| `VRA_MODEL_SIZE_BUDGET_RATIO` | `0.75` | Share of budget for base model weight |
+| `VRA_MODEL_SELECTION_RESERVE_MB` | `768` | VRAM safety reserve kept free during model selection |
+| `VRA_CONTEXT_TOKENS` | `3072` | Context window per request |
+| `VRA_MAX_OUTPUT_TOKENS` | `768` | Summary output cap |
+| `VRA_MAX_FRAMES` | `5` | Max sampled frames per source |
+| `VRA_FRAME_SCENE_DETECTION` | `true` | Enable scene-aware frame selection |
+| `VRA_FRAME_SCENE_THRESHOLD` | `0.28` | Scene change sensitivity (`ffmpeg` scene score threshold) |
+| `VRA_FRAME_SCENE_MIN_FRAMES` | `2` | Minimum detected scene frames before blending with uniform sampling |
+| `VRA_MAX_TRANSCRIPT_CHARS` | `16000` | Subtitle transcript cap |
+| `VRA_RSS_TITLE` | `Video RSS Aggregator` | RSS title |
+| `VRA_RSS_LINK` | `http://127.0.0.1:8080/rss` | RSS self-link |
+| `VRA_RSS_DESCRIPTION` | `Video summaries` | RSS description |
 
 ## API
 
-### `GET /health`
-Returns health status.
+- `GET /` (GUI setup + configuration workspace)
+- `GET /health`
+- `GET /setup/config`
+- `GET /setup/diagnostics`
+- `POST /setup/bootstrap`
+- `GET /runtime`
+- `POST /ingest`
+- `POST /process`
+- `GET /rss?limit=20`
 
-### `POST /ingest`
-Ingest an RSS/Atom feed. Body: `{"feed_url": "...", "process": true, "max_items": 5}`
+## Notes
 
-### `POST /process`
-Process a single video/audio source. Body: `{"source_url": "...", "title": "..."}`
-
-### `GET /rss?limit=20`
-Returns summarized content as RSS 2.0 XML.
-
-## Verification
-
-```bash
-python -m vra verify --feed-url "https://example.com/feed.xml" --source "/path/to/audio.wav"
-```
+- GUI setup/configuration workspace is available at `/` when the server is running.
+- This version is optimized for local, Windows-native operation first.
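The VRAM-aware, pinned-per-job model selection the README describes might look like this in outline. The model footprints, helper name, and exact arithmetic are hypothetical; only the priority order, the `8192` MB budget, and the `768` MB reserve come from the documented defaults.

```python
# Illustrative per-model VRAM footprints (MB); not measured project values.
MODEL_VRAM_MB = {
    "qwen3.5:4b-q4_K_M": 3300,
    "qwen3.5:2b-q4_K_M": 1800,
    "qwen3.5:0.8b-q8_0": 1100,
}

PRIORITY = ["qwen3.5:4b-q4_K_M", "qwen3.5:2b-q4_K_M", "qwen3.5:0.8b-q8_0"]

def select_model(budget_mb, used_mb, reserve_mb=768, priority=PRIORITY):
    """Pick the highest-priority model that fits the free VRAM budget.

    The choice happens once, up front; the selected model is pinned for
    the whole job, with no mid-processing fallback.
    """
    free_mb = budget_mb - used_mb - reserve_mb
    for name in priority:
        if MODEL_VRAM_MB[name] <= free_mb:
            return name
    # Nothing fits: the caller should fail the job rather than swap models.
    return None

select_model(8192, 0)     # ample headroom -> "qwen3.5:4b-q4_K_M"
select_model(8192, 5500)  # tight headroom -> "qwen3.5:2b-q4_K_M"
```

Pinning the result for the job trades peak quality under pressure for predictable latency and output shape within a single summarization run.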

adapter_api.py

Lines changed: 108 additions & 16 deletions
@@ -1,13 +1,18 @@
 from __future__ import annotations
 
-import os
-from dataclasses import asdict
+import platform
+import shutil
+import sys
 from datetime import datetime, timezone
+from importlib.util import find_spec
 
 from fastapi import Depends, FastAPI, Header, HTTPException, Query
-from fastapi.responses import Response
+from fastapi.responses import HTMLResponse, Response
 from pydantic import BaseModel
 
+from adapter_gui import render_setup_page
+from core_config import Config
+from service_media import runtime_dependency_report
 from service_pipeline import Pipeline
 
 
@@ -22,17 +27,13 @@ class ProcessRequest(BaseModel):
     title: str | None = None
 
 
-def create_app(pipeline: Pipeline, api_key: str | None = None) -> FastAPI:
+def create_app(pipeline: Pipeline, config: Config) -> FastAPI:
     app = FastAPI(title="Video RSS Aggregator", version="0.1.0")
 
-    rss_title = os.environ.get("VRA_RSS_TITLE", "Video RSS Aggregator")
-    rss_link = os.environ.get("VRA_RSS_LINK", "http://localhost:8080/rss")
-    rss_desc = os.environ.get("VRA_RSS_DESCRIPTION", "Video summaries")
-
     def _check_auth(
         authorization: str | None = Header(None), x_api_key: str | None = Header(None)
     ):
-        if api_key is None:
+        if config.api_key is None:
             return
         token = None
         if authorization:
@@ -41,33 +42,124 @@ def _check_auth(
                 token = parts[1]
         if token is None:
             token = x_api_key
-        if token != api_key:
+        if token != config.api_key:
             raise HTTPException(status_code=401, detail="unauthorized")
 
     @app.get("/health")
     async def health():
         return {"status": "ok", "timestamp": datetime.now(timezone.utc).isoformat()}
 
+    @app.get("/", response_class=HTMLResponse)
+    async def setup_home():
+        return render_setup_page(config)
+
+    @app.get("/setup/config")
+    async def setup_config():
+        return {
+            "bind_address": f"{config.bind_host}:{config.bind_port}",
+            "storage_dir": config.storage_dir,
+            "database_path": config.database_path,
+            "ollama_base_url": config.ollama_base_url,
+            "model_priority": list(config.model_priority),
+            "vram_budget_mb": config.vram_budget_mb,
+            "model_selection_reserve_mb": config.model_selection_reserve_mb,
+            "max_frames": config.max_frames,
+            "frame_scene_detection": config.frame_scene_detection,
+            "frame_scene_threshold": config.frame_scene_threshold,
+            "frame_scene_min_frames": config.frame_scene_min_frames,
+            "api_key_required": config.api_key is not None,
+            "quick_commands": {
+                "bootstrap": "python -m vra bootstrap",
+                "status": "python -m vra status",
+                "serve": "python -m vra serve --bind 127.0.0.1:8080",
+            },
+        }
+
+    @app.get("/setup/diagnostics")
+    async def setup_diagnostics():
+        media_tools = runtime_dependency_report()
+        yt_dlp_cmd = shutil.which("yt-dlp")
+        ytdlp = {
+            "command": yt_dlp_cmd,
+            "module_available": find_spec("yt_dlp") is not None,
+        }
+        ytdlp["available"] = bool(ytdlp["command"] or ytdlp["module_available"])
+
+        ollama: dict[str, object] = {
+            "base_url": config.ollama_base_url,
+            "reachable": False,
+            "version": None,
+            "models_found": 0,
+            "error": None,
+        }
+        try:
+            runtime = await pipeline.runtime_status()
+            ollama["reachable"] = True
+            ollama["version"] = runtime.get("ollama_version")
+            local_models = runtime.get("local_models", {})
+            ollama["models_found"] = len(local_models)
+        except Exception as exc:
+            ollama["error"] = str(exc)
+
+        ffmpeg_ok = bool(media_tools["ffmpeg"].get("available"))
+        ffprobe_ok = bool(media_tools["ffprobe"].get("available"))
+        ytdlp_ok = bool(ytdlp["available"])
+        ollama_ok = bool(ollama["reachable"])
+
+        return {
+            "platform": {
+                "system": platform.system(),
+                "release": platform.release(),
+                "python_version": sys.version.split()[0],
+                "python_executable": sys.executable,
+            },
+            "dependencies": {
+                "ffmpeg": media_tools["ffmpeg"],
+                "ffprobe": media_tools["ffprobe"],
+                "yt_dlp": ytdlp,
+                "ollama": ollama,
+            },
+            "ready": ffmpeg_ok and ffprobe_ok and ytdlp_ok and ollama_ok,
+        }
+
+    @app.post("/setup/bootstrap")
+    async def setup_bootstrap(_=Depends(_check_auth)):
+        return await pipeline.bootstrap_models()
+
     @app.post("/ingest")
     async def ingest(req: IngestRequest, _=Depends(_check_auth)):
         report = await pipeline.ingest_feed(req.feed_url, req.process, req.max_items)
-        return asdict(report)
+        return {
+            "feed_title": report.feed_title,
+            "item_count": report.item_count,
+            "processed_count": report.processed_count,
+        }
 
     @app.post("/process")
     async def process(req: ProcessRequest, _=Depends(_check_auth)):
         report = await pipeline.process_source(req.source_url, req.title)
         return {
             "source_url": report.source_url,
             "title": report.title,
-            "transcription": asdict(report.transcription)
-            if report.transcription
-            else None,
-            "summary": asdict(report.summary) if report.summary else None,
+            "transcript_chars": report.transcript_chars,
+            "frame_count": report.frame_count,
+            "summary": {
+                "summary": report.summary.summary,
+                "key_points": report.summary.key_points,
+                "visual_highlights": report.summary.visual_highlights,
+                "model_used": report.summary.model_used,
+                "vram_mb": report.summary.vram_mb,
+                "error": report.summary.error,
+            },
         }
 
     @app.get("/rss")
     async def rss_feed(limit: int = Query(20, ge=1, le=200)):
-        xml = await pipeline.rss_feed(rss_title, rss_link, rss_desc, limit)
+        xml = await pipeline.rss_feed(limit)
         return Response(content=xml, media_type="application/rss+xml")
 
+    @app.get("/runtime")
+    async def runtime(_=Depends(_check_auth)):
+        return await pipeline.runtime_status()
+
     return app
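Outside of FastAPI, the header convention `_check_auth` enforces — a bearer token in `Authorization`, with `X-API-Key` as a fallback — can be sketched as a pure function. The `Bearer` scheme check is an assumption, since the hunk above elides the `parts` parsing, and the helper name is illustrative:

```python
def extract_token(authorization, x_api_key):
    """Return the client credential from either auth header.

    Mirrors the convention in _check_auth: prefer a well-formed
    "Bearer <token>" Authorization header, fall back to X-API-Key.
    """
    token = None
    if authorization:
        parts = authorization.split()
        # Assumed scheme check; the diff only shows `token = parts[1]`.
        if len(parts) == 2 and parts[0].lower() == "bearer":
            token = parts[1]
    if token is None:
        token = x_api_key
    return token
```

A malformed `Authorization` header (wrong scheme or extra whitespace-separated parts) falls through to `X-API-Key` rather than being rejected outright, which matches how `_check_auth` only compares the final extracted token.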
