Add video document type (10_video) with Whisper transcription and OpenCV frame extraction by Copilot · Pull Request #2 · fnusatvik07/rag-source

Copilot · 2026-03-07T02:36:02Z

Adds video as the 10th document type in the parsing guide, enabling text extraction from video files for RAG pipelines via two methods: audio transcription and keyframe extraction.

New: `unstructured_documents/10_video/`

- `01_whisper_transcription.py`: Extracts audio via ffmpeg, transcribes with OpenAI Whisper, and returns timestamped segments suitable for time-indexed RAG chunks
- `02_frame_extraction.py`: Interval-based and scene-change keyframe extraction via OpenCV, with `build_frame_descriptions()` for generating embeddable text from visual content
- `sample_docs/generate_samples.py`: Generates `lecture.mp4` (10-frame ML lecture slides) and `short_clip.mp4` (3-frame quick test) using OpenCV
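As a rough illustration of scene-change keyframe selection, the sketch below scores each frame by its mean absolute grayscale difference from the previous frame and keeps frames whose score exceeds a threshold. This is an assumption about how such a detector could work, not the code in `02_frame_extraction.py`; the names `frame_diffs` and `pick_scene_changes` are hypothetical, and the scoring metric may differ from the one the PR uses.

```python
from pathlib import Path
from typing import Iterable, Iterator


def pick_scene_changes(diffs: Iterable[float], threshold: float = 30.0) -> list[int]:
    """Return indices of frames to keep as keyframes.

    diffs[k] is the difference score between frame k and frame k + 1.
    Frame 0 is always kept (this assumes the video has at least one frame).
    """
    keep = [0]
    for index, score in enumerate(diffs, start=1):
        if score > threshold:
            keep.append(index)
    return keep


def frame_diffs(video_path: Path) -> Iterator[float]:
    """Yield the mean absolute grayscale difference between consecutive frames."""
    import cv2  # lazy import so the pure logic above works without OpenCV

    cap = cv2.VideoCapture(str(video_path))
    try:
        previous = None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if previous is not None:
                yield float(cv2.absdiff(gray, previous).mean())
            previous = gray
    finally:
        cap.release()
```

With OpenCV installed, `pick_scene_changes(frame_diffs(Path("lecture.mp4")), threshold=30.0)` would return the indices of the frames that start a new scene under this metric.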

Both scripts follow existing conventions: lazy imports, availability checks with install instructions, graceful degradation, shared chunking integration, and demo functions.
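The lazy-import and availability-check convention could look roughly like the sketch below. The helper name `check_dependency` and the message wording are hypothetical; only the install command `uv sync --extra video` comes from this PR.

```python
import importlib.util


def check_dependency(module_name: str, install_hint: str) -> bool:
    """Return True if a top-level module is importable, else print install help."""
    if importlib.util.find_spec(module_name) is None:
        print(f"{module_name} is not installed. Install it with: {install_hint}")
        return False
    return True


def demo() -> None:
    """Degrade gracefully: skip the demo instead of crashing on a missing package."""
    if not check_dependency("cv2", "uv sync --extra video"):
        return
    import cv2  # imported only after the availability check passes

    print(f"OpenCV {cv2.__version__} is available")
```

Keeping the heavy import inside the function means the module can still be imported, and its other helpers used, on machines without the optional `video` dependencies.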

Updated

- `pyproject.toml`: `video` optional dependency group (`openai-whisper`, `opencv-python-headless`)
- `README.md` / `unstructured_documents/README.md`: Document type tables, counts, decision matrix, quick start, repository structure

Usage

uv sync --extra video
uv run python unstructured_documents/10_video/sample_docs/generate_samples.py
uv run python unstructured_documents/10_video/02_frame_extraction.py
uv run python unstructured_documents/10_video/01_whisper_transcription.py

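from pathlib import Path

from unstructured_documents.shared.chunking import chunk_by_sentences

# The functions used below come from the 10_video scripts. Their file names start
# with a digit, so they cannot be loaded with a plain import statement; run this
# inside those scripts or load them via importlib.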

# Transcribe and chunk for RAG
result = transcribe_video(Path("lecture.mp4"), model_size="base")
chunks = chunk_by_sentences(result["text"], sentences_per_chunk=5)

# Extract keyframes for visual context
keyframes = extract_keyframes(Path("lecture.mp4"), threshold=30.0)
descriptions = build_frame_descriptions(keyframes)
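To make use of the timestamped segments for time-indexed retrieval, one option is to merge consecutive segments into windows of bounded duration. The sketch below assumes segments shaped like the ones OpenAI Whisper returns (`start`, `end`, `text`); whether `transcribe_video` exposes them under a `"segments"` key is an assumption, and `chunk_segments` is a hypothetical helper, not part of this PR.

```python
from typing import TypedDict


class Segment(TypedDict):
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def _merge(group: list[Segment]) -> Segment:
    """Collapse a run of segments into one chunk spanning their time range."""
    return {
        "start": group[0]["start"],
        "end": group[-1]["end"],
        "text": " ".join(seg["text"].strip() for seg in group),
    }


def chunk_segments(segments: list[Segment], max_seconds: float = 30.0) -> list[Segment]:
    """Group consecutive segments so each chunk covers at most max_seconds.

    A single segment longer than max_seconds becomes its own chunk.
    """
    chunks: list[Segment] = []
    current: list[Segment] = []
    for seg in segments:
        # Flush when adding this segment would stretch the window too far.
        if current and seg["end"] - current[0]["start"] > max_seconds:
            chunks.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks
```

Each resulting chunk keeps its `start` and `end` times, so a retrieved passage can be linked back to the matching position in the video.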

Initial plan

c942822

Copilot AI assigned Copilot and fnusatvik07 Mar 7, 2026

Copilot started work on behalf of fnusatvik07 March 7, 2026 02:36

Copilot AI and others added 2 commits March 7, 2026 02:45

Add video document type support (10_video) with Whisper transcription…

7efa9cb

… and OpenCV frame extraction Co-authored-by: fnusatvik07 <177245849+fnusatvik07@users.noreply.github.com>

Fix insecure tempfile.mktemp usage - use NamedTemporaryFile instead

f42c5d5

Co-authored-by: fnusatvik07 <177245849+fnusatvik07@users.noreply.github.com>
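For context on the `tempfile.mktemp` fix: `mktemp` only returns a file name, so another process can create that path before it is used, whereas `NamedTemporaryFile` creates the file securely at the moment it is called. The sketch below shows one way the audio-extraction step could use it; the function names and the ffmpeg settings (16 kHz mono WAV) are assumptions, not the code from commit f42c5d5.

```python
import subprocess
import tempfile
from pathlib import Path


def reserve_temp_audio_path(suffix: str = ".wav") -> Path:
    """Create an empty temporary file securely and return its path."""
    # delete=False keeps the file after the handle closes so ffmpeg can write to it.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        return Path(tmp.name)


def build_ffmpeg_command(video_path: Path, audio_path: Path) -> list[str]:
    """Build an ffmpeg call that drops the video stream and writes 16 kHz mono audio."""
    return [
        "ffmpeg", "-y",          # overwrite the pre-created temp file
        "-i", str(video_path),
        "-vn",                   # no video
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        str(audio_path),
    ]


def extract_audio(video_path: Path) -> Path:
    """Extract the audio track to a temp file; the caller deletes it when done."""
    audio_path = reserve_temp_audio_path()
    try:
        subprocess.run(
            build_ffmpeg_command(video_path, audio_path),
            check=True,
            capture_output=True,
        )
    except Exception:
        audio_path.unlink(missing_ok=True)  # do not leak the temp file on failure
        raise
    return audio_path
```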

Copilot AI changed the title ~~[WIP] Add feature for video upload and chat functionality~~ Add video document type (10_video) with Whisper transcription and OpenCV frame extraction Mar 7, 2026

Copilot finished work on behalf of fnusatvik07 March 7, 2026 02:48

Copilot wants to merge 3 commits into main from copilot/add-video-upload-chat-feature
