
Add video document type (10_video) with Whisper transcription and OpenCV frame extraction #2

Draft
Copilot wants to merge 3 commits into main from copilot/add-video-upload-chat-feature


Copilot AI commented Mar 7, 2026

Adds video as the 10th document type in the parsing guide, enabling text extraction from video files for RAG pipelines via two methods: audio transcription and keyframe extraction.

New: unstructured_documents/10_video/

  • 01_whisper_transcription.py — Extracts audio via ffmpeg, transcribes with OpenAI Whisper, returns timestamped segments suitable for time-indexed RAG chunks
  • 02_frame_extraction.py — Interval-based and scene-change keyframe extraction via OpenCV, with build_frame_descriptions() for generating embeddable text from visual content
  • sample_docs/generate_samples.py — Generates lecture.mp4 (10-frame ML lecture slides) and short_clip.mp4 (3-frame quick test) using OpenCV
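The timestamped segments mentioned for 01_whisper_transcription.py can be grouped into time-indexed chunks along the following lines. This is a minimal sketch, not the PR's implementation: `chunk_segments_by_time` and `window_seconds` are hypothetical names, and the input is assumed to follow the segment shape that openai-whisper's `transcribe()` returns (dicts with `start`, `end`, and `text` keys).

```python
from typing import Any


def chunk_segments_by_time(
    segments: list[dict[str, Any]], window_seconds: float = 30.0
) -> list[dict[str, Any]]:
    """Merge consecutive transcript segments into chunks spanning at most
    window_seconds each, keeping the start/end time of every chunk.

    Hypothetical helper: assumes Whisper-style segments with
    "start", "end" (seconds) and "text" keys.
    """
    chunks: list[dict[str, Any]] = []
    for seg in segments:
        text = seg["text"].strip()  # Whisper segment text often has a leading space
        if not text:
            continue  # skip empty segments (e.g. silence)
        # Extend the current chunk while it still fits inside the window.
        if chunks and seg["end"] - chunks[-1]["start"] <= window_seconds:
            chunks[-1]["end"] = seg["end"]
            chunks[-1]["text"] += " " + text
        else:
            chunks.append({"start": seg["start"], "end": seg["end"], "text": text})
    return chunks
```

With openai-whisper installed, the segments could come from `whisper.load_model("base").transcribe("lecture.mp4")["segments"]`. Whether `transcribe_video` exposes them under the same key is not shown in this description.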

Both scripts follow existing conventions: lazy imports, availability checks with install instructions, graceful degradation, shared chunking integration, and demo functions.
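As an illustration of the lazy-import and availability-check convention, a check of this kind might look roughly like the sketch below. The helper names (`missing_packages`, `check_video_deps`, `VIDEO_DEPS`) are hypothetical and not taken from the PR. The package names and the `uv sync --extra video` command come from this description.

```python
import importlib.util

# Import name -> installable package name (package names from the PR's video extra).
VIDEO_DEPS = {"whisper": "openai-whisper", "cv2": "opencv-python-headless"}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the package names whose import name cannot be found.

    Uses importlib.util.find_spec so nothing heavy is imported just to check.
    """
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


def check_video_deps() -> bool:
    """Print install instructions and return False if anything is missing."""
    missing = missing_packages(VIDEO_DEPS)
    if missing:
        print(f"Missing optional dependencies: {', '.join(missing)}")
        print("Install them with: uv sync --extra video")
        return False
    return True
```

A script following this pattern would call `check_video_deps()` at the top of its demo function and return early when it reports False, importing `whisper` or `cv2` only inside the functions that need them.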

Updated

  • pyproject.toml: video optional dependency group (openai-whisper, opencv-python-headless)
  • README.md / unstructured_documents/README.md — Document type tables, counts, decision matrix, quick start, repository structure

Usage

uv sync --extra video
uv run python unstructured_documents/10_video/sample_docs/generate_samples.py
uv run python unstructured_documents/10_video/02_frame_extraction.py
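uv run python unstructured_documents/10_video/01_whisper_transcription.py

from pathlib import Path

from unstructured_documents.shared.chunking import chunk_by_sentences

# transcribe_video, extract_keyframes and build_frame_descriptions are assumed
# to be in scope from the two scripts above; their import path is not shown here.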

# Transcribe and chunk for RAG
result = transcribe_video(Path("lecture.mp4"), model_size="base")
chunks = chunk_by_sentences(result["text"], sentences_per_chunk=5)

# Extract keyframes for visual context
keyframes = extract_keyframes(Path("lecture.mp4"), threshold=30.0)
descriptions = build_frame_descriptions(keyframes)
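
The `threshold=30.0` argument above suggests a per-frame difference score. The description does not say how that score is computed, so the sketch below assumes one common choice: the mean absolute grayscale difference between consecutive frames. Both function names are hypothetical. OpenCV is imported lazily, so the selection logic can be used without it.

```python
from pathlib import Path
from typing import Iterable, Iterator


def scene_change_indices(scores: Iterable[float], threshold: float = 30.0) -> list[int]:
    """Pick keyframe indices from consecutive-frame difference scores.

    scores[i - 1] is the difference between frame i - 1 and frame i.
    Frame 0 is always kept (assumes the video has at least one frame);
    frame i is kept when its score exceeds the threshold.
    """
    keep = [0]
    for index, score in enumerate(scores, start=1):
        if score > threshold:
            keep.append(index)
    return keep


def frame_diff_scores(video_path: Path) -> Iterator[float]:
    """Yield the mean absolute grayscale difference between consecutive frames.

    Assumed scoring method, not necessarily the one used in the PR.
    """
    import cv2  # lazy import: only needed when a video is actually read

    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise OSError(f"could not open video: {video_path}")
    try:
        prev = None
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of stream
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev is not None:
                yield float(cv2.absdiff(gray, prev).mean())
            prev = gray
    finally:
        cap.release()
```

With OpenCV installed, `scene_change_indices(frame_diff_scores(Path("lecture.mp4")), threshold=30.0)` would return the indices of frames that differ sharply from their predecessor. On a 0 to 255 grayscale scale, a mean difference above 30 usually indicates a cut or slide change rather than small motion.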


Copilot AI and others added 2 commits March 7, 2026 02:45
… and OpenCV frame extraction

Co-authored-by: fnusatvik07 <177245849+fnusatvik07@users.noreply.github.com>
Co-authored-by: fnusatvik07 <177245849+fnusatvik07@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add feature for video upload and chat functionality" to "Add video document type (10_video) with Whisper transcription and OpenCV frame extraction" on Mar 7, 2026
