feat: add WebVTT converter by bjesuiter · Pull Request #14 · Michaelliv/markit

bjesuiter · 2026-05-07T19:52:30Z

Hey there, human here!

I wanted my openclaw to build itself a skill for downloading YouTube video transcripts. It did so by downloading a VTT file from YouTube and adding a small Python script to convert it to Markdown.

I thought it would be a great addition to markit to support converting VTT files to Markdown, including deduplication of the rolling subtitles YouTube uses.

So I vibe-coded this PR. If you have any remarks or complaints, send them my way and I'll fix them! :)

Example

Input `sample.vtt`:

```
WEBVTT

00:00:00.000 --> 00:00:02.000
Hello world.

00:00:02.000 --> 00:00:04.000
This is a caption test.
```

Run:

```sh
markit sample.vtt -q
```

Output:

```markdown
# Transcript

## Text

Hello world. This is a caption test.

## Timestamped Transcript

- [00:00:00.000] Hello world.
- [00:00:02.000] This is a caption test.
```
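To make the conversion in this example concrete, here is a minimal sketch of a parser and renderer that produce the same layout. It assumes cues are separated by blank lines and that each cue's timing line is either its first line or follows an optional cue identifier. `Cue`, `parseVtt`, and `toMarkdown` are hypothetical names for illustration, not markit's actual internals, and this sketch ignores cue settings and does no deduplication.

```typescript
// Hypothetical cue record; not markit's internal type.
interface Cue {
  start: string; // e.g. "00:00:02.000"
  end: string;
  text: string;
}

// "start --> end" with optional hours; anything after the end time
// (cue settings) is ignored in this sketch.
const TIMING =
  /^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})[ \t]+-->[ \t]+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})/;

function parseVtt(input: string): Cue[] {
  const cues: Cue[] = [];
  // Normalize line endings, then split into blank-line-separated blocks.
  const blocks = input.replace(/\r\n?/g, "\n").split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // The timing line is first or follows an optional cue identifier;
    // blocks without one (WEBVTT header, NOTE, STYLE, REGION) are skipped.
    const idx = lines.findIndex((line) => TIMING.test(line));
    if (idx === -1 || idx > 1) continue;
    const m = TIMING.exec(lines[idx])!;
    // Multi-line cue text is joined into a single line.
    const text = lines.slice(idx + 1).join(" ").trim();
    if (text) cues.push({ start: m[1], end: m[2], text });
  }
  return cues;
}

function toMarkdown(cues: Cue[]): string {
  const plain = cues.map((c) => c.text).join(" ");
  const timestamped = cues.map((c) => `- [${c.start}] ${c.text}`).join("\n");
  return `# Transcript\n\n## Text\n\n${plain}\n\n## Timestamped Transcript\n\n${timestamped}\n`;
}
```

For the `sample.vtt` above, `toMarkdown(parseVtt(input))` returns exactly the output shown. Note that `end` is parsed but not rendered, mirroring the example's output.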

YouTube-style rolling captions are deduplicated, so cumulative cue fragments become one readable transcript instead of repeated text.
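One way such deduplication can work is sketched below. It assumes inline tags have already been stripped from the cue text and that rolling captions overlap at the word level (each cue repeats the tail of the previous one). `mergeRollingCaptions` and its `maxOverlap` parameter are hypothetical, and this suffix/prefix-overlap approach is not necessarily what the PR implements.

```typescript
// Merge rolling caption texts by dropping the longest run of words that
// both ends the transcript so far and starts the next cue.
function mergeRollingCaptions(texts: string[], maxOverlap = 50): string {
  const out: string[] = [];
  for (const text of texts) {
    const words = text.split(/\s+/).filter(Boolean);
    const limit = Math.min(maxOverlap, out.length, words.length);
    let overlap = 0;
    // Try the longest possible overlap first.
    for (let k = limit; k > 0; k--) {
      let match = true;
      for (let i = 0; i < k; i++) {
        if (out[out.length - k + i] !== words[i]) {
          match = false;
          break;
        }
      }
      if (match) {
        overlap = k;
        break;
      }
    }
    // Append only the words that are new in this cue.
    out.push(...words.slice(overlap));
  }
  return out.join(" ");
}
```

For cues `"hello world"`, `"hello world this is"`, and `"this is a test"` this yields `"hello world this is a test"`. The trade-off: a cue ending in "no" followed by a cue that genuinely starts with "no" gets collapsed, and merged words no longer map to a single cue's timing.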

bjesuiter · 2026-05-07T20:11:46Z

Refinement pass pushed in 46971b9 (refactor: refine WebVTT parsing).

Summary:

- Made `.vtt` extension matching case-insensitive (`.VTT` now works).
- Made WebVTT MIME type matching case-insensitive.
- Simplified handling of skipped WebVTT blocks with a shared prefix list.
- Consolidated cue timestamp-tag stripping into one regex.
- Improved HTML entity decoding, including decimal and hex numeric entities (e.g. ones that decode to 🐟).
- Added test coverage for the new edge cases.
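The changes listed above can be sketched roughly as follows. All helper names (`looksLikeVtt`, `isSkippedBlock`, `cleanCueText`, etc.) are hypothetical, the named-entity table is deliberately tiny, and the exact prefix list is an assumption based on the WebVTT block types that never contain cues.

```typescript
// Hypothetical helpers sketching the listed behaviours; not markit's API.

// Case-insensitive detection: "talk.VTT", or "Text/VTT; charset=utf-8".
function looksLikeVtt(filename: string, mimeType = ""): boolean {
  if (/\.vtt$/i.test(filename)) return true;
  return mimeType.split(";")[0].trim().toLowerCase() === "text/vtt";
}

// One shared list of block prefixes whose blocks never contain cues.
const SKIPPED_PREFIXES = ["WEBVTT", "NOTE", "STYLE", "REGION"];

function isSkippedBlock(block: string): boolean {
  const firstLine = block.trimStart().split("\n", 1)[0].trimEnd();
  return SKIPPED_PREFIXES.some(
    (p) => firstLine === p || firstLine.startsWith(p + " ") || firstLine.startsWith(p + "\t"),
  );
}

// A single regex removes every cue-text tag: timestamp tags like <00:00:01.234>
// as well as <c>, <v Speaker>, <i>, <b>, <u>, <ruby>, <rt>, <lang> and closers.
const CUE_TAG = /<[^>\n]*>/g;

// Only a few named entities; a full implementation would use the complete HTML table.
const NAMED_ENTITIES: Record<string, string> = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'",
  nbsp: "\u00a0", lrm: "\u200e", rlm: "\u200f",
};

function decodeEntities(s: string): string {
  return s.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (whole: string, body: string) => {
    if (body.startsWith("#")) {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Out-of-range code points are left as written instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    return NAMED_ENTITIES[body] ?? whole;
  });
}

function cleanCueText(raw: string): string {
  // Strip tags before decoding so an escaped "&lt;b&gt;" survives as a literal "<b>".
  return decodeEntities(raw.replace(CUE_TAG, "")).replace(/[ \t\r\n]+/g, " ").trim();
}
```

Stripping `<v Speaker>` this way also drops the speaker name, which a caption-preserving converter might want to keep.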

Verified locally:

- `bun run check`
- `bun test` (123 pass)
- `bun run build`

Michaelliv · 2026-05-24T11:03:32Z

Thanks for the PR! I like the goal of making transcripts easier to read, but I’m not fully convinced this should land in core as-is.

My main concern is that WebVTT is already a plain-text format, and the most important semantic information in captions is the timing. This converter seems to turn it into a nicer transcript, but in doing so it drops or weakens some of that structure:

- cue end timestamps are not preserved
- cue settings/metadata are discarded
- the main text section flattens the transcript
- the rolling-caption deduplication may make the text more readable, but it can also change the timing relationship of the original cues

Since users/agents can already read or parse raw VTT on demand with normal text tooling, I’m trying to understand the core value of converting it to Markdown if the conversion is lossy.

Could you explain the intended use case a bit more? In particular, do you see this as:

- a readability/transcript extraction feature, where losing some caption structure is acceptable, or
- a caption-preserving conversion, where we should keep start/end timestamps and avoid losing timing information?

If we keep this in core, I’d lean toward preserving the caption timing more explicitly: for example, including both start and end timestamps in the timestamped section and treating any deduplicated plain transcript as a secondary convenience rather than the canonical output.
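A timing-preserving variant of the timestamped section could look like the sketch below. It is only one possible reading of the suggestion: `TimedCue`, `parseTimingLine`, and `renderTimestamped` are hypothetical, and keeping cue settings in an HTML comment (so they survive in the file without cluttering rendered Markdown) is an illustrative design choice, not something either participant proposed.

```typescript
// Hypothetical cue record that keeps end times and raw cue settings.
interface TimedCue {
  start: string;
  end: string;
  settings: string; // e.g. "align:start position:0%", or "" if none
  text: string;
}

// Captures start, end, and whatever follows the end time (the cue settings).
const TIMING_LINE =
  /^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})[ \t]+-->[ \t]+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})(.*)$/;

function parseTimingLine(line: string): Omit<TimedCue, "text"> | null {
  const m = TIMING_LINE.exec(line.trim());
  return m ? { start: m[1], end: m[2], settings: m[3].trim() } : null;
}

// Render one bullet per cue with both timestamps; settings go into an
// HTML comment so they are kept but hidden when the Markdown is rendered.
function renderTimestamped(cues: TimedCue[]): string {
  return cues
    .map((c) => {
      const meta = c.settings ? ` <!-- ${c.settings} -->` : "";
      return `- [${c.start} --> ${c.end}] ${c.text}${meta}`;
    })
    .join("\n");
}
```

With this shape, a deduplicated plain-text section could still be generated from the same cues, but only as a secondary view.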

bjesuiter · 2026-05-26T15:06:35Z

I'm using this to "archive" YouTube videos I find interesting.
So this isn't about "I want to find out something specific from the VTT"; it's more like "I want to store the transcript in my notes, like a blog post, in case the video goes down for some reason."

I could probably store the VTT, but since my openclaw memory is already in Markdown, I'd like to keep things coherent.
Also, for this use case a clean Markdown transcript simply feels better to me. It's similar to Word for editing (a "working format") versus PDF for export/archiving.

Benjamin Jesuiter added 2 commits on May 7, 2026 at 13:00:

- 5942667 feat: add WebVTT converter
- 46971b9 refactor: refine WebVTT parsing

bjesuiter wants to merge 2 commits into Michaelliv:main from bjesuiter:feat/vtt-support
