a saving machine for the modern web.
paste a link, get the file. transcribe it. edit the transcript like a doc. all local, no accounts, no upload limits, no telemetry. self-hosted on your machine, powered by yt-dlp, ffmpeg, and whisper.cpp — works on YouTube, TikTok, Instagram, Vimeo, and ~1000 other sites.
- save — one paste box, MP4/MP3 toggle, quality picker. live progress with %, MB, fragment count, ETA.
- bulk paste — drop a wall of URLs, batch them all at once.
- pause / resume — pause a download mid-flight; closing the tab auto-pauses (resumes on next visit).
- transcribe — local whisper.cpp on every saved file. word-level timestamps. exports .txt/.srt/.vtt.
- edit the transcript like a doc — contenteditable paragraphs, click-to-edit words, autosave, paragraph split/merge, find + replace, undo/redo, bookmarks, highlights, notes.
- speakers — optional automatic speaker labels via local diarization (no HuggingFace login). rename inline; the rename propagates everywhere.
- video stays in view — side-rail with the video, compact player (play/scrub/speed), speakers list, bookmarks. single sticky topbar.
- CLI + MCP (both alpha — see notes below) — drive everything from your terminal, or expose Trove as MCP tools to Claude Desktop / Cursor / Replit Agent.
- single Python process, single Docker container, no Node. mobile-friendly. light only — riso paper is the brand.
| ![]() | ![]() |
|---|---|
| mid-download | saved |
```sh
brew install yt-dlp ffmpeg   # macOS — or: apt install ffmpeg && pip install yt-dlp
git clone https://github.com/afk1997/trove.git
cd trove
./trove.sh
```

open http://localhost:8899 and paste something.
or with Docker:
```sh
docker build -t trove .
docker run -p 8899:8899 -e HOST=0.0.0.0 -e TROVE_ALLOW_UNAUTH_PUBLIC=1 trove
```

the `-e HOST=0.0.0.0` is required for Docker port-forwarding. trove refuses to start on a non-loopback bind without auth — `TROVE_ALLOW_UNAUTH_PUBLIC=1` is the explicit "I know, I'm only exposing this to my host" opt-in. for LAN/internet exposure, drop the opt-in and set `TROVE_TOKEN` instead — see below.
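the refuse-to-start rule amounts to a small startup guard. a minimal sketch of the idea (function name and exit message are illustrative, not Trove's actual code):

```python
import ipaddress
import os
import sys

def check_bind_policy(host: str) -> None:
    """Refuse a non-loopback bind unless a token or the explicit
    unauth opt-in is set (sketch of the rule described above)."""
    try:
        loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        loopback = host == "localhost"
    if loopback:
        return  # 127.0.0.1 etc. — always allowed, no auth needed
    if os.environ.get("TROVE_TOKEN"):
        return  # public bind is fine when every request must carry a token
    if os.environ.get("TROVE_ALLOW_UNAUTH_PUBLIC") == "1":
        return  # explicit "I know what I'm doing" opt-in
    sys.exit("refusing non-loopback bind without TROVE_TOKEN "
             "(or TROVE_ALLOW_UNAUTH_PUBLIC=1)")
```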
| variable | default | what it does |
|---|---|---|
| `HOST` | `127.0.0.1` | bind address. set to 0.0.0.0 only with a token. |
| `PORT` | `8899` | TCP port. |
| `TROVE_TOKEN` | (unset) | when set, every `/api/*` request must send `Authorization: Bearer <token>`. |
| `TROVE_ALLOW_UNAUTH_PUBLIC` | (unset) | set to 1 to allow HOST=0.0.0.0 with no token (Docker port-forward, trusted LAN). without this opt-in, trove refuses to start on a non-loopback bind unless TROVE_TOKEN is set. |
| `TROVE_COOKIES_FROM_BROWSER` | (unset) | one of `safari`, `chrome`, `firefox`, `brave`, `edge`. required for YouTube right now (Google blocks cookieless yt-dlp). |
| `TROVE_CONCURRENT_FRAGMENTS` | `4` | parallel fragment downloads for HLS streams (YouTube etc.). clamped 1–32. |
| `TROVE_JOB_TTL_SECONDS` | `3600` | how long completed jobs (and their files) linger before being swept. |
| `TROVE_MAX_WORKERS` | `4` | concurrent downloads. excess returns HTTP 503. |
| `TROVE_RATE_LIMIT` | `30` | requests per minute per IP. set to 0 to disable. |
| `TROVE_BATCH_MAX_URLS` | `50` | hard cap on URLs accepted per `/api/batch-download` request. |
| `TROVE_DIARIZATION` | `off` | `on` enables speaker labelling on transcribe (requires extra deps — see below). |
| `TROVE_EXTRACT_AUDIO_TIMEOUT` | `14400` | max seconds for the ffmpeg audio extract step (4 h covers any practical input). |
Note on `TROVE_TOKEN` + tab-close auto-pause: when a token is set, the browser's `navigator.sendBeacon` cannot attach the `Authorization` header, so closing the tab mid-download will not POST to `/api/job/<id>/pause`. The download continues running on the server until it finishes naturally — or, if you stop the server first, it is downgraded to `paused` on next restart and reappears in the queue. No work is lost either way; only the live "pause indicator" UX is deferred. Local (`HOST=127.0.0.1`, no token) deployments are unaffected.
the defaults assume localhost only. to expose trove safely:
- set a token: `export TROVE_TOKEN=$(openssl rand -hex 32)`
- set host: `export HOST=0.0.0.0`
- run behind a reverse proxy that adds HTTPS (Caddy, nginx, fly.io, etc.).
without TROVE_TOKEN, anyone who can reach the port can download.
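with a token set, any scripted client has to attach the header itself. a minimal stdlib sketch of the bearer scheme (the `/api/v1/jobs` path here is illustrative — use the real endpoints from the API):

```python
import os
import urllib.request

def trove_request(path: str) -> urllib.request.Request:
    """Build a request against a Trove server, attaching the bearer
    token when TROVE_TOKEN is set, per the auth rule above."""
    base = os.environ.get("TROVE_URL", "http://127.0.0.1:8899")
    req = urllib.request.Request(base + path)
    token = os.environ.get("TROVE_TOKEN")
    if token:
        # every /api/* request must carry Authorization: Bearer <token>
        req.add_header("Authorization", f"Bearer {token}")
    return req

# send with urllib.request.urlopen(trove_request("/api/v1/jobs"))
# against a running server.
```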
cookies are recommended for YouTube. short, public, non-monetized videos often work without them, but YouTube will eventually serve a sign-in wall for age-restricted content, certain regions, or longer/monetized uploads. to use cookies from your browser:
```sh
export TROVE_COOKIES_FROM_BROWSER=safari   # or chrome / firefox / brave / edge
./trove.sh
```

the browser must be installed on the host and have an active YouTube session.
trove transcribes any saved audio or video locally using whisper.cpp. no api keys, no cloud, no telemetry.
first time:
- save a media file (the existing flow)
- on the saved card, click `▸ transcribe` — you'll see a one-time consent dialog
- click `set it up ↗` — you'll land on `/transcribe/setup`
- trove auto-detects your machine (Metal on M-series Mac, CUDA on NVIDIA Linux, AVX/CPU otherwise) and shows four model options with realistic speed estimates for your machine
- pick one. trove downloads it from `huggingface.co/ggerganov/whisper.cpp` (one-time, ~140 MB for `base`)
- you're done. transcription works offline forever after.
after first setup:
- click `▸ transcribe` on any saved card → progress bar → `▸ view transcript ↗` opens the transcript editor in a new tab
- check `auto-transcribe` on the paste form to start transcription automatically when each download finishes
model storage:
models live at `<trove>/models/ggml-*.bin`. swap or remove via the same setup page in settings mode (footer link `transcribe settings ↗`).
Docker: the model directory is auto-persisted via a Docker volume. To make it visible/mountable on the host, run:
```sh
docker run -v ./models:/app/models -v ./downloads:/app/downloads -p 8899:8899 trove
```

network policy: the only outbound calls trove makes are (1) yt-dlp fetching the original media, and (2) the model download from huggingface during the setup wizard. transcription itself is 100% local.
the transcript page is a real document editor, not a passive viewer.
layout — single sticky topbar (title, saving indicator, undo / redo, search, export, more). document on the left (max 720 px). right rail with video, compact player (play / scrub / time / speed), speakers panel, bookmarks panel.
editing
- click any word → place your cursor and type to fix it. all edits autosave.
- `Enter` inside a paragraph splits it at the cursor. `Backspace` at the start of a paragraph merges it with the one above. `Cmd+Z` / `Cmd+Shift+Z` undo and redo.
- right-click a selection for: copy · highlight · bookmark · note · export selection · revert paragraph.
playback + sync
- double-click a word → seek the video to that timestamp.
- click a `[00:14]` time pill on a paragraph → seek to that paragraph's start (paused).
- `Alt + click` a word → seek without moving the cursor.
- the active word underlines as audio plays. the active paragraph gets a faint highlight. a `↓ jump to current` pill appears bottom-center if the active paragraph scrolls offscreen.
- `Space` play/pause, `J`/`L` skip ±5 s, `,`/`.` adjust speed, `Cmd+B` bookmark current time.
word-timestamp realignment — whisper.cpp without DTW places the first word after each silence ~300–500 ms early, and the error compounds. trove uses silero-vad's speech regions as ground truth and snaps drifted single-word timestamps forward to the next speech region's start. word durations are also clamped at 1.5 s so the active-word highlight never lingers across silences. (run `TROVE_DIARIZATION=on` to enable; the realignment piggy-backs on the same vad pass that diarization uses.)
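the snap-forward-and-clamp idea can be sketched in a few lines (illustrative only, not Trove's actual implementation; `words` stands in for whisper word timings, `speech_regions` for the VAD output):

```python
MAX_WORD_DURATION = 1.5  # seconds, the clamp described above

def realign_words(words, speech_regions):
    """Snap drifted word starts forward to the next speech region.

    words: list of dicts with 'start' and 'end' in seconds.
    speech_regions: sorted list of (start, end) tuples from a VAD pass.
    """
    out = []
    for w in words:
        start, end = w["start"], w["end"]
        # word starts in silence -> snap to the next region's start
        if not any(rs <= start < re for rs, re in speech_regions):
            later = [rs for rs, _ in speech_regions if rs > start]
            if later:
                start = later[0]
        # clamp duration so the highlight never lingers across silences
        end = max(start, min(end, start + MAX_WORD_DURATION))
        out.append({**w, "start": start, "end": end})
    return out
```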
speakers
- with `TROVE_DIARIZATION=on`, segments come back already labeled `Speaker 1`, `Speaker 2`, etc. (see next section).
- click any speaker label to rename — the new name propagates to every occurrence.
- the speakers panel in the side rail lights up the currently-talking row with an orange dot.
- without diarization, segments are split on speech pauses and speakers stay unlabeled; you can apply names manually from the rail.
bookmarks · highlights · notes
- bookmarks panel: `+ add bookmark` captures the current playback time. click a time pill to seek. edit the note inline.
- select any text and right-click → highlight to mark a passage; → add note to attach a comment to a word.
- highlights and notes persist in `.words.json` and survive transcribe-page reloads.
find + replace
- `Cmd+F` opens the search popover (find tab). `Cmd+Shift+F` opens find + replace.
- replace operates on every visible word; deleted words are skipped. the `case` toggle controls case-sensitivity.
export
- `.txt` / `.srt` / `.vtt` from the export menu. all three are regenerated from the current edits — your fixes ship.
- the `export selection` action on a right-click writes a stand-alone `.txt` of just the selection with timestamp markers.
autosave detail — every word edit, segment op, speaker rename, bookmark CRUD, etc. goes through a per-transcript txn lock on the server, writes the `.words.json` atomically (tempfile + `os.replace`), and regenerates the `.txt` / `.srt` / `.vtt` exports. you can close the tab any time; the document is the source of truth.
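the tempfile + `os.replace` pattern looks roughly like this (a sketch of the pattern, not Trove's code):

```python
import json
import os
import tempfile

def write_words_json_atomic(path: str, doc: dict) -> None:
    """Write the transcript doc so a reader never sees a half-written file."""
    dir_ = os.path.dirname(os.path.abspath(path))
    # tempfile must live on the same filesystem for the rename to be atomic
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(doc, f)
            f.flush()
            os.fsync(f.fileno())  # data on disk before the swap
        os.replace(tmp, path)     # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)            # clean up the orphan on failure
        raise
```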
trove can auto-label speakers without any HuggingFace login or API key. The pipeline runs entirely on your machine:
- silero-vad finds speech regions in the audio.
- resemblyzer computes a 256-d voice embedding for overlapping 1.6 s windows inside each region.
- sklearn agglomerative clustering (Ward + Euclidean) groups embeddings into speakers.
- a 9-window median filter smooths single-window flips so a brief interjection doesn't create a phantom speaker.
- each whisper word is assigned the speaker of the cluster covering its timestamp; consecutive same-speaker words become one paragraph.
Realistic accuracy on clean two-person audio is ~70 %. Sub-1.6 s turns (say one-word interjections) are below resemblyzer's resolution floor and get absorbed into the longer neighbour's chunk — rename or split the resulting paragraph by hand if it matters.
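the median smoothing in step 4 can be sketched in pure Python (illustrative; Trove's actual filter may differ in edge handling):

```python
from statistics import median_low

def median_smooth(labels, window=9):
    """Median-filter per-window speaker ids so a single-window flip
    doesn't create a phantom speaker. Edges use a truncated window."""
    half = window // 2
    n = len(labels)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        # median_low keeps the result an existing cluster id
        out.append(median_low(labels[lo:hi]))
    return out
```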
To enable:
```sh
pip install resemblyzer silero-vad scikit-learn   # ~800 MB (PyTorch is the bulk)
export TROVE_DIARIZATION=on
```

Without those deps installed, or with `TROVE_DIARIZATION=off` (the default), transcription behaves exactly as before — segments split on speech pauses and speakers stay unlabeled.
Status: unstable alpha.
`trove` (the CLI) and `trove-mcp` (the MCP server below) ship as functional but largely untested. Surfaces and command names may change before they're promoted to stable. Local use only — don't script around them in production yet.
trove talks to a running Trove server through the stable /api/v1 JSON API. Stdlib-only — no extra deps.
```sh
# in the trove directory, with the venv activated
python cli.py serve                  # boot the server (alias for ./trove.sh)
python cli.py fetch URL [URL...]     # queue one or many downloads
python cli.py list                   # show jobs (id, status, %, title)
python cli.py get <id>               # full job detail
python cli.py wait <id>              # block until job is done / error / cancelled
python cli.py transcribe <id>        # kick off a transcribe
python cli.py transcript <tid>       # fetch transcript (.txt by default)
python cli.py export <tid> txt|srt|vtt
python cli.py search <tid> "query"   # find inside a transcript
python cli.py replace <tid> "old" "new"
python cli.py models list            # whisper models on disk
python cli.py models pull <name>     # download a model
```

Configure with env vars:

```sh
export TROVE_URL=http://127.0.0.1:8899   # server base URL
export TROVE_TOKEN=...                   # bearer token if the server has one
```

Run `python cli.py --help` for the full command list.
Status: unstable alpha. Same caveats as the CLI. Tested only against the contract; not yet exercised in real agent workflows.
trove-mcp exposes Trove's HTTP surface as Model Context Protocol tools so a coding agent (Claude Desktop, Cursor, Replit Agent, etc.) can drive Trove end-to-end — queue downloads, poll status, transcribe, edit transcripts, search, replace, manage models. Transport: stdio.
Wire it into your MCP client config:
```json
{
  "mcpServers": {
    "trove": {
      "command": "python",
      "args": ["/abs/path/to/trove/mcp_server.py"],
      "env": {
        "TROVE_URL": "http://127.0.0.1:8899",
        "TROVE_TOKEN": ""
      }
    }
  }
}
```

The Trove server itself must already be running on the URL the MCP client points at. Each tool returns a clear error if the server is unreachable so the agent knows to prompt the user to start it (`python cli.py serve`).
Tool surface mirrors the CLI 1:1 (intentional — there's a parity check in the test suite that fails if either side adds a tool the other doesn't have).
- backend: Python 3.12 + Flask
- frontend: htmx 2 + vanilla JS + Tailwind CSS (standalone CLI, no Node at runtime)
- engine: yt-dlp + ffmpeg + whisper.cpp (via pywhispercpp)
- diarization (optional): silero-vad + resemblyzer + scikit-learn
- typography: Fraunces (display, with the WONK + opsz variable axes), Inter (UI), IBM Plex Mono (stamps), Source Serif 4 (transcript body)
this tool is for personal use. respect copyright laws and the terms of service of platforms you download from.
MIT. see LICENSE.
inspired by averygan/reclip (MIT).



