TubeScrape

Scrape YouTube search results, channels, transcripts, and playlists — no API key needed.

Built on YouTube's internal InnerTube API. Three interfaces: Python SDK, CLI, and REST API.

Features

Search with filters — type, duration, upload date, sort order, features (4K, HDR, live, CC)
Channel browsing — videos, shorts, playlists, and in-channel search
Transcripts — fetch, translate, format (SRT / WebVTT / JSON), save to file
Playlists — full pagination with position tracking and metadata
Async-first — every method has an async variant for concurrent workloads
JSON-ready — every result object has .to_dict() for instant serialization
Proxy rotation — built-in support for residential proxy lists
Three interfaces — Python SDK, CLI with Rich tables, REST API with Swagger docs
Zero config — no API key, no OAuth, no setup
Lightweight — only httpx as a core dependency

Installation

You can also integrate it into an existing project or use it via a CLI.

# Core SDK only
pip install tubescrape

# With CLI (adds click + rich)
pip install "tubescrape[cli]"

# With REST API server (adds fastapi + uvicorn)
pip install "tubescrape[api]"

# Everything
pip install "tubescrape[all]"

Requirements: Python 3.10+

Quick Start

import json
from tubescrape import YouTube

with YouTube() as yt:
    # Search YouTube
    results = yt.search('python tutorial', max_results=5)
    for video in results.videos:
        print(f'{video.title} — {video.duration} — {video.view_count}')

    # Browse a channel
    channel = yt.get_channel_videos('@lexfridman', max_results=10)
    for video in channel.videos:
        print(f'{video.title} ({video.published_text})')

    # Get a transcript and save as subtitles
    transcript = yt.get_transcript('dQw4w9WgXcQ')
    print(transcript.text)
    transcript.save('subtitles.srt')

    # Every result serializes to JSON instantly
    data = results.to_dict()
    print(json.dumps(data, indent=2))

Every method accepts plain IDs, full URLs, or @handles — parsed automatically.

Why tubescrape?

	tubescrape	youtube-transcript-api	pytube / pytubefix	yt-dlp
Search videos	Yes	No	No	Limited
Channel browse	Yes	No	No	Yes
Transcripts	Yes	Yes	No	Yes
Playlists	Yes	No	Yes	Yes
Async support	Yes	No	No	No
Built-in REST API	Yes	No	No	No
CLI tool	Yes	No	No	Yes
Core dependencies	1 (httpx)	1 (requests)	0	Many
API key needed	No	No	No	No

Search

results = yt.search('python tutorial', max_results=5)

for video in results.videos:
    print(f'{video.title} — {video.url}')
    print(f'  {video.view_count} | {video.duration} | {video.published_text}')
    print(f'  Channel: {video.channel} (verified: {video.is_verified})')

Search Filters

All filters can be combined in a single call:

results = yt.search(
    'podcast interview',
    max_results=10,
    type='video',              # video | channel | playlist | movie
    duration='long',           # short (<4m) | medium (4-20m) | long (>20m)
    upload_date='this_month',  # last_hour | today | this_week | this_month | this_year
    sort_by='view_count',      # relevance | upload_date | view_count | rating
    features=['hd', 'subtitles'],  # live | 4k | hd | subtitles | cc | creative_commons | 360 | vr180 | 3d | hdr
)

Channel Browsing

All channel methods accept @handle, channel ID (UC...), or full URL.

# Videos (newest first, with pagination)
videos = yt.get_channel_videos('@lexfridman', max_results=10)
all_videos = yt.get_channel_videos('@lexfridman', max_results=0)  # fetch ALL

# Shorts
shorts = yt.get_channel_shorts('@lexfridman')
for short in shorts.shorts:
    print(f'{short.title} — {short.view_count} — {short.url}')

# Playlists
playlists = yt.get_channel_playlists('@lexfridman')
for pl in playlists.playlists:
    print(f'{pl.title} — {pl.video_count} — {pl.url}')

# Search within a channel
results = yt.search_channel('@lexfridman', 'artificial intelligence', max_results=10)

Transcripts

# Fetch transcript (auto-detects best language)
transcript = yt.get_transcript('dQw4w9WgXcQ')
print(transcript.text)  # full text as a single string

# Choose language (priority order fallback)
transcript = yt.get_transcript('dQw4w9WgXcQ', languages=['de', 'en'])

# Translate to any language
transcript = yt.get_transcript('dQw4w9WgXcQ', translate_to='es')

# Without timestamps (plain text blob)
transcript = yt.get_transcript('dQw4w9WgXcQ', timestamps=False)

# List available languages
languages = yt.list_transcripts('dQw4w9WgXcQ')
for entry in languages:
    print(f'{entry.language} ({entry.language_code}) — {"auto" if entry.is_generated else "manual"}')

Formatting & Saving

transcript = yt.get_transcript('dQw4w9WgXcQ')

# Format as SRT, WebVTT, JSON, or plain text
srt = YouTube.format_transcript(transcript, fmt='srt')
vtt = YouTube.format_transcript(transcript, fmt='vtt')

# Save to file (format auto-detected from extension)
transcript.save('subtitles.srt')
transcript.save('subtitles.vtt')
transcript.save('transcript.json')
transcript.save('transcript.txt')

Playlists

# Accepts playlist ID or full URL
playlist = yt.get_playlist('PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf')

print(f'{playlist.title} by {playlist.channel} — {len(playlist.videos)} videos')

for entry in playlist.videos:
    print(f'#{entry.position}  {entry.title} — {entry.duration}')

Serialization (`.to_dict()`)

Every result object converts to a plain Python dictionary — ready for JSON, databases, or APIs:

import json

# Works on every result type
results.to_dict()       # SearchResult → dict
video.to_dict()         # VideoResult → dict
channel.to_dict()       # BrowseResult → dict
shorts.to_dict()        # ShortsResult → dict
playlists.to_dict()     # ChannelPlaylistsResult → dict
playlist.to_dict()      # PlaylistResult → dict
transcript.to_dict()    # Transcript → dict

# Sparse output: optional fields excluded when empty/default
# is_verified=False → omitted | badges=[] → omitted | None fields → omitted
print(json.dumps(results.to_dict(), indent=2, ensure_ascii=False))

Async Support

Every method has an async variant prefixed with a. Use in FastAPI, Discord bots, or any async application:

import asyncio
from tubescrape import YouTube

async def main():
    async with YouTube() as yt:
        # All methods have async variants
        results = await yt.asearch('python', max_results=5)
        transcript = await yt.aget_transcript('dQw4w9WgXcQ')

        # Run multiple requests concurrently
        r1, r2, r3 = await asyncio.gather(
            yt.asearch('python'),
            yt.asearch('javascript'),
            yt.asearch('rust'),
        )

asyncio.run(main())

Proxy Support

# Single proxy
yt = YouTube(proxy='http://user:pass@proxy.example.com:8080')

# Proxy rotation (round-robin per request)
yt = YouTube(proxies=[
    'http://user:pass@proxy1:8080',
    'http://user:pass@proxy2:8080',
])

# SOCKS5
yt = YouTube(proxy='socks5://user:pass@proxy:1080')

# Custom timeout and retries
yt = YouTube(proxy='http://proxy:8080', timeout=60.0, max_retries=5)

Tip: YouTube blocks cloud IPs aggressively. Use rotating residential proxies (BrightData, SmartProxy, Oxylabs) for production.

CLI

Install with pip install "tubescrape[cli]".

tubescrape search "python tutorial" -n 5
tubescrape search "podcast" --type video --duration long --sort-by view_count
tubescrape search "python" --json                    # JSON output

tubescrape channel @lexfridman                       # videos (default)
tubescrape channel @lexfridman shorts                # shorts
tubescrape channel @lexfridman playlists             # playlists
tubescrape channel @lexfridman search "podcast"      # search within channel

tubescrape playlist PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf

tubescrape transcript dQw4w9WgXcQ                    # plain text
tubescrape transcript dQw4w9WgXcQ --format srt       # SRT subtitles
tubescrape transcript dQw4w9WgXcQ --translate es      # translate
tubescrape transcript dQw4w9WgXcQ --save output.srt  # save to file
tubescrape transcript dQw4w9WgXcQ --list-languages   # available languages

tubescrape --proxy http://user:pass@host:port search "python"  # with proxy
export TUBESCRAPE_PROXY="http://user:pass@host:port"           # env variable

REST API

Install with pip install "tubescrape[api]".

tubescrape serve                          # starts on localhost:8000
tubescrape serve --host 0.0.0.0 --port 3000

Interactive Swagger docs at http://localhost:8000/docs.

Method	Endpoint	Description
GET	`/api/v1/search?q=python`	Search videos
GET	`/api/v1/channel/{id}/videos`	Channel videos
GET	`/api/v1/channel/{id}/shorts`	Channel shorts
GET	`/api/v1/channel/{id}/playlists`	Channel playlists
GET	`/api/v1/channel/{id}/search?q=...`	Search within channel
GET	`/api/v1/playlist/{id}`	Fetch playlist
GET	`/api/v1/transcript/{video_id}`	Fetch transcript
GET	`/api/v1/transcript/{video_id}/languages`	List languages
GET	`/health`	Health check

curl "http://localhost:8000/api/v1/search?q=python+tutorial&max_results=5"
curl "http://localhost:8000/api/v1/transcript/dQw4w9WgXcQ?format=srt&translate_to=es"

Error Handling

All exceptions inherit from YouTubeError:

YouTubeError
├── RequestError
│   ├── RateLimitError          # HTTP 429
│   └── BotDetectedError        # HTTP 403
├── VideoUnavailableError       # private, deleted, region-locked
│   └── AgeRestrictedError
├── TranscriptsDisabledError
├── TranscriptsNotAvailableError
├── TranscriptFetchError
├── TranslationNotAvailableError
├── ChannelNotFoundError
├── PlaylistNotFoundError
├── APIKeyNotFoundError
└── ParsingError

from tubescrape import YouTube, YouTubeError, RateLimitError

try:
    results = yt.search('python')
except RateLimitError:
    print('Rate limited — use a proxy')
except YouTubeError as e:
    print(f'YouTube error: {e}')

Full Documentation

For detailed examples, all field references, and advanced usage, see the Complete Usage Guide.

Warning

This library uses YouTube's undocumented InnerTube API. It may break if YouTube changes their internal API. If it does, please open an issue.

Contributing

git clone https://github.com/zaidkx37/tubescrape.git
cd tubescrape
pip install -e ".[all,dev]"

pytest                    # run tests
ruff check src/           # lint
mypy src/tubescrape/      # type check

License

MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github/workflows		.github/workflows
assets		assets
docs		docs
src/tubescrape		src/tubescrape
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TubeScrape

Features

Installation

Quick Start

Why tubescrape?

Search

Search Filters

Channel Browsing

Transcripts

Formatting & Saving

Playlists

Serialization (`.to_dict()`)

Async Support

Proxy Support

CLI

REST API

Error Handling

Full Documentation

Warning

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TubeScrape

Features

Installation

Quick Start

Why tubescrape?

Search

Search Filters

Channel Browsing

Transcripts

Formatting & Saving

Playlists

Serialization (.to_dict())

Async Support

Proxy Support

CLI

REST API

Error Handling

Full Documentation

Warning

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Serialization (`.to_dict()`)

Packages