
YouTube Live Monitor - Complete Implementation

🎯 Overview

The YouTube scraper now supports fully automated operation with live monitoring. Instead of relying on a predefined list of videos, the system can:

  1. Monitor yt.txt for new video links as they are added
  2. Monitor the channels listed in channels.txt for new uploads
  3. Automatically extract transcripts from discovered videos
  4. Manage the service lifecycle with simple control commands

🚀 Features Implemented

1. Live File Monitoring (youtube_live_monitor.py)

  • Real-time file watching: Uses watchdog to monitor yt.txt for changes
  • Channel monitoring: Periodically checks channels for new uploads
  • Automatic processing: Extracts transcripts from discovered videos
  • Duplicate prevention: Maintains a log of processed URLs
  • Stealth mode: Built-in delays to avoid rate limiting
  • Comprehensive logging: Full activity logs and error tracking
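Duplicate prevention depends on a persistent record of handled URLs. The sketch below shows one way such a log could work, assuming a plain-text file with one URL per line (processed-urls.txt is the name reported by the status command). The ProcessedLog class and its methods are hypothetical, not the service's actual code.

```python
from pathlib import Path


class ProcessedLog:
    """Hypothetical file-backed set of URLs that have already been handled."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self._urls: set[str] = set()
        if self.path.exists():
            # Assumed format: one URL per line; blank lines are ignored.
            text = self.path.read_text(encoding="utf-8")
            self._urls = {line.strip() for line in text.splitlines() if line.strip()}

    def __contains__(self, url: str) -> bool:
        return url in self._urls

    def __len__(self) -> int:
        return len(self._urls)

    def add(self, url: str) -> bool:
        """Record a URL; return False if it was already logged."""
        if url in self._urls:
            return False
        self._urls.add(url)
        # Append instead of rewriting, so earlier entries are never touched.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(url + "\n")
        return True
```

Because the log is reloaded on startup, a restarted service skips URLs it handled in a previous run.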

2. Service Management (youtube_control.py)

  • Start/Stop control: Easy service lifecycle management
  • Status monitoring: Get real-time service statistics
  • One-time operations: Process URLs or check channels without continuous monitoring
  • Cross-platform PID management: Works on Windows and Unix-like systems
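A minimal sketch of PID file handling with stale-process detection follows. It assumes the service writes its own PID to a file. The function names are hypothetical. It uses psutil.pid_exists (psutil is a listed dependency) and falls back to the POSIX os.kill(pid, 0) check when psutil is unavailable.

```python
import os
from pathlib import Path
from typing import Optional

try:
    import psutil  # listed in requirements.txt; gives a cross-platform check
except ImportError:
    psutil = None


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    if psutil is not None:
        return psutil.pid_exists(pid)
    if os.name == "nt":
        # On Windows, signal 0 is CTRL_C_EVENT, so os.kill(pid, 0) is not a probe.
        raise RuntimeError("psutil is required for PID checks on Windows")
    try:
        os.kill(pid, 0)  # POSIX: signal 0 tests existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def write_pid(pid_file: Path) -> None:
    """Record the current process as the running service."""
    pid_file.write_text(str(os.getpid()), encoding="utf-8")


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the live service PID, or None; stale PID files are removed."""
    try:
        pid = int(pid_file.read_text(encoding="utf-8").strip())
    except (FileNotFoundError, ValueError):
        return None
    if pid_is_running(pid):
        return pid
    pid_file.unlink(missing_ok=True)  # the recorded process has exited
    return None
```

With this pattern, start refuses to launch when read_pid returns a PID, and a PID file left behind by a crash is cleaned up on the next check.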

📁 Configuration Files

yt.txt - Video URLs to Monitor

# YouTube Video URLs to monitor
# Add one URL per line
# Lines starting with # are comments

https://www.youtube.com/watch?v=dQw4w9WgXcQ
https://youtu.be/9bZkp7q19f0
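The yt.txt rules above (one URL per line, "#" lines are comments) fit into a small parser. This is a sketch. The function name parse_url_lines is hypothetical, and it additionally drops repeated URLs while keeping their first position.

```python
from typing import Iterable, List


def parse_url_lines(lines: Iterable[str]) -> List[str]:
    """Return URLs from yt.txt-style text: one per line, '#' lines are comments."""
    seen = set()
    urls = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line not in seen:  # keep the first occurrence, preserve order
            seen.add(line)
            urls.append(line)
    return urls
```

Typical use: parse_url_lines(Path("yt.txt").read_text(encoding="utf-8").splitlines()).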

channels.txt - Channels to Monitor for New Uploads

# YouTube Channels to monitor for new uploads
# Add one channel URL per line
# Lines starting with # are comments

https://www.youtube.com/@TED
https://www.youtube.com/c/3blue1brown
https://www.youtube.com/channel/UCJ0-OtVpF0wOKEqT2Z1HEtA
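channels.txt mixes three URL forms: @handle, /c/ custom names, and /channel/ IDs. The sketch below shows one way to recognise them before fetching uploads. The patterns and the classify_channel_url name are assumptions for illustration. The service may accept other forms or resolve them differently.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the three channel URL forms shown in channels.txt.
_CHANNEL_PATTERNS = [
    ("handle", re.compile(r"^https?://(?:www\.)?youtube\.com/@([\w.-]+)/?$")),
    ("custom", re.compile(r"^https?://(?:www\.)?youtube\.com/c/([\w.-]+)/?$")),
    # Channel IDs are 24 characters: "UC" followed by 22 more.
    ("channel_id", re.compile(r"^https?://(?:www\.)?youtube\.com/channel/(UC[\w-]{22})/?$")),
]


def classify_channel_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (kind, identifier) for a recognised channel URL, else None."""
    for kind, pattern in _CHANNEL_PATTERNS:
        match = pattern.match(url.strip())
        if match:
            return kind, match.group(1)
    return None
```

A line that returns None, such as a single-video URL placed in the wrong file, can be logged and skipped.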

🎮 Usage Examples

Start Continuous Monitoring

# Start the service (runs continuously)
python youtube_control.py start

# Check service status
python youtube_control.py status

# Stop the service
python youtube_control.py stop

# Restart the service
python youtube_control.py restart
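These commands map naturally onto a single positional argument. Here is a sketch of that dispatch using argparse. The handler functions are hypothetical placeholders, and restart is assumed to mean "stop, then start".

```python
import argparse
from typing import Callable, Dict, List, Optional

COMMANDS = ("start", "stop", "status", "restart", "process", "check")
Handler = Callable[[], int]  # each handler returns a process exit code


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Control the YouTube live monitor")
    parser.add_argument("command", choices=COMMANDS, help="action to perform")
    return parser


def with_restart(handlers: Dict[str, Handler]) -> Dict[str, Handler]:
    """Add a 'restart' handler built from the hypothetical stop and start handlers."""
    def restart() -> int:
        handlers["stop"]()  # ignore a "not running" result from stop
        return handlers["start"]()
    return {**handlers, "restart": restart}


def main(argv: Optional[List[str]], handlers: Dict[str, Handler]) -> int:
    args = build_parser().parse_args(argv)  # None means read sys.argv
    return handlers[args.command]()
```

A script entry point would then call raise SystemExit(main(None, with_restart(handlers))). Because choices is set, argparse rejects unknown commands with a usage message.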

One-Time Operations

# Process URLs from yt.txt once
python youtube_control.py process

# Check channels for new uploads once
python youtube_control.py check

Direct Monitoring (Advanced)

# Run the monitor directly
python youtube_live_monitor.py

🔧 How It Works

File Monitoring Flow

  1. File Watcher: Monitors yt.txt using filesystem events
  2. Change Detection: Detects when new URLs are added
  3. URL Processing: Filters out comments and duplicates
  4. Transcript Extraction: Uses existing YouTubeTranscriptExtractor
  5. Logging: Records processed URLs to prevent duplicates
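Steps 2 to 5 reduce to a function that rereads yt.txt, keeps only unseen URLs, extracts them, and records them. In this sketch, extract is a hypothetical stand-in for the call into YouTubeTranscriptExtractor, whose real method and signature are not shown here. The processed URLs are kept in an in-memory set.

```python
import logging
from pathlib import Path
from typing import Callable, List, Set

log = logging.getLogger("youtube_live_monitor")


def process_new_urls(
    yt_file: Path,
    processed: Set[str],
    extract: Callable[[str], object],  # hypothetical extractor hook
) -> List[str]:
    """Read yt.txt, extract transcripts for unseen URLs, and record them."""
    lines = yt_file.read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    candidates = [s for s in stripped if s and not s.startswith("#")]
    handled = []
    for url in candidates:
        if url in processed:
            continue  # already handled, or repeated earlier in this file
        extract(url)
        processed.add(url)
        handled.append(url)
        log.info("Processed %s", url)
    return handled
```

With watchdog, a FileSystemEventHandler subclass would call this from its on_modified method when the event's src_path points at yt.txt. Rereading the whole file keeps the logic simple, because the processed set already filters out earlier lines.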

Channel Monitoring Flow

  1. Periodic Checks: Checks channels every 5 minutes (configurable)
  2. Recent Videos: Fetches the 10 most recent videos from each channel
  3. New Video Detection: Compares against processed URLs log
  4. Automatic Processing: Extracts transcripts from new videos
  5. Stealth Delays: Built-in delays between requests
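One channel-check pass with randomised pauses might look like the sketch below. fetch_recent is a hypothetical callable returning up to N video URLs for a channel, since the document does not say how uploads are fetched. The 2 to 5 second delay range is also an assumption. The 10-video limit and the 5-minute interval come from the flow above.

```python
import random
import time
from typing import Callable, List, Set, Tuple

RECENT_LIMIT = 10          # most recent videos fetched per channel
CHECK_INTERVAL = 5 * 60    # seconds between passes (configurable)


def check_channels_once(
    channels: List[str],
    fetch_recent: Callable[[str, int], List[str]],  # hypothetical fetcher
    processed: Set[str],
    extract: Callable[[str], object],               # hypothetical extractor hook
    delay_range: Tuple[float, float] = (2.0, 5.0),  # assumed delay bounds
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Process unseen videos from each channel, pausing between requests."""
    new_videos = []
    for channel in channels:
        for url in fetch_recent(channel, RECENT_LIMIT):
            if url in processed:
                continue
            extract(url)
            processed.add(url)
            new_videos.append(url)
            sleep(random.uniform(*delay_range))  # pause after each extraction
        sleep(random.uniform(*delay_range))      # pause between channels
    return new_videos


def monitor_channels(channels, fetch_recent, processed, extract,
                     interval: float = CHECK_INTERVAL) -> None:
    """Run check passes forever, waiting `interval` seconds between them."""
    while True:
        check_channels_once(channels, fetch_recent, processed, extract)
        time.sleep(interval)
```

Injecting sleep keeps the pass testable without real waiting. Randomising the pause avoids a perfectly regular request pattern.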

📊 Service Statistics

The control script's status command reports statistics such as:

{
  "running": true,
  "videos_processed": 42,
  "channels_monitored": 3,
  "errors": 0,
  "uptime": "2:15:30",
  "processed_urls_count": 42,
  "files": {
    "yt.txt": true,
    "channels.txt": true,
    "processed-urls.txt": true
  }
}
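A status payload with the same keys can be assembled as in this sketch. The counter names mirror the JSON above, but the build_status function, its parameters, and where the counters come from are hypothetical. The uptime string uses datetime.timedelta, which renders durations under a day as H:MM:SS.

```python
import time
from datetime import timedelta
from pathlib import Path
from typing import Any, Dict, Optional

TRACKED_FILES = ("yt.txt", "channels.txt", "processed-urls.txt")


def build_status(
    running: bool,
    started_at: float,              # Unix timestamp when the service started
    counters: Dict[str, int],       # hypothetical in-process counters
    processed_urls_count: int,
    base_dir: Path,
    now: Optional[float] = None,
) -> Dict[str, Any]:
    now = time.time() if now is None else now
    uptime = timedelta(seconds=int(now - started_at))
    return {
        "running": running,
        "videos_processed": counters.get("videos_processed", 0),
        "channels_monitored": counters.get("channels_monitored", 0),
        "errors": counters.get("errors", 0),
        "uptime": str(uptime),  # e.g. "2:15:30" for 8130 seconds
        "processed_urls_count": processed_urls_count,
        # True/False per file, matching the "files" object above
        "files": {name: (base_dir / name).exists() for name in TRACKED_FILES},
    }
```

The result is plain JSON-serialisable data, so json.dumps(status, indent=2) prints it in the format shown.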

🧪 Testing Coverage

Comprehensive test suite with 24 passing tests covering:

  • ✅ Monitor initialization and configuration
  • ✅ File loading and URL processing
  • ✅ New URL detection and filtering
  • ✅ Channel monitoring and video discovery
  • ✅ File system watching and event handling
  • ✅ Service control and management
  • ✅ Error handling and recovery
  • ✅ Cross-platform compatibility

🛡️ Error Handling & Resilience

  • File not found: Gracefully handles missing configuration files
  • Network errors: Robust error handling for YouTube API issues
  • Process management: Clean PID file management with stale process detection
  • Rate limiting: Built-in delays to avoid YouTube rate limits
  • Logging: Comprehensive error logging and debugging information
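The "log, count, and keep going" behaviour for per-video failures can be sketched as a wrapper around each extraction. In the sketch, extract is a hypothetical extractor hook and stats is assumed to be the dictionary behind the errors and videos_processed counters. This is an illustration, not the service's actual error path.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("youtube_live_monitor")


def safe_extract(
    url: str,
    extract: Callable[[str], object],  # hypothetical extractor hook
    stats: Dict[str, int],
) -> Optional[object]:
    """Run one extraction; log and count failures instead of stopping the service."""
    try:
        result = extract(url)
    except Exception:  # e.g. network errors or videos without transcripts
        stats["errors"] = stats.get("errors", 0) + 1
        logger.exception("Failed to extract transcript for %s", url)
        return None
    stats["videos_processed"] = stats.get("videos_processed", 0) + 1
    return result
```

Catching at the per-video level means one bad URL cannot stop the file watcher or the channel loop. logger.exception records the traceback for later debugging.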

🔧 Dependencies Added

# File monitoring
watchdog>=3.0.0

# Async file operations
aiofiles>=23.0.0

# Process management
psutil>=5.9.0

🎯 Key Benefits

  1. Fully automated: No manual intervention required once configured
  2. Real-time processing: New URLs in yt.txt are handled as soon as the file changes; new channel uploads are picked up on the next periodic check
  3. Scalable: No fixed limit on the number of monitored URLs and channels
  4. Reliable: Comprehensive error handling and recovery
  5. Cross-platform: Works on Windows, macOS, and Linux
  6. Well-tested: 24 passing tests covering the areas listed under Testing Coverage

🚀 Getting Started

  1. Set up the configuration files:

    # Files are auto-created with examples on first run
    python youtube_control.py start
  2. Add your URLs and channels to yt.txt and channels.txt.

  3. Start monitoring (if the service from step 1 is still running, use restart instead of start):

    python youtube_control.py start
  4. Monitor status:

    python youtube_control.py status

The YouTube scraper is now fully automated and ready for production use! 🎉

📝 Files Created/Modified

  • apps/youtube-scraper/youtube_live_monitor.py - Main monitoring service
  • apps/youtube-scraper/youtube_control.py - Service control script
  • apps/youtube-scraper/requirements.txt - Updated dependencies
  • tests/unit/youtube-scraper/test_youtube_live_monitor.py - Comprehensive tests

✅ All tests passing (24/24) ✨