VLog Troubleshooting Guide

This guide covers common issues and their solutions.


Quick Diagnostics

Health Checks

# Check all services
curl -s http://localhost:9000/health  # Public API
curl -s http://localhost:9001/health  # Admin API
curl -s http://localhost:9002/api/health  # Worker API

# Check systemd services
sudo systemctl status vlog-public vlog-admin vlog-worker vlog-worker-api

# Check workers
vlog worker status
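
The HTTP checks above can also be scripted when a monitor needs a single pass/fail result per service. This is a minimal sketch, assuming the ports and paths from the curl commands above and that a healthy endpoint answers HTTP 200 (the response body format is not documented here, so only the status code is inspected). The names `fetch_status` and `check_all` are illustrative, not part of VLog.

```python
import urllib.error
import urllib.request

# Endpoints taken from the curl commands above.
HEALTH_URLS = {
    "public": "http://localhost:9000/health",
    "admin": "http://localhost:9001/health",
    "worker-api": "http://localhost:9002/api/health",
}


def fetch_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def check_all(fetch=fetch_status):
    """Map each service name to True (HTTP 200) or False (anything else)."""
    return {name: fetch(url) == 200 for name, url in HEALTH_URLS.items()}
```

Calling `check_all()` returns a dict such as `{"public": True, ...}`; any `False` entry points at the service to inspect with `systemctl status`.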

Log Locations

Service             Log Command
Public API          journalctl -u vlog-public -f
Admin API           journalctl -u vlog-admin -f
Worker              journalctl -u vlog-worker -f
Worker API          journalctl -u vlog-worker-api -f
Transcription       journalctl -u vlog-transcription -f
Kubernetes workers  kubectl logs -n vlog -l app=vlog-worker -f
Audit logs          tail -f /var/log/vlog/audit.log

Video Processing Issues

Video Stuck in "Pending"

Symptom: Video uploaded but status remains "pending"

Causes and Solutions:

  1. Worker not running

    sudo systemctl status vlog-worker
    sudo systemctl start vlog-worker
  2. Redis job queue issue (if using Redis)

    # Check Redis connection
    redis-cli ping
    
    # Check queue mode
    grep JOB_QUEUE_MODE /etc/systemd/system/vlog-worker.service
  3. Upload file missing

    ls /mnt/nas/vlog-storage/uploads/
    # Should contain the video file

Video Stuck in "Processing"

Symptom: Video shows "processing" but no progress

Causes and Solutions:

  1. FFmpeg crashed or hung

    # Check worker logs for errors
    journalctl -u vlog-worker --since "1 hour ago" | grep -i error
    
    # Check for FFmpeg processes
    pgrep -f ffmpeg
  2. Disk space full

    df -h /mnt/nas/vlog-storage
    df -h /tmp  # Work directory
  3. Stale job (worker crashed)

    # Check job status in database
    psql -U vlog -d vlog -c "SELECT id, video_id, current_step, worker_id, last_checkpoint FROM transcoding_jobs WHERE completed_at IS NULL"
    
    # Reset stale jobs (use with caution)
    psql -U vlog -d vlog -c "UPDATE transcoding_jobs SET worker_id = NULL WHERE completed_at IS NULL AND last_checkpoint < NOW() - INTERVAL '30 minutes'"
  4. Kubernetes worker issues

    kubectl get pods -n vlog
    kubectl describe pod -n vlog <pod-name>
    kubectl logs -n vlog <pod-name>
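
Before resetting stale jobs (cause 3 above), it can help to see exactly which jobs the 30-minute rule would touch. The sketch below applies the same condition as the UPDATE statement to rows already fetched from `transcoding_jobs`. The dict-based row shape and the function name `stale_jobs` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=30)  # same threshold as the SQL above


def stale_jobs(jobs, now=None):
    """Return ids of unfinished jobs whose last checkpoint is older than STALE_AFTER.

    jobs is an iterable of dicts with the keys id, completed_at and
    last_checkpoint (timezone-aware datetimes or None). Rows without a
    checkpoint are skipped, matching how SQL treats NULL in the comparison.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - STALE_AFTER
    return [
        job["id"]
        for job in jobs
        if job["completed_at"] is None
        and job["last_checkpoint"] is not None
        and job["last_checkpoint"] < cutoff
    ]
```

Reviewing the returned ids against the worker logs first reduces the risk of releasing a job that a slow but healthy worker is still processing.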

Transcoding Failed

Symptom: Video status is "failed"

Causes and Solutions:

  1. Check error message

    psql -U vlog -d vlog -c "SELECT id, title, error_message FROM videos WHERE status = 'failed'"
  2. Unsupported codec

    • Check if source video uses a codec FFmpeg can decode
    • Try re-encoding the source with HandBrake
  3. Corrupt source file

    • Re-upload the video
    • Test with ffprobe /path/to/video.mp4
  4. Retry failed job

    • Via Admin UI: Click "Retry" on the video
    • Via API: POST /api/videos/{id}/retry

GPU Encoding Issues

Symptom: GPU encoding fails, falls back to CPU

Causes and Solutions:

  1. NVIDIA issues

    # Check GPU access
    nvidia-smi
    
    # Check container GPU access (K8s)
    kubectl exec -n vlog <pod> -- nvidia-smi
    
    # Verify runtime class
    kubectl get pods -n vlog -o yaml | grep runtimeClass
  2. Intel VAAPI issues

    # Check for render device
    ls -la /dev/dri/
    
    # Test VAAPI
    vainfo
    
    # Container access (K8s)
    kubectl exec -n vlog <pod> -- vainfo
  3. Session limit reached (NVIDIA consumer GPUs)

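    • Consumer GeForce cards cap the number of concurrent NVENC sessions (for example, RTX 3090: 3 concurrent sessions; RTX 4090: 5 concurrent sessions)
    • The cap is enforced by the NVIDIA driver, so the exact number can differ with the driver version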
    • Reduce VLOG_PARALLEL_QUALITIES setting

Playback Issues

Video Won't Play

Symptom: Video appears stuck or shows error in player

Causes and Solutions:

  1. Missing manifest

    # Check files exist
    ls /mnt/nas/vlog-storage/videos/<slug>/
    # Should contain master.m3u8 (or manifest.mpd for CMAF)
  2. MIME type issues

    • Check browser dev tools Network tab
    • .m3u8 should be application/vnd.apple.mpegurl
    • .ts should be video/mp2t
    • .m4s should be video/iso.segment
  3. CORS issues

    # Check CORS headers
    curl -I http://localhost:9000/videos/<slug>/master.m3u8
  4. CDN caching stale manifest

    • Purge CDN cache for manifests
    • Check CDN TTL settings
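
The MIME type check (cause 2 above) can be automated once you have a response's Content-Type header. This is a small sketch using only the three extension-to-type pairs listed above; `mime_ok` is an illustrative helper, not a VLog function.

```python
import os

# Expected Content-Type per extension, from the list above.
EXPECTED_MIME = {
    ".m3u8": "application/vnd.apple.mpegurl",
    ".ts": "video/mp2t",
    ".m4s": "video/iso.segment",
}


def mime_ok(path, content_type):
    """Check a response's Content-Type header against the expected type.

    Parameters after ';' (such as charset) are ignored. Returns None for
    extensions this table does not cover.
    """
    ext = os.path.splitext(path)[1].lower()
    expected = EXPECTED_MIME.get(ext)
    if expected is None:
        return None
    actual = content_type.split(";", 1)[0].strip().lower()
    return actual == expected
```

A `False` result for a manifest or segment usually means the web server or CDN in front of VLog is overriding the type.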

Quality Not Available

Symptom: Expected quality missing from player

Causes and Solutions:

  1. Source resolution too low

    • VLog only generates qualities at or below source
    • Check source resolution: ffprobe <source>
  2. Transcoding incomplete

    # Check quality progress
    psql -U vlog -d vlog -c "SELECT * FROM quality_progress WHERE job_id = <job_id>"
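
To see why a quality is absent, compare the source height reported by ffprobe with the quality ladder. The sketch below assumes a hypothetical ladder; the labels and heights are illustrative, and VLog's configured ladder may differ.

```python
# Hypothetical quality ladder (label -> output height in pixels).
# VLog's real ladder may differ; check your configuration.
LADDER = {
    "2160p": 2160,
    "1440p": 1440,
    "1080p": 1080,
    "720p": 720,
    "480p": 480,
    "360p": 360,
}


def expected_qualities(source_height, ladder=LADDER):
    """Return the ladder labels whose height does not exceed the source height."""
    return [label for label, height in ladder.items() if height <= source_height]
```

For a 720-pixel-high source this returns only 720p and below, so a missing 1080p rendition in that case is expected behaviour rather than a failed transcode.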

Shaka Player / DASH Issues

Symptom: DASH playback fails, HLS works

Causes and Solutions:

  1. Missing manifest.mpd

    ls /mnt/nas/vlog-storage/videos/<slug>/manifest.mpd
  2. Codec string issues

    • Check browser console for codec errors
    • Verify HEVC/AV1 browser support
  3. Regenerate manifests

    vlog manifests regenerate --slug <video-slug>

Database Issues

Connection Refused

Symptom: "Connection refused" errors

Causes and Solutions:

  1. PostgreSQL not running

    sudo systemctl status postgresql
    sudo systemctl start postgresql
  2. Wrong connection URL

    # Check environment
    grep DATABASE_URL /etc/systemd/system/vlog-*.service
    
    # Test connection
    psql -U vlog -d vlog -c "SELECT 1"
  3. pg_hba.conf authentication

    sudo vim /var/lib/pgsql/data/pg_hba.conf
    # Ensure local connections use md5:
    # local  all  all  md5
    sudo systemctl restart postgresql

Database Locked (SQLite only)

Symptom: "Database is locked" errors

Solution: Migrate to PostgreSQL. SQLite doesn't support concurrent writes.

Query Timeout

Symptom: Slow queries or timeouts

Causes and Solutions:

  1. Missing indexes

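    -- Check slow queries (requires the pg_stat_statements extension)
    -- PostgreSQL 13+ names the column total_exec_time; use total_time on 12 and earlier
    SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;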
  2. Connection pool exhausted

    • Check vlog_db_connections_active metric
    • Increase pool size if needed

Redis Issues

Connection Failed

Symptom: Redis connection errors, circuit breaker open

Causes and Solutions:

  1. Redis not running

    sudo systemctl status redis
    # or
    docker ps | grep redis
  2. Wrong password

    # Test connection
    redis-cli -a <password> ping
  3. Circuit breaker open

    • Check vlog_redis_circuit_breaker_state metric
    • VLog falls back to database polling automatically

SSE Updates Not Working

Symptom: Admin UI doesn't show real-time progress

Causes and Solutions:

  1. Redis required for SSE

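    # Enable Redis. An export in your shell is not seen by systemd services,
    # so set the variable in the vlog-admin unit instead:
    #   Environment=VLOG_REDIS_URL=redis://:password@localhost:6379
    sudo systemctl daemon-reload
    sudo systemctl restart vlog-admin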
  2. Check Redis Pub/Sub

    redis-cli -a <password> PUBSUB CHANNELS "vlog:*"

Storage Issues

NAS Mount Problems

Symptom: "No such file or directory" or permission errors

Causes and Solutions:

  1. Mount dropped

    mount | grep vlog-storage
    # If not mounted:
    sudo mount -a
  2. Stale NFS mount

    # Force remount
    sudo umount -l /mnt/nas/vlog-storage
    sudo mount /mnt/nas/vlog-storage
  3. Permission issues

    ls -la /mnt/nas/vlog-storage
    # Should be owned by vlog user

Disk Space Full

Symptom: Upload or transcoding fails

Causes and Solutions:

  1. Check storage

    df -h /mnt/nas/vlog-storage
    du -sh /mnt/nas/vlog-storage/*
  2. Clean up archive

    # Check archived videos
    ls /mnt/nas/vlog-storage/archive/
    
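    # Permanently delete top-level archive directories older than 30 days
    # (-mindepth 1 keeps find from ever matching the archive root itself)
    find /mnt/nas/vlog-storage/archive -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +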
  3. Clean up uploads

    # List uploads to find orphaned files; delete only files that no pending or processing video still needs
    ls /mnt/nas/vlog-storage/uploads/
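
Because archive cleanup (step 2 above) is irreversible, a dry run that only lists candidates is a useful first step. The following sketch lists top-level directories under the archive path whose modification time is more than 30 days old. It deletes nothing, the function name is illustrative, and its age cutoff is close to but not identical with find's day-rounded `-mtime +30`.

```python
import os
import time


def old_archive_dirs(archive_root, max_age_days=30, now=None):
    """List top-level directories under archive_root not modified for max_age_days.

    Nothing is deleted; the caller decides what to do with the result.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    with os.scandir(archive_root) as entries:
        for entry in entries:
            # Skip plain files and symlinks; only real directories count.
            if not entry.is_dir(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                old.append(entry.path)
    return sorted(old)
```

Running `old_archive_dirs("/mnt/nas/vlog-storage/archive")` and reviewing the list before any `rm -rf` keeps recently archived videos out of the deletion set.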

Worker Issues

Workers Not Connecting

Symptom: vlog worker status shows no workers

Causes and Solutions:

  1. Worker API not running

    sudo systemctl status vlog-worker-api
    curl http://localhost:9002/api/health
  2. Wrong API URL

    # Check worker config
    kubectl get configmap vlog-worker-config -n vlog -o yaml
  3. API key invalid

    # Check worker logs
    kubectl logs -n vlog -l app=vlog-worker | grep -i auth
    
    # Re-register worker if needed
    vlog worker register --name "new-worker"
  4. Firewall blocking

    sudo firewall-cmd --list-ports
    # Should include 9002/tcp

Workers Offline

Symptom: Workers showing "offline" status

Causes and Solutions:

  1. Heartbeat failing

    # Check worker logs
    kubectl logs -n vlog <pod> | grep heartbeat
  2. Pod restarting

    kubectl get pods -n vlog
    kubectl describe pod -n vlog <pod>
  3. Network policy blocking

    kubectl get networkpolicy -n vlog

Admin UI Issues

Setup Wizard Not Appearing

Symptom: Expected setup wizard but see login page instead

Causes and Solutions:

  1. Users already exist

    # Check if users exist
    psql -U vlog -d vlog -c "SELECT COUNT(*) FROM users"

    If users exist, the setup wizard is skipped. Log in with an existing account.

  2. Database migration issue

    # Check users table exists
    psql -U vlog -d vlog -c "\\dt users"
    # Run migrations if needed
    python api/database.py

Login Not Working

Symptom: Can't log into Admin UI

Causes and Solutions:

  1. Incorrect credentials

    • Verify username/email is correct
    • Password is case-sensitive
  2. Account locked (too many failed attempts)

    # Check account status
    psql -U vlog -d vlog -c "SELECT username, status, failed_login_attempts, locked_until FROM users WHERE username = 'your_user'"
    
    # Unlock account (if needed)
    psql -U vlog -d vlog -c "UPDATE users SET failed_login_attempts = 0, locked_until = NULL WHERE username = 'your_user'"
  3. Session secret not set

    # Verify VLOG_SESSION_SECRET_KEY is set
    grep SESSION_SECRET_KEY /etc/systemd/system/vlog-admin.service
  4. Cookie issues

    • Clear browser cookies
    • Check VLOG_SECURE_COOKIES matches your HTTPS setup:
      • Set VLOG_SECURE_COOKIES=false for HTTP (development only)
      • Set VLOG_SECURE_COOKIES=true for HTTPS (production)
  5. OIDC login failing

    • Verify OIDC discovery URL is accessible
    • Check client ID and secret are correct
    • Ensure callback URL matches: https://your-domain/api/v1/auth/oidc/callback

Account Disabled

Symptom: Login fails with "Account disabled" error

Solution:

# Re-enable the account
psql -U vlog -d vlog -c "UPDATE users SET status = 'active' WHERE username = 'your_user'"

Password Reset Not Working

Symptom: Password reset email not received

Causes and Solutions:

  1. Email not configured

    • VLog doesn't send emails directly
    • Configure external email or share reset link manually
  2. Reset token expired

    • Tokens expire after 24 hours (configurable via VLOG_PASSWORD_RESET_EXPIRY_HOURS)
    • Request a new reset
  3. Admin force reset

    # Generate new password hash
    python -c "from api.auth.password import hash_password; print(hash_password('new-password-here'))"
    
    # Update in database
    psql -U vlog -d vlog -c "UPDATE users SET password_hash = 'HASH_FROM_ABOVE' WHERE username = 'your_user'"

API Key Authentication Not Working

Symptom: API calls with Authorization: Bearer header fail

Causes and Solutions:

  1. Key revoked or expired

    # Check key status
    psql -U vlog -d vlog -c "SELECT key_prefix, expires_at, revoked_at FROM user_api_keys WHERE key_prefix LIKE 'vlog_ak_%'"
  2. Wrong header format

    • Must be: Authorization: Bearer vlog_ak_xxxxx
    • Not: X-API-Key: vlog_ak_xxxxx
  3. Key has insufficient permissions

    • API keys inherit the creating user's role
    • Check user's role has required permission
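
The header format (cause 2 above) is easy to get wrong in client code. This minimal sketch builds a request with the key as a Bearer token, assuming keys start with the `vlog_ak_` prefix shown above. The URL in the usage note and the helper name `build_request` are placeholders, not documented VLog endpoints or APIs.

```python
import urllib.request


def build_request(url, api_key):
    """Build a request that sends the key as a Bearer token.

    The key goes in the Authorization header; an X-API-Key header is not
    accepted, per the note above.
    """
    if not api_key.startswith("vlog_ak_"):
        raise ValueError("expected a key starting with 'vlog_ak_'")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
```

For example, `build_request("http://localhost:9001/api/videos", key)` can be passed to `urllib.request.urlopen`. A 401 or 403 response after that points at a revoked key or a role without the required permission, rather than at the header format.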

Session Expired

Symptom: Logged out unexpectedly

Causes and Solutions:

  1. Session timeout

    • Default session expires after 24 hours
    • Configure VLOG_SESSION_EXPIRY_HOURS for longer sessions
  2. Maximum sessions reached

    • Users limited to 10 concurrent sessions by default
    • Revoke old sessions via profile menu
  3. Admin revoked your session

    • Contact administrator

Settings Not Saving

Symptom: Settings changes don't persist

Causes and Solutions:

  1. Database connection

    # Check settings table
    psql -U vlog -d vlog -c "SELECT * FROM settings LIMIT 5"
  2. Cache delay

    • Settings are cached for 60 seconds
    • Wait and refresh

Performance Issues

Slow API Responses

Causes and Solutions:

  1. Check metrics

    curl -s http://localhost:9001/metrics | grep http_request_duration
  2. Database slow

    • Check vlog_db_query_duration_seconds metric
    • Add indexes if needed
  3. High load

    • Check vlog_transcoding_jobs_active metric
    • Scale workers or reduce parallel qualities

High Memory Usage

Causes and Solutions:

  1. Whisper model too large

    # Use smaller model
    export VLOG_WHISPER_MODEL=small  # Instead of large-v3
  2. Too many parallel qualities

    export VLOG_PARALLEL_QUALITIES=1

Webhook Issues

Webhooks Not Delivering

Symptom: Webhook deliveries stuck in pending or failing

Causes and Solutions:

  1. Target URL unreachable

    # Test connectivity from server
    curl -I https://your-webhook-endpoint.com
  2. SSRF protection blocking

    • Webhooks cannot target private IPs (10.x, 172.16-31.x, 192.168.x)
    • Use public endpoints only
  3. Circuit breaker open

    # Check delivery status
    psql -U vlog -d vlog -c "SELECT id, status, attempts, last_error FROM webhook_deliveries WHERE webhook_id = <id> ORDER BY created_at DESC LIMIT 5"
  4. Webhook disabled

    • Check webhook status in Admin UI
    • Circuit breaker may have disabled after consecutive failures
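
To check in advance whether a webhook URL would fall under the SSRF rule (cause 2 above), you can test the address it resolves to. The sketch below uses Python's `ipaddress` module, whose `is_private` covers the three ranges listed above plus several other reserved ranges. VLog's actual check is not shown in this guide and may be stricter or looser, and the function names are illustrative.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_private_address(address):
    """Return True if address (a literal IP string) is in a non-public range."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return ip.is_private or ip.is_loopback or ip.is_link_local


def webhook_target_is_private(url):
    """Resolve the URL's host and report whether any resolved address is private."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    infos = socket.getaddrinfo(host, None)
    return any(is_private_address(info[4][0]) for info in infos)
```

A hostname that resolves to an internal address (for example through split-horizon DNS) is reported as private even though the URL itself looks public.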

Signature Verification Failing

Symptom: Receiving webhook but signature doesn't match

Causes and Solutions:

  1. Wrong secret

    • Copy secret exactly from Admin UI
    • Check for trailing whitespace
  2. Encoding issues

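    # Verify the signature over the raw request body (payload must be bytes)
    import hmac, hashlib
    expected = "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    signature = request.headers.get('X-VLog-Signature', '')
    # Use a constant-time comparison rather than ==
    valid = hmac.compare_digest(expected.encode(), signature.encode())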
  3. Body modified

    • Use raw request body, not parsed JSON
    • Middleware may be modifying the payload
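
The three causes above can be reproduced locally with a self-contained signer and verifier. This sketch assumes the `sha256=<hex digest>` header form shown above. The event payload in the usage note is made up, and `sign` and `verify` are illustrative names.

```python
import hashlib
import hmac


def sign(secret: str, raw_body: bytes) -> str:
    """Compute the header value in the 'sha256=<hex digest>' form shown above."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify(secret: str, raw_body: bytes, header_value) -> bool:
    """Check a received X-VLog-Signature header against the raw request body."""
    expected = sign(secret, raw_body)
    received = header_value or ""
    # compare_digest runs in constant time, so a mismatch leaks no timing information.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Signing a body such as `b'{"event": "video.ready"}'` and then verifying it with a secret that has a trailing space, or with the JSON re-serialized without the space after the colon, fails in both cases. This is why the secret must be copied exactly and the raw body used.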

Rate Limiting Issues

Getting Rate Limited

Symptom: HTTP 429 Too Many Requests errors

Causes and Solutions:

  1. Check current limits

    vlog settings get rate_limiting.public_default
  2. Increase limits (via Admin UI or CLI)

    vlog settings set rate_limiting.public_default "200/minute"
  3. Using Redis rate limiting

    • Memory backend doesn't share state across instances
    • Enable Redis for consistent rate limiting:
    export VLOG_RATE_LIMIT_STORAGE_URL=redis://localhost:6379

Rate Limiter Not Working

Symptom: Rate limits not being enforced

Causes and Solutions:

  1. Check if enabled

    grep RATE_LIMIT /etc/systemd/system/vlog-public.service
  2. Behind proxy without forwarded headers

    # nginx needs to forward client IP
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
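
With those headers in place, the application can key rate limits on the real client address instead of the proxy's. The sketch below shows one way an app behind a single trusted nginx might pick that address. It is an assumption for illustration, not VLog's actual logic, and `client_ip` and `trusted_proxies` are hypothetical names.

```python
def client_ip(headers, peer_ip, trusted_proxies=("127.0.0.1",)):
    """Pick the client address an app behind a reverse proxy should rate-limit on.

    headers is a dict of request headers with lower-case keys. Forwarded
    headers are trusted only when the direct peer is a known proxy.
    """
    if peer_ip not in trusted_proxies:
        return peer_ip
    forwarded = headers.get("x-forwarded-for", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    if hops:
        # $proxy_add_x_forwarded_for appends $remote_addr, so with a single
        # trusted nginx the last entry is the address nginx itself saw.
        # Earlier entries are client-supplied and can be spoofed.
        return hops[-1]
    return headers.get("x-real-ip", peer_ip)
```

If neither header is forwarded, every request appears to come from the proxy's address, so per-client limits cannot be applied as intended.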

Content Security Policy (CSP) Issues

Alpine.js Errors

Symptom: JavaScript errors, UI not interactive

Causes and Solutions:

  1. Check browser console for CSP violations

    • Look for "refused to evaluate script" errors
  2. Inline event handlers blocked

    • VLog uses CSP-compliant Alpine.js patterns
    • Don't add inline onclick/onchange handlers
  3. Third-party scripts blocked

    • Custom scripts need to be allowlisted
    • Check nginx CSP headers

Video Player Not Loading

Symptom: Player shows blank or errors

Causes and Solutions:

  1. Check for blob: CSP issues

    • Shaka Player requires blob: in media-src
    • hls.js requires blob: in worker-src
  2. Verify CSP headers

    curl -sI http://localhost:9000 | grep -i content-security
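
Once you have the header value, the blob: requirement from cause 1 can be checked programmatically. This sketch parses a Content-Security-Policy string and tests a directive for `blob:`. It falls back only to `default-src`, which simplifies the real fallback chain for `worker-src`, and the function names are illustrative.

```python
def parse_csp(header_value):
    """Split a Content-Security-Policy header into {directive: [sources]}."""
    policy = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if tokens:
            # When a directive is repeated, browsers honour the first occurrence.
            policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def allows_blob(policy, directive):
    """Return True if directive, or default-src as a fallback, lists blob:."""
    sources = policy.get(directive, policy.get("default-src", []))
    return "blob:" in sources
```

A `False` result for `media-src` or `worker-src` matches the blank-player symptom described above and usually means the nginx CSP header needs `blob:` added to that directive.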

Video Download Issues

Downloads Not Working

Symptom: Download links return 404 or 403

Causes and Solutions:

  1. Feature disabled

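    # Enable downloads. An export in your shell is not seen by systemd services,
    # so set the variable in the vlog-public unit instead:
    #   Environment=VLOG_DOWNLOADS_ENABLED=true
    sudo systemctl daemon-reload
    sudo systemctl restart vlog-public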
  2. Original downloads disabled

    # Check setting
    grep DOWNLOADS_ALLOW_ORIGINAL /etc/systemd/system/vlog-public.service
  3. Rate limited

    • Default: 10 downloads/hour per IP
    • Check VLOG_DOWNLOADS_RATE_LIMIT_PER_HOUR

Download Files Missing

Symptom: Download returns error or incomplete file

Causes and Solutions:

  1. Video not fully transcoded

    • Only "ready" videos can be downloaded
    • Check video status in Admin UI
  2. Original file deleted

    • Originals are deleted after transcoding by default
    • Set VLOG_KEEP_ORIGINAL=true to preserve

Debug Mode

Enable debug logging for more information:

# Set log level
export VLOG_LOG_LEVEL=DEBUG

# Or in systemd service
Environment=VLOG_LOG_LEVEL=DEBUG

Getting Help

If you can't resolve an issue:

  1. Check existing issues: https://github.com/filthyrake/vlog/issues
  2. Collect logs:
    journalctl -u 'vlog-*' --since "1 hour ago" > vlog-logs.txt
  3. Open an issue with:
    • VLog version
    • Error messages
    • Relevant logs
    • Steps to reproduce
