| ENGLISH | 中文 | Official Website | Discord Community |
An AI-powered agent designed to watch live streams, understand the content (audio, chat, video), and participate in the chat automatically using a Large Language Model (LLM).
This project enables an AI to act as a viewer in live streams. It captures stream data, sends it to a backend for processing by an LLM (like GPT models), and uses the AI's response to interact with the chat.
It consists of two main parts:
- Frontend Userscript: Runs in your browser (via Tampermonkey/Violentmonkey) on live stream pages. It captures audio, chat messages, and screenshots, displays a control panel, and communicates with the backend.
- Backend Server: A modular Python Flask server with independent service layers: State Management Service, External APIs Service, and LLM Service. It receives data, performs Speech-to-Text (STT), optionally uploads screenshots, interacts with an LLM, manages conversation memory, and sends back generated chat messages.
- Automated Chat Interaction: Sends AI-generated messages based on stream content.
- Multimodal Context: Processes live audio, chat history, and screenshots (optional).
- LLM Integration: Leverages powerful LLMs (OpenAI-compatible APIs).
- Customizable AI Persona: Define the agent's behavior via a system prompt.
- Persistent Memory: Maintains per-stream conversation history and a notepad.
- Multiple STT Options: Supports Whisper (via LLM API) and Youdao ASR.
- Vision Support (Optional): Uploads screenshots to Cloudinary for analysis by vision LLMs.
- User Control Panel: In-page UI to Start/Stop, manage chat permissions, and adjust local volume/mute.
- Currently Supported:
- YouTube Live (
youtube.com) - Twitch (
twitch.tv) - Bilibili Live (
live.bilibili.com)
- YouTube Live (
- Planned Future Support:
- Frontend: JavaScript (ES6+), Web Audio API, MediaRecorder API, Canvas API, DOM Manipulation
- Backend: Python 3, Flask, Flask-CORS, Requests, OpenAI Python Library, Pillow, Cloudinary Python SDK (optional), python-dotenv, Tiktoken
- AI Services: OpenAI-compatible LLM API, Youdao ASR API (optional), Whisper (optional)
- Userscript Manager: Tampermonkey or Violentmonkey
- Modern Web Browser (Chrome, Firefox, Edge)
- Tampermonkey or Violentmonkey browser extension
- Python 3.8+
pippackage installerffmpeginstalled and in system PATH (or path specified in.env)- API Keys:
- LLM API Key & URL (Required)
- Youdao App Key & Secret (If using Youdao STT)
- Cloudinary Credentials (If using Cloudinary vision uploads)
- Backend Setup: Clone repo, install Python dependencies (
pip install -r requirements.txt), configure API keys in.env(copy from.env.example), runpython src/app.py. (See Backend Setup) - Frontend Setup: Install Tampermonkey/Violentmonkey, install the
.user.jsscript, ensureINFERENCE_SERVICE_URLandINFERENCE_SERVICE_API_KEYin the script match your backend configuration. (See Frontend Setup) - Usage: Go to a supported live stream, use the control panel to start the agent. (See Usage Guide)
➡️ For detailed steps, please read the Full Tutorial (TUTORIAL.md)
.
├── backend/ # Server code and related files
│ ├── src/ # Source code directory
│ │ ├── app.py # Flask application main entry
│ │ ├── services/ # Service layer
│ │ │ ├── external_apis.py # External API integration service
│ │ │ ├── llm_service.py # LLM processing service
│ │ │ └── state_service.py # State management service
│ │ └── utils/ # Utility modules
│ │ └── config.py # Configuration management
│ ├── memory/ # Persistent memory storage
│ ├── prompts/ # System prompt files
│ ├── requirements.txt # Python dependencies
│ └── .env.example # Environment configuration example
├── frontend/ # Userscript code
│ └── live-stream-chat-ai-agent.user.js
├── docs/ # Documentation and images
├── tools/ # Helper tools
├── README.md # This file (English)
├── README.zh-CN.md # Chinese Readme
├── LICENSE # AGPL-3.0 License file
└── .gitignore # Git ignore rules
Contributions are welcome! Please refer to the Contributing Guide (to be created) for more details.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for the full text.
- Use Responsibly: Automating chat requires ethical considerations. Respect platform ToS and streamer rules. Avoid spamming.
- API Costs: LLM, STT, and Cloudinary usage may incur costs. Monitor your usage.
- Terms of Service: Using automated scripts may violate platform ToS. Use at your own risk.
- AI Limitations: AI understanding depends on input quality (STT errors) and LLM capabilities. Misinterpretations or inappropriate responses are possible.
