From b6a16343913a4814bad907e70591254c1c28d4e4 Mon Sep 17 00:00:00 2001 From: chiveswang <102021479+chiveswang@users.noreply.github.com> Date: Sun, 7 Jun 2026 11:49:58 +0800 Subject: [PATCH 1/2] docs: add cloud Gemini agent setup recipes --- docs/user-guide/backend/llm.md | 38 ++++++++++++++++++- .../current/user-guide/backend/llm.md | 38 ++++++++++++++++++- 2 files changed, 74 insertions(+), 2 deletions(-) diff --git a/docs/user-guide/backend/llm.md b/docs/user-guide/backend/llm.md index 56074a43..4dbf7933 100644 --- a/docs/user-guide/backend/llm.md +++ b/docs/user-guide/backend/llm.md @@ -136,10 +136,46 @@ openai_llm: ```yaml gemini_llm: llm_api_key: "Your Gemini API Key" # Gemini API 密钥 - model: "gemini-2.0-flash-exp" # 使用的模型 + model: "gemini-2.5-flash" # 使用的模型 temperature: 1.0 # 温度,介于 0 到 2 之间 ``` +#### 使用 Gemini 云端模型降低本地运行需求 + +`gemini_llm` 使用 Google Gemini API 的 [OpenAI 兼容端点](https://ai.google.dev/gemini-api/docs/openai)。如果你希望减少本地 GPU/CPU 负担,可以把主要对话模型切到云端 Gemini,只在本地运行 Live2D、TTS、ASR 或其他你需要保留在本机的组件。 + +常见选择: + +- `gemini-2.5-flash-lite`:适合高频、轻量、低延迟对话。 +- `gemini-2.5-flash`:适合需要更强理解与推理能力的常规对话。 +- 其他 Gemini Flash / Flash-Lite 模型:请以 [Gemini 模型列表](https://ai.google.dev/gemini-api/docs/models) 和 AI Studio 中可用的模型名称为准。 + +如果你想使用托管的 Gemma 4,请先确认服务提供的实际端点和模型名称。Gemma 4 是 Google 的开放模型系列,可以通过 Google AI Studio、Vertex AI 或其他托管服务使用;如果该服务提供 OpenAI 兼容接口,请优先把它配置到 `openai_compatible_llm`,而不是假设它一定属于 `gemini_llm`。Gemma 4 的本地运行路线仍然适合放在 `ollama_llm`、`lmstudio_llm` 或 `llama_cpp_llm`。 + +:::tip +Gemini 的 RPM、TPM、RPD 限制会随模型、项目、计费层级和账号状态变化,并且是按 Google Cloud/AI Studio 项目计算,不是按单个 API key 计算。请在 AI Studio 查看当前项目的实际限制,并参考 [Gemini API rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)。如果遇到 429,请降低并发、缩短上下文、减少连续重试,或换用更高配额的项目。 +::: + +#### 给 Claude、Codex 或其他 Agent 的配置提示词 + +你可以把下面这段提示词交给 Claude、Codex 或其他会编辑项目文件的 Agent,让它帮你安全地修改 `conf.yaml`。不要把真实 API key 写进公开 issue、PR、聊天记录或提交记录。 + +```text +You are configuring Open-LLM-VTuber to use a cloud Gemini model to reduce local hardware 
requirements. + +Edit only the local conf.yaml file. Do not commit API keys, tokens, or private endpoint URLs. + +Tasks: +1. Find character_config -> agent_config -> agent_settings -> basic_memory_agent. +2. Set llm_provider to gemini_llm. +3. Find character_config -> agent_config -> llm_configs -> gemini_llm. +4. Set llm_api_key to a placeholder such as YOUR_GEMINI_API_KEY unless the user explicitly provides a local secret handling method. +5. Set model to gemini-2.5-flash-lite for low-cost, high-frequency chat, or gemini-2.5-flash for stronger reasoning. +6. Keep temperature between 0.7 and 1.0 unless the user asks for a different personality style. +7. Remind the user to check Google AI Studio for the current model availability and RPM/TPM/RPD limits. +8. Run the app only after confirming the key is stored locally and not committed. +``` + ### 智谱 API (`zhipu_llm`) 前往[智谱](https://bigmodel.cn/) 获取 API key。 diff --git a/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md b/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md index 6ef73a2a..c15c9d81 100644 --- a/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md +++ b/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md @@ -132,10 +132,46 @@ Then adjust the settings here: ```yaml gemini_llm: llm_api_key: "Your Gemini API Key" # Gemini API key - model: "gemini-2.0-flash-exp" # Model to use + model: "gemini-2.5-flash" # Model to use temperature: 1.0 # Temperature, between 0 and 2 ``` +#### Use cloud Gemini models to reduce local hardware requirements + +`gemini_llm` uses the Google Gemini API [OpenAI-compatible endpoint](https://ai.google.dev/gemini-api/docs/openai). If you want to reduce local GPU/CPU load, you can move the main conversation model to cloud Gemini while keeping Live2D, TTS, ASR, or other local components on your machine. 
+ +Common choices: + +- `gemini-2.5-flash-lite`: Good for high-frequency, lightweight, low-latency chat. +- `gemini-2.5-flash`: Good for regular chat that needs stronger understanding and reasoning. +- Other Gemini Flash / Flash-Lite models: Use the model name shown in the [Gemini model list](https://ai.google.dev/gemini-api/docs/models) and AI Studio. + +If you want to use hosted Gemma 4, first confirm the actual endpoint and model name exposed by your provider. Gemma 4 is Google's open model family and can be used through Google AI Studio, Vertex AI, or other hosted services; if the service exposes an OpenAI-compatible interface, configure it under `openai_compatible_llm` instead of assuming it belongs under `gemini_llm`. Local Gemma 4 setups still fit better under `ollama_llm`, `lmstudio_llm`, or `llama_cpp_llm`. + +:::tip +Gemini RPM, TPM, and RPD limits vary by model, project, billing tier, and account status, and they are counted per Google Cloud / AI Studio project, not per individual API key. Check your current project limits in AI Studio and refer to the [Gemini API rate limits](https://ai.google.dev/gemini-api/docs/rate-limits). If you hit 429 errors, reduce concurrency, shorten context, avoid aggressive retries, or switch to a project with higher quota. +::: + +#### Configuration prompt for Claude, Codex, or another agent + +You can give the following prompt to Claude, Codex, or another agent that edits project files. Do not put real API keys in public issues, PRs, chat logs, or commits. + +```text +You are configuring Open-LLM-VTuber to use a cloud Gemini model to reduce local hardware requirements. + +Edit only the local conf.yaml file. Do not commit API keys, tokens, or private endpoint URLs. + +Tasks: +1. Find character_config -> agent_config -> agent_settings -> basic_memory_agent. +2. Set llm_provider to gemini_llm. +3. Find character_config -> agent_config -> llm_configs -> gemini_llm. +4. 
Set llm_api_key to a placeholder such as YOUR_GEMINI_API_KEY unless the user explicitly provides a local secret handling method. +5. Set model to gemini-2.5-flash-lite for low-cost, high-frequency chat, or gemini-2.5-flash for stronger reasoning. +6. Keep temperature between 0.7 and 1.0 unless the user asks for a different personality style. +7. Remind the user to check Google AI Studio for the current model availability and RPM/TPM/RPD limits. +8. Run the app only after confirming the key is stored locally and not committed. +``` + ### Zhipu API (`zhipu_llm`) Go to [Zhipu](https://bigmodel.cn/) to obtain an API key. From e5184bd5120d116e798fda9576edc49239ba14bd Mon Sep 17 00:00:00 2001 From: chiveswang <102021479+chiveswang@users.noreply.github.com> Date: Mon, 8 Jun 2026 23:16:05 +0800 Subject: [PATCH 2/2] docs: refresh gemini model recommendations --- docs/user-guide/backend/llm.md | 8 ++++---- .../current/user-guide/backend/llm.md | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/user-guide/backend/llm.md b/docs/user-guide/backend/llm.md index 4dbf7933..e59cdebb 100644 --- a/docs/user-guide/backend/llm.md +++ b/docs/user-guide/backend/llm.md @@ -136,7 +136,7 @@ openai_llm: ```yaml gemini_llm: llm_api_key: "Your Gemini API Key" # Gemini API 密钥 - model: "gemini-2.5-flash" # 使用的模型 + model: "gemini-2.5-flash" # gemini-2.5-flash 已弃用,计划于 2026/10/16 停止服务,建议改用 gemini-3.5-flash temperature: 1.0 # 温度,介于 0 到 2 之间 ``` @@ -146,8 +146,8 @@ gemini_llm: 常见选择: -- `gemini-2.5-flash-lite`:适合高频、轻量、低延迟对话。 -- `gemini-2.5-flash`:适合需要更强理解与推理能力的常规对话。 +- `gemini-3.1-flash-lite`:适合高频、轻量、低延迟对话。 +- `gemini-3.5-flash`:适合需要更强理解与推理能力的常规对话。 - 其他 Gemini Flash / Flash-Lite 模型:请以 [Gemini 模型列表](https://ai.google.dev/gemini-api/docs/models) 和 AI Studio 中可用的模型名称为准。 如果你想使用托管的 Gemma 4,请先确认服务提供的实际端点和模型名称。Gemma 4 是 Google 的开放模型系列,可以通过 Google AI Studio、Vertex AI 或其他托管服务使用;如果该服务提供 OpenAI 兼容接口,请优先把它配置到 `openai_compatible_llm`,而不是假设它一定属于 `gemini_llm`。Gemma 4 的本地运行路线仍然适合放在 
`ollama_llm`、`lmstudio_llm` 或 `llama_cpp_llm`。 @@ -170,7 +170,7 @@ Tasks: 2. Set llm_provider to gemini_llm. 3. Find character_config -> agent_config -> llm_configs -> gemini_llm. 4. Set llm_api_key to a placeholder such as YOUR_GEMINI_API_KEY unless the user explicitly provides a local secret handling method. -5. Set model to gemini-2.5-flash-lite for low-cost, high-frequency chat, or gemini-2.5-flash for stronger reasoning. +5. Set model to gemini-3.1-flash-lite for low-cost, high-frequency chat, or gemini-3.5-flash for stronger reasoning. 6. Keep temperature between 0.7 and 1.0 unless the user asks for a different personality style. 7. Remind the user to check Google AI Studio for the current model availability and RPM/TPM/RPD limits. 8. Run the app only after confirming the key is stored locally and not committed. diff --git a/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md b/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md index c15c9d81..0e4df8d3 100644 --- a/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md +++ b/i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/llm.md @@ -132,7 +132,7 @@ Then adjust the settings here: ```yaml gemini_llm: llm_api_key: "Your Gemini API Key" # Gemini API key - model: "gemini-2.5-flash" # Model to use + model: "gemini-2.5-flash" # gemini-2.5-flash is deprecated and scheduled for shutdown on Oct 16, 2026; switch to gemini-3.5-flash temperature: 1.0 # Temperature, between 0 and 2 ``` @@ -142,8 +142,8 @@ gemini_llm: Common choices: -- `gemini-2.5-flash-lite`: Good for high-frequency, lightweight, low-latency chat. -- `gemini-2.5-flash`: Good for regular chat that needs stronger understanding and reasoning. +- `gemini-3.1-flash-lite`: Good for high-frequency, lightweight, low-latency chat. +- `gemini-3.5-flash`: Good for regular chat that needs stronger understanding and reasoning. 
- Other Gemini Flash / Flash-Lite models: Use the model name shown in the [Gemini model list](https://ai.google.dev/gemini-api/docs/models) and AI Studio. If you want to use hosted Gemma 4, first confirm the actual endpoint and model name exposed by your provider. Gemma 4 is Google's open model family and can be used through Google AI Studio, Vertex AI, or other hosted services; if the service exposes an OpenAI-compatible interface, configure it under `openai_compatible_llm` instead of assuming it belongs under `gemini_llm`. Local Gemma 4 setups still fit better under `ollama_llm`, `lmstudio_llm`, or `llama_cpp_llm`. @@ -166,7 +166,7 @@ Tasks: 2. Set llm_provider to gemini_llm. 3. Find character_config -> agent_config -> llm_configs -> gemini_llm. 4. Set llm_api_key to a placeholder such as YOUR_GEMINI_API_KEY unless the user explicitly provides a local secret handling method. -5. Set model to gemini-2.5-flash-lite for low-cost, high-frequency chat, or gemini-2.5-flash for stronger reasoning. +5. Set model to gemini-3.1-flash-lite for low-cost, high-frequency chat, or gemini-3.5-flash for stronger reasoning. 6. Keep temperature between 0.7 and 1.0 unless the user asks for a different personality style. 7. Remind the user to check Google AI Studio for the current model availability and RPM/TPM/RPD limits. 8. Run the app only after confirming the key is stored locally and not committed.
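
Reviewer note: the agent prompt in this series describes the target `conf.yaml` only as a list of paths and steps. The fragment below is a minimal sketch of what steps 1 to 6 would likely produce, assuming the nesting is exactly the one the prompt names (`character_config -> agent_config -> agent_settings -> basic_memory_agent` and `character_config -> agent_config -> llm_configs -> gemini_llm`). Every key not mentioned in the patch is omitted, the indentation is an assumption, and the model name is simply the one the second commit recommends, so confirm it against AI Studio before relying on it.

```yaml
# Sketch only: nesting follows the paths named in the agent prompt; all other keys are omitted.
character_config:
  agent_config:
    agent_settings:
      basic_memory_agent:
        llm_provider: "gemini_llm"          # step 2: route the agent to the Gemini provider
    llm_configs:
      gemini_llm:
        llm_api_key: "YOUR_GEMINI_API_KEY"  # step 4: placeholder; keep the real key local and uncommitted
        model: "gemini-3.5-flash"           # step 5: or "gemini-3.1-flash-lite" for low-cost, high-frequency chat
        temperature: 1.0                    # step 6: the prompt suggests staying between 0.7 and 1.0
```

Keeping the provider switch and the provider settings in separate subtrees, as the prompt's paths imply, means changing `llm_provider` back to a local backend should not require deleting the `gemini_llm` block.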