Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 37 additions & 1 deletion docs/user-guide/backend/llm.md
Original file line number Diff line number Diff line change
Expand Up @@ -136,10 +136,46 @@ openai_llm:
```yaml
gemini_llm:
llm_api_key: "Your Gemini API Key" # Gemini API 密钥
model: "gemini-2.0-flash-exp" # 使用的模型
model: "gemini-2.5-flash" # gemini-2.5-flash 已在 2026/10/16 停止服务,建议改用 gemini-3.5-flash
temperature: 1.0 # 温度,介于 0 到 2 之间
```

#### 使用 Gemini 云端模型降低本地运行需求

`gemini_llm` 使用 Google Gemini API 的 [OpenAI 兼容端点](https://ai.google.dev/gemini-api/docs/openai)。如果你希望减少本地 GPU/CPU 负担,可以把主要对话模型切到云端 Gemini,只在本地运行 Live2D、TTS、ASR 或其他你需要保留在本机的组件。

常见选择:

- `gemini-3.1-flash-lite`:适合高频、轻量、低延迟对话。
- `gemini-3.5-flash`:适合需要更强理解与推理能力的常规对话。
- 其他 Gemini Flash / Flash-Lite 模型:请以 [Gemini 模型列表](https://ai.google.dev/gemini-api/docs/models) 和 AI Studio 中可用的模型名称为准。

如果你想使用托管的 Gemma 4,请先确认服务提供的实际端点和模型名称。Gemma 4 是 Google 的开放模型系列,可以通过 Google AI Studio、Vertex AI 或其他托管服务使用;如果该服务提供 OpenAI 兼容接口,请优先把它配置到 `openai_compatible_llm`,而不是假设它一定属于 `gemini_llm`。Gemma 4 的本地运行路线仍然适合放在 `ollama_llm`、`lmstudio_llm` 或 `llama_cpp_llm`。

:::tip
Gemini 的 RPM、TPM、RPD 限制会随模型、项目、计费层级和账号状态变化,并且是按 Google Cloud/AI Studio 项目计算,不是按单个 API key 计算。请在 AI Studio 查看当前项目的实际限制,并参考 [Gemini API rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)。如果遇到 429,请降低并发、缩短上下文、减少连续重试,或换用更高配额的项目。
:::
Comment on lines +143 to +157

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does the Google Gemini API provide an OpenAI-compatible endpoint as of June 2026? What are the currently available Gemini Flash and Flash-Lite model variants?

💡 Result:

Yes. Google’s Gemini API provides an OpenAI-compatible REST endpoint for use with the OpenAI client libraries by pointing them at a Gemini “OpenAI compatibility” base URL. The docs show using the OpenAI library with base_url="https://generativelanguage.googleapis.com/v1beta/openai/" and calling the OpenAI-style path /chat/completions [1]. Currently available Gemini Flash model variants (Gemini API model IDs): - gemini-3.5-flash (stable/GA) [2] - gemini-3-flash-preview (listed as the preview version in the Gemini 3.5 Flash docs) [2] Currently available Gemini Flash-Lite model variants (Gemini API model IDs): - gemini-3.1-flash-lite (stable/GA) [3] Notes on “Flash-Lite” preview/discontinued variants: - The Gemini API release notes state that gemini-3.1-flash-lite-preview was shut down and that the GA replacement is gemini-3.1-flash-lite [4]. - The Gemini 2.0 Flash-Lite family is shown as shut down/discontinued by June 1, 2026 in the Gemini API changelog, and recommends using gemini-3.1-flash-lite instead [4].

Citations:


Update Gemini model names to match current Flash / Flash-Lite variants

Gemini API does provide an OpenAI-compatible endpoint (the docs endpoint link you cite is correct). However, the model IDs shown (gemini-2.5-flash-lite, gemini-2.5-flash) don’t match the currently available Flash/Flash-Lite variants in June 2026, which are gemini-3.1-flash-lite and gemini-3.5-flash (with some older Flash-Lite variants noted as shut down/discontinued). Update the examples accordingly; keep the Gemma 4 guidance as written (use openai_compatible_llm when the provider exposes an OpenAI-compatible interface).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user-guide/backend/llm.md` around lines 143 - 157, Update the example
Gemini model IDs to the current Flash / Flash‑Lite variants: replace occurrences
of "gemini-2.5-flash-lite" with "gemini-3.1-flash-lite" and "gemini-2.5-flash"
with "gemini-3.5-flash" in the gemini_llm section, leaving the Gemma 4 guidance
and references to openai_compatible_llm, ollama_llm, lmstudio_llm, and
llama_cpp_llm unchanged.


#### 给 Claude、Codex 或其他 Agent 的配置提示词

你可以把下面这段提示词交给 Claude、Codex 或其他会编辑项目文件的 Agent,让它帮你安全地修改 `conf.yaml`。不要把真实 API key 写进公开 issue、PR、聊天记录或提交记录。

```text
You are configuring Open-LLM-VTuber to use a cloud Gemini model to reduce local hardware requirements.

Edit only the local conf.yaml file. Do not commit API keys, tokens, or private endpoint URLs.

Tasks:
1. Find character_config -> agent_config -> agent_settings -> basic_memory_agent.
2. Set llm_provider to gemini_llm.
3. Find character_config -> agent_config -> llm_configs -> gemini_llm.
4. Set llm_api_key to a placeholder such as YOUR_GEMINI_API_KEY unless the user explicitly provides a local secret handling method.
5. Set model to gemini-3.1-flash-lite for low-cost, high-frequency chat, or gemini-3.5-flash for stronger reasoning.
6. Keep temperature between 0.7 and 1.0 unless the user asks for a different personality style.
7. Remind the user to check Google AI Studio for the current model availability and RPM/TPM/RPD limits.
8. Run the app only after confirming the key is stored locally and not committed.
```

### 智谱 API (`zhipu_llm`)
前往[智谱](https://bigmodel.cn/) 获取 API key。

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -132,10 +132,46 @@ Then adjust the settings here:
```yaml
gemini_llm:
llm_api_key: "Your Gemini API Key" # Gemini API key
model: "gemini-2.0-flash-exp" # Model to use
model: "gemini-2.5-flash" # gemini-2.5-flash is deprecated and scheduled for shutdown on Oct 16, 2026; switch to gemini-3.5-flash
temperature: 1.0 # Temperature, between 0 and 2
```

#### Use cloud Gemini models to reduce local hardware requirements

`gemini_llm` uses the Google Gemini API [OpenAI-compatible endpoint](https://ai.google.dev/gemini-api/docs/openai). If you want to reduce local GPU/CPU load, you can move the main conversation model to cloud Gemini while keeping Live2D, TTS, ASR, or other local components on your machine.

Common choices:

- `gemini-3.1-flash-lite`: Good for high-frequency, lightweight, low-latency chat.
- `gemini-3.5-flash`: Good for regular chat that needs stronger understanding and reasoning.
- Other Gemini Flash / Flash-Lite models: Use the model name shown in the [Gemini model list](https://ai.google.dev/gemini-api/docs/models) and AI Studio.

If you want to use hosted Gemma 4, first confirm the actual endpoint and model name exposed by your provider. Gemma 4 is Google's open model family and can be used through Google AI Studio, Vertex AI, or other hosted services; if the service exposes an OpenAI-compatible interface, configure it under `openai_compatible_llm` instead of assuming it belongs under `gemini_llm`. Local Gemma 4 setups still fit better under `ollama_llm`, `lmstudio_llm`, or `llama_cpp_llm`.

:::tip
Gemini RPM, TPM, and RPD limits vary by model, project, billing tier, and account status, and they are counted per Google Cloud / AI Studio project, not per individual API key. Check your current project limits in AI Studio and refer to the [Gemini API rate limits](https://ai.google.dev/gemini-api/docs/rate-limits). If you hit 429 errors, reduce concurrency, shorten context, avoid aggressive retries, or switch to a project with higher quota.
:::

#### Configuration prompt for Claude, Codex, or another agent

You can give the following prompt to Claude, Codex, or another agent that edits project files. Do not put real API keys in public issues, PRs, chat logs, or commits.

```text
You are configuring Open-LLM-VTuber to use a cloud Gemini model to reduce local hardware requirements.

Edit only the local conf.yaml file. Do not commit API keys, tokens, or private endpoint URLs.

Tasks:
1. Find character_config -> agent_config -> agent_settings -> basic_memory_agent.
2. Set llm_provider to gemini_llm.
3. Find character_config -> agent_config -> llm_configs -> gemini_llm.
4. Set llm_api_key to a placeholder such as YOUR_GEMINI_API_KEY unless the user explicitly provides a local secret handling method.
5. Set model to gemini-3.1-flash-lite for low-cost, high-frequency chat, or gemini-3.5-flash for stronger reasoning.
6. Keep temperature between 0.7 and 1.0 unless the user asks for a different personality style.
7. Remind the user to check Google AI Studio for the current model availability and RPM/TPM/RPD limits.
8. Run the app only after confirming the key is stored locally and not committed.
```

### Zhipu API (`zhipu_llm`)
Go to [Zhipu](https://bigmodel.cn/) to obtain an API key.

Expand Down