A lightweight Express.js server that provides an OpenAI-compatible API interface for Google's Gemini Live API with native audio, enabling seamless integration with existing OpenAI SDKs and tools. Supports both text transcription and audio output responses.
- Features
- Prerequisites
- Installation
- Usage
- API Reference
- Examples
- Deployment
- Architecture
- Contributing
- License
## Features

- **OpenAI Compatibility**: Drop-in replacement for OpenAI API clients
- **Native Audio**: Uses Gemini's native audio model for natural, human-like speech
- **Audio Output**: OpenAI-compatible audio responses via `message.audio` (base64 WAV + transcript)
- **Text Fallback**: When audio is not requested, returns the transcription as standard `message.content`
- **Voice Selection**: 30 HD voices via the `audio.voice` parameter
- **Streaming Support**: Real-time streaming and non-streaming responses
- **Stateless Design**: WebSocket connections open and close per request for high throughput
- **High Throughput**: Leverages the Gemini Live API's higher rate limits (1M TPM vs OpenAI's 250k TPM)
- **Security**: IP-based access control and reverse proxy support
- **Health Check**: Built-in `/health` endpoint for monitoring
- **Docker Support**: Ready-to-deploy containerized solution
## Prerequisites

- Node.js >= 20.0.0
- A valid Google Gemini API key from Google AI Studio (provided by clients in the `Authorization` header)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/manuel-materazzo/gemini-live-to-openai-adapter.git
   cd gemini-live-to-openai-adapter
   ```

2. Install dependencies:

   ```bash
   npm install
   ```
## Usage

The server supports additional environment variables:

- `PORT`: Server port (default: 3000)
- `ALLOWED_IPS`: Comma-separated list of allowed IP addresses for access control
- `TRUSTED_PROXY_IPS`: Comma-separated list of trusted proxy IPs
- `REVERSE_PROXY_MODE`: Enable reverse proxy mode (true/false)
- `CORS_ORIGIN`: Comma-separated list of allowed CORS origins (CORS disabled if not set)
- `JSON_LIMIT`: Maximum JSON body size (default: 256kb)
- `REQUEST_TIMEOUT_MS`: Request timeout in milliseconds (default: 60000)
- `TOKEN_COUNT_MODE`: Token usage estimation mode: `estimate` (character heuristic, default), `count_tokens` (Gemini countTokens API for the prompt plus the heuristic for the completion), or `off` (returns 0)
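For example, a minimal `.env` (every value below is an illustrative placeholder, not a recommendation):

```env
PORT=3000
ALLOWED_IPS=203.0.113.10,203.0.113.11
REVERSE_PROXY_MODE=true
TRUSTED_PROXY_IPS=10.0.0.1
CORS_ORIGIN=https://app.example.com
JSON_LIMIT=256kb
REQUEST_TIMEOUT_MS=60000
TOKEN_COUNT_MODE=estimate
```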
Start the server with auto-reload for development:

```bash
npm run dev
```

Start the server:

```bash
npm start
```

The server will be available at http://localhost:3000 (or your configured `PORT`).
## API Reference

- `POST /v1/chat/completions`: Main chat completion endpoint (OpenAI-compatible)
- `GET /health`: Health check endpoint
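The health endpoint is intended for monitoring probes; its exact response body is implementation-defined, so check only the status code:

```bash
curl -f http://localhost:3000/health
```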
Request parameters for `POST /v1/chat/completions`:

| Parameter | Type | Description |
|---|---|---|
| `model` | string | Gemini model name (default: `gemini-2.5-flash-native-audio-preview-12-2025`) |
| `messages` | array | Array of message objects with `role` and `content` |
| `stream` | boolean | Enable streaming responses |
| `temperature` | number | Controls randomness (0-2) |
| `max_tokens` | number | Maximum tokens in the response |
| `modalities` | array | Output modalities: `["text"]` (default) or `["text", "audio"]` |
| `audio.voice` | string | Voice name for audio output (e.g., `Kore`, `Puck`, `Charon`) |
| `audio.format` | string | Audio format: `wav` (default) or `pcm16` |
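Standard OpenAI sampling parameters pass straight through the client; a minimal sketch using `temperature` and `max_tokens` (the values are arbitrary):

```python
from openai import OpenAI

client = OpenAI(
    api_key="your_gemini_api_key_here",
    base_url="http://localhost:3000/v1"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    messages=[{"role": "user", "content": "Summarize RAG in one sentence"}],
    temperature=0.7,  # 0-2, per the table above
    max_tokens=100,   # cap the completion length
)
print(response.choices[0].message.content)
```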
The response shape depends on the requested modalities:

| Request `modalities` | Response format |
|---|---|
| Not set / `["text"]` | Standard `message.content` with transcription text |
| `["text", "audio"]` or `["audio"]` | `message.audio.data` (base64 WAV) + `message.audio.transcript` |
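A sketch of consuming an audio response with the OpenAI Python SDK (this assumes an SDK version recent enough to accept the `modalities` and `audio` parameters; the `message.audio` fields follow the table above):

```python
import base64
from openai import OpenAI

client = OpenAI(
    api_key="your_gemini_api_key_here",
    base_url="http://localhost:3000/v1"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    modalities=["text", "audio"],
    audio={"voice": "Kore", "format": "wav"},
    messages=[{"role": "user", "content": "Say hello"}],
)

message = response.choices[0].message
print(message.audio.transcript)       # text transcript of the spoken reply
with open("reply.wav", "wb") as f:    # decode the base64 WAV payload
    f.write(base64.b64decode(message.audio.data))
```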
## Examples

Text response (default):

```bash
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_gemini_api_key_here" \
  -d '{
    "model": "gemini-2.5-flash-native-audio-preview-12-2025",
    "messages": [
      {"role": "user", "content": "What is RAG in AI?"}
    ]
  }'
```

Audio + text response:
```bash
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_gemini_api_key_here" \
  -d '{
    "model": "gemini-2.5-flash-native-audio-preview-12-2025",
    "modalities": ["text", "audio"],
    "audio": {"voice": "Kore", "format": "wav"},
    "messages": [
      {"role": "user", "content": "What is RAG in AI?"}
    ]
  }'
```

Python SDK:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your_gemini_api_key_here",
    base_url="http://localhost:3000/v1"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    messages=[
        {"role": "user", "content": "Explain retrieval augmented generation"}
    ]
)

print(response.choices[0].message.content)
```

JavaScript SDK:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your_gemini_api_key_here',
  baseURL: 'http://localhost:3000/v1'
});

const response = await openai.chat.completions.create({
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  messages: [
    { role: 'user', content: 'What is RAG?' }
  ]
});

console.log(response.choices[0].message.content);
```

Streaming (Python):
```python
from openai import OpenAI

client = OpenAI(
    api_key="your_gemini_api_key_here",
    base_url="http://localhost:3000/v1"
)

stream = client.chat.completions.create(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

## Deployment

Build and run with Docker:
```bash
# Build the image
docker build -t gemini-adapter .

# Run the container
docker run -p 3000:3000 --env-file .env gemini-adapter
```

Or use the prebuilt image:
```bash
# Pull the image
docker pull ghcr.io/manuel-materazzo/gemini-live-to-openai-adapter:latest

# Run the container
docker run -p 3000:3000 --env-file .env ghcr.io/manuel-materazzo/gemini-live-to-openai-adapter
```

Use the provided docker-compose.yml:
```bash
docker-compose up -d
```
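The repository ships its own `docker-compose.yml`; purely to illustrate what it has to wire together (image, port mapping, env file), a hypothetical minimal version could look like:

```yaml
# Hypothetical sketch, not the file shipped with the repository.
services:
  gemini-adapter:
    image: ghcr.io/manuel-materazzo/gemini-live-to-openai-adapter:latest
    ports:
      - "3000:3000"
    env_file:
      - .env
    restart: unless-stopped
```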
## Architecture

The adapter follows a stateless, high-throughput design:

```mermaid
sequenceDiagram
    participant Client
    participant Express as Express Server
    participant WS as WebSocket
    participant Gemini as Gemini Live API

    Client->>Express: POST /v1/chat/completions
    Express->>WS: Open WebSocket connection
    WS->>Gemini: Send request
    Gemini-->>WS: Stream response
    WS-->>Express: Receive chunks
    Express-->>Client: Stream response chunks
    WS->>WS: Close connection
```
- **Client**: Any OpenAI-compatible client or SDK
- **Express Server**: Handles HTTP requests and manages WebSocket connections
- **WebSocket**: Stateless connection to the Gemini Live API, opened and closed per request (see the sketch after this list)
- **Gemini Live API**: Google's real-time AI model
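In pseudocode terms, the per-request lifecycle looks roughly like the Python sketch below. The URL and message payloads are placeholders, not the actual Gemini Live protocol; the point is only that each request gets a fresh connection and leaves no session state behind:

```python
import websockets  # pip install websockets

async def handle_chat_request(prompt: str) -> str:
    chunks = []
    # Open a fresh connection for this request only (placeholder URL)...
    async with websockets.connect("wss://example.invalid/live") as ws:
        await ws.send(prompt)      # ...send the translated request...
        async for message in ws:   # ...collect streamed chunks...
            chunks.append(message)
    # ...and the connection closes when the block exits, so no per-client
    # state survives the request. Throughput comes from concurrency, not
    # long-lived sessions.
    return "".join(chunks)
```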
## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Make your changes and add tests
- Run the server locally to ensure everything works
- Submit a pull request
## License

This project is licensed under the MIT License - see the LICENSE file for details.