A production-ready, multi-provider AI chatbot with a robust fallback chain and message normalization.
- Quick Start (5-Minute Setup)
- Overview
- Architecture
- Setup & Configuration
- File Structure
- Complete Code Implementation
- Implementation Checklist
- Code Snippets Reference
- Usage Examples
- Troubleshooting
- Customization
Quick Start (5-Minute Setup)

Install dependencies:

```bash
npm install @google/generative-ai @ai-sdk/openai @ai-sdk/groq ai @upstash/redis
```

Create `.env.local`:

```
# Required: At least one AI provider
GOOGLE_GEMINI_API_KEY=your_key_here
# Optional: Fallback providers (recommended)
OPENROUTER_API_KEY=your_key_here
GROQ_API_KEY=your_key_here
HUGGING_FACE_API_KEY=your_key_here
# Required: Redis for sessions
UPSTASH_REDIS_URL=your_redis_url
UPSTASH_REDIS_TOKEN=your_redis_token
```

Copy these files to your project:
- `lib/ai.ts` - AI provider logic
- `lib/redis.ts` - Session management
- `app/api/chat/route.ts` - API endpoint
- `hooks/use-chat.ts` - React hook

Then use the hook in your component:

```tsx
import { useChat } from "@/hooks/use-chat";
function Chatbot() {
const { messages, isLoading, sendMessage } = useChat();
// ... your UI code
}
```

Overview

This guide provides a complete, production-ready implementation of a multi-provider AI chatbot with:
- 5 AI Provider Fallback Chain: Gemini → OpenRouter → Groq → Hugging Face → OpenAI
- Robust Message Normalization: 5 layers ensuring string format (handles array/object formats)
- RAG (Retrieval Augmented Generation): FAQ-based context retrieval
- Session Management: Redis-based chat history with normalization
- Streaming Support: Real-time response streaming
- Fast Rate Limit Handling: Skips remaining models immediately on 429 errors
- Deep Validation: Prevents mutation and ensures type safety
- Type-Safe: Full TypeScript implementation
Architecture

Fallback flow:

```
User Message
↓
Gemini (Primary) → Rate Limited? → Skip remaining Gemini models immediately
↓
OpenRouter (Fallback 1) → Failed? → Try next
↓
Groq (Fallback 2) → Failed? → Try next
↓
Hugging Face (Fallback 3) → Try 20+ models → Failed? → Try next
↓
OpenAI Direct (Fallback 4) → Final backup
↓
Response to User
```

Architecture overview:

```
┌─────────────────────────────────────────────────────────────────┐
│ User Interface │
│ (React Component/Hook) │
└────────────────────────────┬────────────────────────────────────┘
│ POST /api/chat
▼
┌─────────────────────────────────────────────────────────────────┐
│ API Route Handler │
│ (app/api/chat/route.ts) │
│ 1. Get/Create Session (from cookies) │
│ 2. Add User Message │
│ 3. Search FAQs (RAG) → Context │
│ 4. Call getAIResponse() │
│ 5. Stream Response │
│ 6. Save Assistant Message │
└────────────────────────────┬────────────────────────────────────┘
│ getAIResponse(messages, context)
▼
┌─────────────────────────────────────────────────────────────────┐
│ AI Provider Logic │
│ (lib/ai.ts) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Message Normalization Layers │ │
│ │ 1. normalizeContentToString() helper │ │
│ │ 2. Initial normalization (normalizedMessages) │ │
│ │ 3. Double-check (fullMessages loop) │ │
│ │ 4. Final verification (invalidMessages check) │ │
│ │ 5. prepareAIMessages() helper │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Fallback Chain │ │
│ │ Primary: Gemini → OpenRouter → Groq → HF → OpenAI │ │
│ └──────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│ Response
▼
┌─────────────────────────────────────────────────────────────────┐
│ Session Management │
│ (lib/redis.ts) │
│ getSession() → Normalize Messages (CRITICAL) │
│ saveSession() → Store in Redis │
└─────────────────────────────────────────────────────────────────┘
```

Message normalization has 5 critical layers:
1. Redis Retrieval (`lib/redis.ts`): normalize when loading from Redis
   - Array format → string
   - Object format → extract text
   - Already string → keep
2. Initial Processing (`lib/ai.ts`): normalize all incoming messages
   - Uses the `normalizeContentToString()` helper
   - Handles string, array, and object formats
3. Double-Check (`lib/ai.ts`): verify all messages are strings
   - Loops through `fullMessages`
   - Normalizes any non-string content
4. Final Verification (`lib/ai.ts`): force normalize if needed
   - Checks for invalid messages
   - Force normalizes all invalid messages
5. Runtime Validation (`lib/ai.ts`): deep validation for OpenRouter
   - Deep clones messages
   - Ensures string content
   - Final check before sending
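To make the contract concrete, here is a small self-contained sketch: a trimmed copy of the `normalizeContentToString` logic (shown in full in `lib/ai.ts` below), with `assert` calls documenting the expected output for each input shape. Run it with a TypeScript runner of your choice (e.g., `npx tsx`):

```typescript
import assert from "node:assert";

// Trimmed copy of the normalizeContentToString logic from lib/ai.ts,
// shown only to illustrate the three input shapes the layers handle.
const normalize = (content: unknown): string => {
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    return content
      .map((item) =>
        typeof item === "string" ? item : ((item as { text?: string })?.text ?? ""),
      )
      .filter((text) => text.length > 0)
      .join(" ");
  }
  if (content && typeof content === "object") {
    return (content as { text?: string }).text ?? "";
  }
  return String(content ?? "");
};

// Already a string → kept as-is
assert.strictEqual(normalize("hello"), "hello");
// Array format → text fields extracted and joined
assert.strictEqual(
  normalize([
    { type: "output_text", text: "hi" },
    { type: "output_text", text: "there" },
  ]),
  "hi there",
);
// Object format → text extracted
assert.strictEqual(normalize({ text: "hi" }), "hi");
```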
Rate limit handling flow:

```
Gemini API Call
↓
Rate Limit (429)? → Fast Skip → Try OpenRouter immediately
↓
Other Error? → Try Next Gemini Model
↓
All Gemini Failed? → Try OpenRouter
↓
OpenRouter Failed? → Try Groq
↓
Groq Failed? → Try Hugging Face (20+ models)
↓
All Hugging Face Failed? → Try OpenAI
↓
All Failed? → Throw Error
```

Setup & Configuration

Create `.env.local` file:

```
# Primary AI Provider
GOOGLE_GEMINI_API_KEY=your_gemini_api_key_here
# Fallback 1: OpenRouter (supports multiple models)
OPENROUTER_API_KEY=your_openrouter_api_key_here
# Alternative env var name (optional)
# OpenRouter_API_KEY=your_openrouter_api_key_here
# Fallback 2: Groq (fast, free tier available)
GROQ_API_KEY=your_groq_api_key_here
# Alternative env var name (optional)
# Groq_Llama_API_KEY=your_groq_api_key_here
# Fallback 3: Hugging Face (free models)
HUGGING_FACE_API_KEY=your_huggingface_api_key_here
# Alternative env var name (optional)
# Hugging_Face_Inference_API_KEY=your_huggingface_api_key_here
# Fallback 4: OpenAI Direct (optional)
OPENAI_API_KEY=your_openai_api_key_here
# Redis Configuration (for session & vector storage)
UPSTASH_REDIS_URL=your_redis_url_here
UPSTASH_REDIS_TOKEN=your_redis_token_here
# Optional: Session TTL (default: 2592000 = 30 days)
SESSION_TTL=2592000
# Optional: Public URLs for OpenRouter headers
NEXT_PUBLIC_CHATBOT_URL=https://yourdomain.com
NEXT_PUBLIC_SITE_URL=https://yourdomain.com
```

Install dependencies:

```bash
npm install @google/generative-ai @ai-sdk/openai @ai-sdk/groq ai @upstash/redis
```

The resulting `package.json` dependencies:

```json
{
"dependencies": {
"@google/generative-ai": "^0.24.1",
"@ai-sdk/openai": "^3.0.12",
"@ai-sdk/groq": "^3.0.11",
"ai": "^6.0.42",
"@upstash/redis": "^1.36.1"
}
}
```

File Structure

```
your-project/
├── lib/
│ ├── ai.ts # AI provider logic with fallback chain
│ ├── redis.ts # Redis session & vector management
│ ├── rag.ts # RAG context retrieval (optional)
│ └── embeddings.ts # Vector embeddings generation (optional)
├── app/
│ └── api/
│ └── chat/
│ └── route.ts # Chat API endpoint
├── hooks/
│ └── use-chat.ts # React hook for chat functionality
└── components/
└── chatbot/
├── chatbot-widget.tsx
        └── message-skeleton.tsx
```

Complete Code Implementation

`lib/ai.ts`:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";
import { generateText, streamText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { createGroq } from "@ai-sdk/groq";
// Message type for AI SDK
interface Message {
role: "system" | "user" | "assistant";
content: string;
}
// AI Model fallback chain: Gemini (primary) → OpenRouter → Groq → Hugging Face → OpenAI (backup)
interface MessageContent {
text?: string;
content?: string;
message?: string;
}
export async function getAIResponse(
messages: Array<{
role: string;
content: string | unknown[] | MessageContent;
}>,
context?: string,
stream: boolean = true,
) {
const systemPrompt = `You are a helpful assistant. Be friendly, professional, and concise. Use the FAQ context to give accurate answers. If you don't know something, say so.`;
// Helper function to normalize a single message content to string (defined early for reuse)
const normalizeContentToString = (content: unknown): string => {
if (typeof content === "string") {
return content;
}
if (Array.isArray(content)) {
// Handle array format: [{ type: 'output_text', text: '...' }] or [{ type: 'input_text', text: '...' }]
return (content as unknown[])
.map((item: unknown) => {
if (typeof item === "string") return item;
if (item && typeof item === "object") {
const itemObj = item as {
text?: string;
content?: string;
message?: string;
type?: string;
};
// Extract text from various object formats
return itemObj.text || itemObj.content || itemObj.message || "";
}
return String(item || "");
})
.filter((text: string) => text.length > 0)
.join(" ");
}
if (content && typeof content === "object") {
// Handle object format: { text: '...' } or { content: '...' }
const contentObj = content as {
text?: string;
content?: string;
message?: string;
};
return contentObj.text || contentObj.content || contentObj.message || "";
}
return String(content || "");
};
// Normalize messages: ensure content is always a string
// CRITICAL: Normalize ALL messages before processing
const normalizedMessages: Message[] = messages
.slice(-6) // Last 6 messages for context
.map((msg) => {
// Force normalization - handle any format (string, array, object)
const content = normalizeContentToString(msg.content);
// Filter out empty messages
if (!content || content.trim().length === 0) {
return null;
}
// Ensure role is valid
const role =
msg.role === "assistant"
? "assistant"
: msg.role === "system"
? "system"
: "user";
return {
role: role as "system" | "user" | "assistant",
content: content.trim(),
};
})
.filter((msg): msg is Message => msg !== null); // Remove null messages
// Build full messages array - ensure system message is also normalized
const fullMessages: Message[] = [
{
role: "system",
content: systemPrompt + (context ? `\n\nFAQ Context:\n${context}` : ""),
},
...normalizedMessages,
];
// CRITICAL: Double-check that all messages in fullMessages have string content
// This is a safety net in case normalization failed above
for (let i = 0; i < fullMessages.length; i++) {
const msg = fullMessages[i];
if (typeof msg.content !== "string") {
console.warn(
`Message ${i} has non-string content, normalizing:`,
typeof msg.content,
Array.isArray(msg.content),
JSON.stringify(msg.content).substring(0, 100),
);
fullMessages[i] = {
...msg,
content: normalizeContentToString(msg.content),
};
}
}
// Final verification: ensure all messages are strings
const invalidMessages = fullMessages.filter(
(msg) => typeof msg.content !== "string",
);
if (invalidMessages.length > 0) {
console.error(
"ERROR: Some messages in fullMessages still have non-string content:",
invalidMessages,
);
// Force normalize all invalid messages
for (let i = 0; i < fullMessages.length; i++) {
if (typeof fullMessages[i].content !== "string") {
fullMessages[i] = {
...fullMessages[i],
content: normalizeContentToString(fullMessages[i].content),
};
}
}
}
// Debug: Log fullMessages to see what we're working with
console.log("fullMessages count:", fullMessages.length);
console.log(
"fullMessages content types:",
fullMessages.map((msg, i) => ({
index: i,
role: msg.role,
contentType: typeof msg.content,
isArray: Array.isArray(msg.content),
})),
);
// Helper function to prepare AI SDK messages (for OpenAI-compatible APIs)
// This ensures all content is normalized to strings, handling array formats from chat history
const prepareAIMessages = () => {
const aiMessages: Array<{
role: "system" | "user" | "assistant";
content: string;
}> = [];
// Process all messages, ensuring content is always a string
// Even though fullMessages should already be normalized, we double-check here for safety
for (const msg of fullMessages) {
// Normalize content to string (handles edge cases where normalization might have failed)
const contentStr = normalizeContentToString(msg.content);
// Only add non-empty messages
if (contentStr && contentStr.trim().length > 0) {
// Ensure role is valid
const role =
msg.role === "system"
? "system"
: msg.role === "assistant"
? "assistant"
: "user";
aiMessages.push({
role: role as "system" | "user" | "assistant",
content: contentStr.trim(),
});
}
}
return aiMessages;
};
// Primary: Gemini (reliable and free)
// Use stable model names from deprecation table (gemini-2.5-flash, gemini-2.5-pro)
const geminiModels = ["gemini-2.5-flash", "gemini-2.5-pro"];
let geminiRateLimited = false;
for (const modelName of geminiModels) {
try {
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: modelName });
// Build prompt with system message and context
let prompt =
systemPrompt + (context ? `\n\nFAQ Context:\n${context}` : "") + "\n\n";
prompt += normalizedMessages
.map((m) => `${m.role === "user" ? "User" : "Assistant"}: ${m.content}`)
.join("\n\n");
const result = await model.generateContentStream(prompt);
// Convert Gemini stream to AI SDK format
if (stream) {
return {
textStream: (async function* () {
for await (const chunk of result.stream) {
const text = chunk.text();
if (text) yield text;
}
})(),
};
} else {
const response = await result.response;
return { text: response.text() };
}
} catch (error: unknown) {
// Check if it's a rate limit error (429) - skip remaining Gemini models
const errorMessage =
error instanceof Error ? error.message : String(error);
if (
errorMessage.includes("429") ||
errorMessage.includes("quota") ||
errorMessage.includes("Too Many Requests")
) {
console.log(
`Gemini model ${modelName} rate limited, skipping remaining Gemini models...`,
);
geminiRateLimited = true;
break; // Exit Gemini loop immediately
}
console.log(`Gemini model ${modelName} failed, trying next...`, error);
}
}
// If all Gemini models failed (or rate limited), try fallbacks
if (geminiRateLimited) {
console.log("Gemini rate limited, trying OpenRouter...");
} else {
console.log("All Gemini models failed, trying OpenRouter...");
}
// Fallback 1: OpenRouter GPT
// Support both OPENROUTER_API_KEY and OpenRouter_API_KEY env var names
const openRouterApiKey =
process.env.OPENROUTER_API_KEY || process.env.OpenRouter_API_KEY;
if (openRouterApiKey) {
try {
console.log("Trying OpenRouter GPT...");
const openaiClient = createOpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: openRouterApiKey,
headers: {
"HTTP-Referer":
process.env.NEXT_PUBLIC_CHATBOT_URL ||
process.env.NEXT_PUBLIC_SITE_URL ||
"https://yourdomain.com",
"X-Title": "Chatbot",
},
});
// Force use of Chat Completions API (not Responses API) by using .chat() method
const model = openaiClient.chat("openai/gpt-4o-mini");
const aiMessages = prepareAIMessages();
// CRITICAL: Final runtime validation - ensure ALL content is strings
// Create deep copies to prevent mutation and ensure string content
const validatedMessages = aiMessages.map((msg, index) => {
// Deep clone to prevent mutation
const clonedMsg = JSON.parse(JSON.stringify(msg));
// Ensure content is a string
if (typeof clonedMsg.content !== "string") {
console.error(
`ERROR: Message ${index} has non-string content:`,
typeof clonedMsg.content,
Array.isArray(clonedMsg.content),
clonedMsg,
);
// Force normalize
clonedMsg.content = normalizeContentToString(clonedMsg.content);
}
// Final check - ensure it's a string
if (typeof clonedMsg.content !== "string") {
console.error(
`CRITICAL: Message ${index} still has non-string content after normalization!`,
clonedMsg,
);
clonedMsg.content = String(clonedMsg.content || "");
}
return {
role: clonedMsg.role as "system" | "user" | "assistant",
content: String(clonedMsg.content), // Force string conversion
};
});
// Verify all messages have string content after validation
const hasArrayContent = validatedMessages.some(
(msg) => Array.isArray(msg.content) || typeof msg.content !== "string",
);
if (hasArrayContent) {
console.error(
"ERROR: Some messages still have non-string content after validation!",
validatedMessages.filter(
(msg) =>
Array.isArray(msg.content) || typeof msg.content !== "string",
),
);
throw new Error(
"Message normalization failed: some messages still have array content",
);
}
if (stream) {
const result = streamText({
model: model,
messages: validatedMessages as Array<{
role: "system" | "user" | "assistant";
content: string;
}>,
temperature: 0.7,
});
console.log("✅ OpenRouter GPT responding successfully");
return result;
} else {
const result = await generateText({
model: model,
messages: validatedMessages as Array<{
role: "system" | "user" | "assistant";
content: string;
}>,
temperature: 0.7,
});
console.log("✅ OpenRouter GPT responding successfully");
return result;
}
} catch (error) {
console.error("OpenRouter failed, trying Groq...", error);
}
}
// Fallback 2: Groq (fast and free tier available)
// Support both GROQ_API_KEY and Groq_Llama_API_KEY env var names
const groqApiKey = process.env.GROQ_API_KEY || process.env.Groq_Llama_API_KEY;
if (groqApiKey) {
try {
console.log("Trying Groq...");
const groq = createGroq({
apiKey: groqApiKey,
});
const aiMessages = prepareAIMessages();
if (stream) {
return streamText({
model: groq("llama-3.3-70b-versatile"), // Updated from llama-3.1-70b-versatile (deprecated Jan 24, 2025)
messages: aiMessages,
temperature: 0.7,
});
} else {
return await generateText({
model: groq("llama-3.3-70b-versatile"), // Updated from llama-3.1-70b-versatile (deprecated Jan 24, 2025)
messages: aiMessages,
temperature: 0.7,
});
}
} catch (error) {
console.error("Groq failed, trying Hugging Face...", error);
}
}
// Fallback 3: Hugging Face Inference API (trying multiple models)
// Support both HUGGING_FACE_API_KEY and Hugging_Face_Inference_API_KEY env var names
const huggingFaceApiKey =
process.env.HUGGING_FACE_API_KEY ||
process.env.Hugging_Face_Inference_API_KEY;
if (huggingFaceApiKey) {
// List of models to try in order (prioritize smaller/faster models first)
const models = [
// Small/fast models first (for speed)
"Qwen/Qwen3-0.6B", // 0.8B - very fast
"google/gemma-2b-it", // 2B - fast
"google/gemma-2b", // 2B - fast
"microsoft/phi-1_5", // Small and fast
"LiquidAI/LFM2.5-1.2B-Thinking", // 1B - fast
"LiquidAI/LFM2.5-1.2B-Instruct", // 1B - fast
// Medium models (good balance)
"meta-llama/Llama-3.1-8B-Instruct", // 8B - reliable
"tiiuae/falcon-7b-instruct", // 7B
"mistralai/Mistral-7B-Instruct-v0.3", // 7B
"HuggingFaceH4/zephyr-7b-beta", // 7B
"google/gemma-7b", // 7B
"NousResearch/Hermes-2-Pro-Mistral-7B", // 7B
"NousResearch/NousCoder-14B", // 14B
// Larger models (slower but better quality) - try last
"zai-org/GLM-4.7-Flash", // 31B - works but slower
"Qwen/Qwen3-Coder-30B-A3B-Instruct", // 31B
"openai/gpt-oss-20b", // 22B
"openai/gpt-oss-120b", // 120B - very slow
// Legacy fallbacks
"mistralai/Mistral-7B-Instruct-v0.2",
"tiiuae/falcon-7b",
"HuggingFaceH4/zephyr-7b-alpha",
];
const failedModels: string[] = [];
const aiMessages = prepareAIMessages();
for (const model of models) {
try {
console.log(`Trying Hugging Face model: ${model}...`);
// Use OpenAI-compatible router endpoint (like multi-ai-chatbot)
const response = await fetch(
"https://router.huggingface.co/v1/chat/completions",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${huggingFaceApiKey}`,
},
body: JSON.stringify({
model: model,
messages: aiMessages,
max_tokens: 512,
temperature: 0.7,
}),
},
);
if (response.ok) {
const data = await response.json();
// Extract generated text from OpenAI-compatible format
let generatedText = "";
if (data?.choices?.[0]?.message?.content) {
generatedText = data.choices[0].message.content.trim();
} else if (data?.choices?.[0]?.text) {
generatedText = data.choices[0].text.trim();
} else if (data?.output?.[0]?.content?.[0]?.text) {
// Alternative format
generatedText = data.output[0].content[0].text.trim();
}
if (generatedText) {
console.log(`✅ Success with Hugging Face model: ${model}`);
if (stream) {
return {
textStream: (async function* () {
// Simulate streaming by yielding chunks
const words = generatedText.split(" ");
for (const word of words) {
yield word + " ";
// Small delay to simulate streaming
await new Promise((resolve) => setTimeout(resolve, 10));
}
})(),
};
} else {
return { text: generatedText };
}
}
}
// If this model failed, try next one
failedModels.push(`${model} (${response.status})`);
console.warn(
`${model} failed (${response.status}), trying next model...`,
);
} catch (error: unknown) {
failedModels.push(model);
console.warn(`${model} error:`, error);
continue;
}
}
// If all models failed
console.error(`All Hugging Face models failed: ${failedModels.join(", ")}`);
// Don't throw error here, continue to next fallback (OpenAI)
}
// Fallback 4: OpenAI Direct (if API key is available)
if (process.env.OPENAI_API_KEY) {
try {
console.log("Trying OpenAI direct...");
const openaiClient = createOpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
const aiMessages = prepareAIMessages();
if (stream) {
return streamText({
model: openaiClient("gpt-4o-mini"),
messages: aiMessages,
temperature: 0.7,
});
} else {
return await generateText({
model: openaiClient("gpt-4o-mini"),
messages: aiMessages,
temperature: 0.7,
});
}
} catch (error) {
console.error("OpenAI direct failed:", error);
}
}
throw new Error("All AI models failed");
}
```

`lib/redis.ts`:

```typescript
import { Redis } from "@upstash/redis";
export const redis = new Redis({
url: process.env.UPSTASH_REDIS_URL!,
token: process.env.UPSTASH_REDIS_TOKEN!,
});
// Message type for chat sessions
export interface ChatMessage {
role: "user" | "assistant";
content: string;
timestamp: number;
}
// Session type
export interface Session {
id: string;
messages: ChatMessage[];
createdAt: number;
updatedAt: number;
}
// FAQ metadata type
export interface FAQMetadata {
question: string;
answer: string;
}
// Session management
export async function getSession(sessionId: string): Promise<Session | null> {
const data = await redis.get(`chat:session:${sessionId}`);
if (!data) return null;
// Parse JSON if it's a string, otherwise use as-is
let session: Session;
if (typeof data === "string") {
try {
session = JSON.parse(data) as Session;
} catch (e) {
console.error("Failed to parse session data:", e);
return null;
}
} else {
session = data as Session;
}
// CRITICAL: Normalize message content to ensure it's always a string (handle legacy array format)
// This prevents array-formatted content from being passed to AI APIs
if (session.messages && Array.isArray(session.messages)) {
session.messages = session.messages.map((msg) => {
// If content is an array, normalize it to string
if (Array.isArray(msg.content)) {
const normalizedContent = (msg.content as unknown[])
.map((item: unknown) => {
if (typeof item === "string") return item;
if (item && typeof item === "object") {
const itemObj = item as {
text?: string;
content?: string;
message?: string;
};
return itemObj.text || itemObj.content || itemObj.message || "";
}
return String(item || "");
})
.filter((text: string) => text.length > 0)
.join(" ");
return { ...msg, content: normalizedContent };
}
// If content is an object, extract text
if (
msg.content &&
typeof msg.content === "object" &&
!Array.isArray(msg.content)
) {
const contentObj = msg.content as {
text?: string;
content?: string;
message?: string;
};
const normalizedContent =
contentObj.text ||
contentObj.content ||
contentObj.message ||
String(msg.content);
return { ...msg, content: normalizedContent };
}
// Already a string or convert to string
return {
...msg,
content:
typeof msg.content === "string"
? msg.content
: String(msg.content || ""),
};
});
}
return session;
}
export async function saveSession(
sessionId: string,
messages: ChatMessage[],
ttl: number = parseInt(process.env.SESSION_TTL || "2592000"),
): Promise<Session> {
const session: Session = {
id: sessionId,
messages,
createdAt: Date.now(),
updatedAt: Date.now(),
};
await redis.setex(`chat:session:${sessionId}`, ttl, JSON.stringify(session));
return session;
}
// Vector storage for RAG
export async function storeVector(
id: string,
vector: number[],
metadata: FAQMetadata,
): Promise<void> {
await redis.hset(`chat:vectors:${id}`, {
vector: JSON.stringify(vector),
metadata: JSON.stringify(metadata),
});
}
// Vector search result type
export interface VectorSearchResult {
similarity: number;
metadata: FAQMetadata;
}
export async function searchVectors(
queryVector: number[],
topK: number = 3,
): Promise<VectorSearchResult[]> {
// Simple cosine similarity search (for production, use a proper vector DB)
// This is a simplified version - for better performance, use Redis with RediSearch or Qdrant
const keys = await redis.keys("chat:vectors:*");
const results: VectorSearchResult[] = [];
for (const key of keys) {
try {
const data = await redis.hgetall(key);
if (data?.vector && data?.metadata) {
// Safely parse JSON with error handling
let vector: number[];
let metadata: FAQMetadata;
try {
const vectorStr =
typeof data.vector === "string"
? data.vector
: JSON.stringify(data.vector);
vector = JSON.parse(vectorStr) as number[];
} catch (e) {
console.error(`Failed to parse vector for ${key}:`, e);
continue;
}
try {
const metadataStr =
typeof data.metadata === "string"
? data.metadata
: JSON.stringify(data.metadata);
metadata = JSON.parse(metadataStr) as FAQMetadata;
} catch (e) {
console.error(`Failed to parse metadata for ${key}:`, e);
continue;
}
const similarity = cosineSimilarity(queryVector, vector);
results.push({
similarity,
metadata,
});
}
} catch (error) {
console.error(`Error processing vector ${key}:`, error);
continue;
}
}
return results.sort((a, b) => b.similarity - a.similarity).slice(0, topK);
}
function cosineSimilarity(a: number[], b: number[]): number {
if (a.length !== b.length) return 0;
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

`app/api/chat/route.ts`:

```typescript
import { NextRequest } from "next/server";
import { getSession, saveSession, type ChatMessage } from "@/lib/redis";
import { searchFAQ } from "@/lib/rag";
import { getAIResponse } from "@/lib/ai";
export const runtime = "edge"; // Use Edge Runtime for faster responses
export async function POST(req: NextRequest) {
try {
const body = (await req.json()) as { message?: string };
const { message } = body;
if (!message?.trim()) {
return new Response(JSON.stringify({ error: "Message required" }), {
status: 400,
headers: { "Content-Type": "application/json" },
});
}
// Get or create session from cookies
const cookieHeader = req.headers.get("cookie") || "";
const match = cookieHeader.match(/chatbot_session=([^;]+)/);
let sessionId: string = match?.[1] || "";
    let session = sessionId ? await getSession(sessionId) : null;
    // Track whether we minted a new session id so the cookie is set below
    const isNewSession = !session;
    if (!session) {
      sessionId = `sess_${Date.now()}_${Math.random().toString(36).substring(2, 11)}`;
session = {
id: sessionId,
messages: [],
createdAt: Date.now(),
updatedAt: Date.now(),
};
}
// Add user message
const userMessage: ChatMessage = {
role: "user",
content: message.trim(),
timestamp: Date.now(),
};
session.messages.push(userMessage);
// RAG: Search for relevant FAQs
const context = await searchFAQ(message);
// Get AI response with streaming
const result = await getAIResponse(session.messages, context, true);
// Create streaming response
const stream = new ReadableStream({
async start(controller) {
let fullResponse = "";
try {
// AI SDK's streamText returns StreamTextResult with .textStream property
interface StreamResult {
textStream?: AsyncIterable<string>;
text?: string;
}
const streamResult = result as StreamResult;
const textStream = streamResult?.textStream;
if (
textStream &&
typeof textStream[Symbol.asyncIterator] === "function"
) {
// It's an async iterable - stream it
for await (const chunk of textStream) {
if (chunk) {
fullResponse += chunk;
controller.enqueue(
new TextEncoder().encode(
`data: ${JSON.stringify({ response: chunk })}\n\n`,
),
);
}
}
} else if (streamResult?.text) {
// Non-streaming response
fullResponse = streamResult.text;
controller.enqueue(
new TextEncoder().encode(
`data: ${JSON.stringify({ response: fullResponse })}\n\n`,
),
);
} else {
throw new Error("No textStream or text found in AI response");
}
// Save assistant message and session
const assistantMessage: ChatMessage = {
role: "assistant",
content: fullResponse,
timestamp: Date.now(),
};
session!.messages.push(assistantMessage);
session!.updatedAt = Date.now();
await saveSession(sessionId!, session!.messages);
controller.enqueue(new TextEncoder().encode("data: [DONE]\n\n"));
controller.close();
} catch (error) {
console.error("Streaming error:", error);
// Send error message to client
controller.enqueue(
new TextEncoder().encode(
`data: ${JSON.stringify({ error: error instanceof Error ? error.message : "Streaming failed" })}\n\n`,
),
);
controller.enqueue(new TextEncoder().encode("data: [DONE]\n\n"));
controller.close();
}
},
});
// Get origin for CORS
const origin = req.headers.get("origin");
const allowedOrigin = origin || "*";
// Set cookie if new session
const headers = new Headers({
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
Connection: "keep-alive",
"Access-Control-Allow-Origin": allowedOrigin,
"Access-Control-Allow-Credentials": "true",
"Access-Control-Allow-Methods": "POST, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type, Cookie",
});
    // Set the cookie whenever a new session was created (covers both a missing
    // cookie and an expired session that no longer exists in Redis)
    if (isNewSession) {
headers.set(
"Set-Cookie",
`chatbot_session=${sessionId}; Path=/; HttpOnly; SameSite=Lax; Max-Age=2592000`,
);
}
return new Response(stream, { headers });
} catch (error) {
console.error("Chat error:", error);
return new Response(JSON.stringify({ error: "Internal server error" }), {
status: 500,
headers: { "Content-Type": "application/json" },
});
}
}
export async function OPTIONS(req: NextRequest) {
const origin = req.headers.get("origin");
const allowedOrigin = origin || "*";
return new Response(null, {
status: 200,
headers: {
"Access-Control-Allow-Origin": allowedOrigin,
"Access-Control-Allow-Credentials": "true",
"Access-Control-Allow-Methods": "POST, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type, Cookie",
},
});
}
```

`hooks/use-chat.ts`:

```typescript
import { useState, useCallback } from "react";
export interface Message {
role: "user" | "assistant";
content: string;
timestamp: number;
}
export function useChat() {
const [messages, setMessages] = useState<Message[]>([]);
const [isLoading, setIsLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const sendMessage = useCallback(
async (message: string) => {
if (!message.trim() || isLoading) return;
setIsLoading(true);
setError(null);
// Add user message immediately
const userMessage: Message = {
role: "user",
content: message.trim(),
timestamp: Date.now(),
};
setMessages((prev) => [...prev, userMessage]);
try {
const response = await fetch("/api/chat", {
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({ message }),
});
if (!response.ok) {
throw new Error("Failed to get response");
}
// Handle streaming response
const reader = response.body?.getReader();
const decoder = new TextDecoder();
let assistantMessage: Message = {
role: "assistant",
content: "",
timestamp: Date.now(),
};
if (reader) {
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split("\n");
for (const line of lines) {
if (line.startsWith("data: ")) {
const data = line.slice(6);
if (data === "[DONE]") {
break;
}
                let parsed: { response?: string; error?: string } | null =
                  null;
                try {
                  parsed = JSON.parse(data);
                } catch {
                  // Skip malformed JSON fragments
                }
                if (parsed?.error) {
                  // Surface server-sent errors instead of swallowing them
                  throw new Error(parsed.error);
                }
                if (parsed?.response) {
                  assistantMessage = {
                    ...assistantMessage,
                    content: assistantMessage.content + parsed.response,
                  };
                  const updated = assistantMessage;
                  setMessages((prev) => {
                    const newMessages = [...prev];
                    const lastMessage = newMessages[newMessages.length - 1];
                    // The in-progress assistant reply is always the last entry
                    if (lastMessage?.role === "assistant") {
                      newMessages[newMessages.length - 1] = updated;
                    } else {
                      newMessages.push(updated);
                    }
                    return newMessages;
                  });
                }
}
}
}
}
} catch (err) {
const errorMessage =
err instanceof Error ? err.message : "An error occurred";
setError(errorMessage);
console.error("Chat error:", err);
} finally {
setIsLoading(false);
}
},
[isLoading],
);
const clearMessages = useCallback(() => {
setMessages([]);
setError(null);
}, []);
return {
messages,
isLoading,
error,
sendMessage,
clearMessages,
};
}
```

Implementation Checklist

Use this checklist when implementing in a new project:
- Review this guide completely
- Understand the fallback chain architecture
- Gather all required API keys
- Set up Redis instance (Upstash recommended)
- Create `.env.local` file
- Add `GOOGLE_GEMINI_API_KEY`
- Add `OPENROUTER_API_KEY` (recommended)
- Add `GROQ_API_KEY` (recommended)
- Add `HUGGING_FACE_API_KEY` (optional)
- Add `OPENAI_API_KEY` (optional)
- Add `UPSTASH_REDIS_URL`
- Add `UPSTASH_REDIS_TOKEN`
- Add `SESSION_TTL` (optional, default: 2592000)
- Add `NEXT_PUBLIC_CHATBOT_URL` (optional)
- Install `@google/generative-ai`
- Install `@ai-sdk/openai`
- Install `@ai-sdk/groq`
- Install `ai`
- Install `@upstash/redis`
- Create `lib/ai.ts` with the complete implementation
- Create `lib/redis.ts` with message normalization
- Create `app/api/chat/route.ts` with streaming support
- Create `hooks/use-chat.ts` React hook
- `normalizeContentToString` function exists in `lib/ai.ts`
- Message normalization in `lib/redis.ts`'s `getSession` function
- Double-check loop in `lib/ai.ts` for `fullMessages`
- Deep cloning in OpenRouter validation
- Gemini primary implementation with fast skip
- OpenRouter fallback with deep validation
- Groq fallback implementation
- Hugging Face fallback with multiple models
- OpenAI direct fallback
- Rate limit detection (429 errors)
- Fast skip on Gemini rate limits
- Error logging for each provider
- Graceful fallback to next provider
- Test with Gemini (primary)
- Test fallback to OpenRouter (disable Gemini)
- Test fallback to Groq (disable Gemini & OpenRouter)
- Test fallback to Hugging Face (disable others)
- Test message normalization (array format)
- Test message normalization (object format)
- Test session persistence
- Test streaming responses
- Test error handling
Code Snippets Reference

Message normalization helper:

```typescript
const normalizeContentToString = (content: unknown): string => {
if (typeof content === "string") return content;
if (Array.isArray(content)) {
return (content as unknown[])
.map((item: unknown) => {
if (typeof item === "string") return item;
if (item && typeof item === "object") {
const itemObj = item as {
text?: string;
content?: string;
message?: string;
};
return itemObj.text || itemObj.content || itemObj.message || "";
}
return String(item || "");
})
.filter((text: string) => text.length > 0)
.join(" ");
}
if (content && typeof content === "object") {
const contentObj = content as {
text?: string;
content?: string;
message?: string;
};
return contentObj.text || contentObj.content || contentObj.message || "";
}
return String(content || "");
};
```

Rate limit detection (fast skip):

```typescript
catch (error: unknown) {
const errorMessage = error instanceof Error ? error.message : String(error);
if (errorMessage.includes('429') || errorMessage.includes('quota') || errorMessage.includes('Too Many Requests')) {
console.log(`Gemini model ${modelName} rate limited, skipping remaining Gemini models...`);
geminiRateLimited = true;
break; // Exit Gemini loop immediately
}
}
```

Deep validation for OpenRouter:

```typescript
const validatedMessages = aiMessages.map((msg, index) => {
const clonedMsg = JSON.parse(JSON.stringify(msg)); // Deep clone
if (typeof clonedMsg.content !== "string") {
clonedMsg.content = normalizeContentToString(clonedMsg.content);
}
if (typeof clonedMsg.content !== "string") {
clonedMsg.content = String(clonedMsg.content || "");
}
return {
role: clonedMsg.role as "system" | "user" | "assistant",
content: String(clonedMsg.content), // Force string conversion
};
});
// Verify all messages have string content
const hasArrayContent = validatedMessages.some(
(msg) => Array.isArray(msg.content) || typeof msg.content !== "string",
);
if (hasArrayContent) {
throw new Error(
"Message normalization failed: some messages still have array content",
);
}
```

Usage Examples

```tsx
'use client';
import { useChat } from '@/hooks/use-chat';
import { useState } from 'react';
export default function Chatbot() {
const { messages, isLoading, error, sendMessage, clearMessages } = useChat();
const [input, setInput] = useState('');
const handleSubmit = (e: React.FormEvent) => {
e.preventDefault();
if (input.trim()) {
sendMessage(input);
setInput('');
}
};
return (
<div className="chatbot-container">
<div className="messages">
{messages.map((msg, idx) => (
<div key={idx} className={`message ${msg.role}`}>
<strong>{msg.role === 'user' ? 'You' : 'Assistant'}:</strong> {msg.content}
</div>
))}
</div>
{isLoading && <div>Loading...</div>}
{error && <div className="error">{error}</div>}
<form onSubmit={handleSubmit}>
<input
value={input}
onChange={(e) => setInput(e.target.value)}
placeholder="Type your message..."
disabled={isLoading}
/>
<button type="submit" disabled={isLoading}>
Send
</button>
</form>
<button onClick={clearMessages}>Clear</button>
</div>
);
}
```

Troubleshooting

Issue: AI provider rejects messages because content is not a string

Solution:
- Ensure `lib/redis.ts` normalizes messages on retrieval
- Check that `normalizeContentToString` is called on all messages
- Verify deep cloning in OpenRouter validation
- Check all 5 normalization layers are implemented
Issue: Responses are slow

Solution:
- Check if Gemini is rate-limited (should skip quickly)
- Verify Hugging Face models are ordered (smaller first)
- Check network connectivity
- Review console logs for specific delays
Issue: All providers are failing

Solution:
- Check environment variables are set correctly
- Verify API keys are valid
- Check network connectivity
- Review console logs for specific error messages
- Test each provider individually (see the sketch below)
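One way to test providers in isolation is a small script that calls `getAIResponse` directly with streaming disabled. This is a hypothetical helper, not part of the guide's files; the path and runner are assumptions:

```typescript
// scripts/test-provider.ts — hypothetical smoke test; run with: npx tsx scripts/test-provider.ts
// Load .env.local into the environment first (e.g., with dotenv) so provider keys resolve.
import { getAIResponse } from "../lib/ai";

async function main() {
  // Comment out individual API keys in .env.local to force a specific fallback.
  const result = await getAIResponse(
    [{ role: "user", content: "Reply with the single word: pong" }],
    undefined, // no FAQ context
    false, // non-streaming: every provider path resolves to an object with a .text field
  );
  console.log("Provider replied:", (result as { text?: string }).text);
}

main().catch((err) => {
  console.error("All providers failed:", err);
  process.exit(1);
});
```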
Issue: Sessions are not persisting

Solution:
- Verify Redis connection (UPSTASH_REDIS_URL and UPSTASH_REDIS_TOKEN)
- Check session cookie is being set
- Verify `saveSession` is being called after responses
- Check Redis TTL settings
Customization

Edit `systemPrompt` in `lib/ai.ts`:

```typescript
const systemPrompt = `Your custom system prompt here...`;
```

Add to the `models` array in `lib/ai.ts`:

```typescript
const models = [
// ... existing models
"your-model/name-here",
];
```

Reorder the fallback sections in `lib/ai.ts`:

```typescript
// Change the order of Fallback 1, 2, 3, 4
```

Change the `temperature` parameter in all AI calls:

```typescript
temperature: 0.7, // Change to your preferred value (0.0 - 1.0)
```

Set `SESSION_TTL` in `.env.local`:

```
SESSION_TTL=604800 # 7 days in seconds
```

Message normalization (why it's critical):
- AI APIs expect string content, not arrays or objects
- Redis may store messages in different formats
- Prevents runtime errors and API failures
Implementation layers:
- Redis retrieval (`lib/redis.ts`): normalize when loading from Redis
- Initial processing (`lib/ai.ts`): normalize all incoming messages
- Double-check: verify all messages are strings
- Runtime validation: Final check before sending to APIs (especially OpenRouter)
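For example, a legacy record read back from Redis with array-formatted content is flattened on retrieval. The stored shape below is illustrative, not taken from a real session:

```typescript
// Hypothetical legacy record as it might sit in Redis (shape is illustrative):
const legacyMessage = {
  role: "assistant",
  content: [
    { type: "output_text", text: "Hello!" },
    { type: "output_text", text: "How can I help?" },
  ],
  timestamp: 1700000000000,
};
// After getSession()'s normalization pass, content is a plain string:
// { role: "assistant", content: "Hello! How can I help?", timestamp: 1700000000000 }
```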
Fast rate limit handling (why it's important):
- Reduces wait time when Gemini is rate-limited
- Immediately tries fallback providers
- Better user experience
Implementation:

```typescript
if (errorMessage.includes("429") || errorMessage.includes("quota")) {
geminiRateLimited = true;
break; // Skip remaining Gemini models immediately
}
```

Deep validation for OpenRouter (why it's needed):
- OpenRouter uses Chat Completions API which requires strict string format
- Prevents `invalid_union` errors
- Ensures message content is never mutated
Implementation:

```typescript
const validatedMessages = aiMessages.map((msg) => {
const clonedMsg = JSON.parse(JSON.stringify(msg)); // Deep clone
// Ensure string content
// Final check
  return { role: clonedMsg.role, content: String(clonedMsg.content) };
});
```

Hugging Face fallback (why multiple models):
- Some models may be unavailable (410 Gone)
- Different models have different speeds
- Ensures reliability
Implementation:
- Try smaller models first (faster)
- Fall back to larger models if needed
- Continue to next provider if all fail
Best practices:

- Always normalize messages at multiple layers
- Use deep cloning for OpenRouter validation
- Log errors for debugging
- Handle edge cases (empty messages, invalid formats)
- Test fallback chain by disabling primary provider
- Monitor API usage to avoid rate limits
- Cache frequently asked questions for faster responses (see the sketch below)
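On the last point, here is a minimal sketch of FAQ answer caching built on the `redis` client from `lib/redis.ts`. The key prefix, TTL, and hashing approach are assumptions rather than part of the guide's code:

```typescript
import { redis } from "@/lib/redis";

const CACHE_TTL = 60 * 60 * 24; // 1 day, in seconds (illustrative)

// Hash the normalized question with Web Crypto (available in the Edge runtime
// the chat route uses) so minor casing/whitespace differences share one key.
async function cacheKey(question: string): Promise<string> {
  const bytes = new TextEncoder().encode(question.trim().toLowerCase());
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  return `chat:faq-cache:${hex}`; // key prefix is an assumption
}

export async function getCachedAnswer(question: string): Promise<string | null> {
  return (await redis.get<string>(await cacheKey(question))) ?? null;
}

export async function setCachedAnswer(question: string, answer: string): Promise<void> {
  await redis.setex(await cacheKey(question), CACHE_TTL, answer);
}
```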
Get API keys:

- Gemini: https://ai.google.dev/
- OpenRouter: https://openrouter.ai/
- Groq: https://console.groq.com/
- Hugging Face: https://huggingface.co/settings/tokens
- OpenAI: https://platform.openai.com/api-keys
- Upstash Redis: https://console.upstash.com/
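One gap to be aware of: `app/api/chat/route.ts` imports `searchFAQ` from `lib/rag.ts`, which the file tree marks as optional and this guide does not include. Below is a minimal sketch, assuming an embeddings helper in `lib/embeddings.ts` that returns a `number[]` and the `searchVectors` function from `lib/redis.ts` above; the 0.7 similarity threshold is an assumption:

```typescript
// lib/rag.ts — minimal sketch; generateEmbedding and the threshold are assumptions.
import { searchVectors } from "@/lib/redis";
import { generateEmbedding } from "@/lib/embeddings";

// Returns the top-matching FAQs as a text block, or undefined when nothing relevant is stored.
export async function searchFAQ(query: string): Promise<string | undefined> {
  try {
    const queryVector = await generateEmbedding(query);
    const matches = await searchVectors(queryVector, 3);
    const relevant = matches.filter((m) => m.similarity > 0.7);
    if (relevant.length === 0) return undefined;
    return relevant
      .map((m) => `Q: ${m.metadata.question}\nA: ${m.metadata.answer}`)
      .join("\n\n");
  } catch (error) {
    // RAG is optional: fail open and answer without FAQ context.
    console.error("FAQ search failed, continuing without context:", error);
    return undefined;
  }
}
```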
For issues or questions:
- Check console logs for specific error messages
- Verify all environment variables are set
- Test each provider individually
- Review this guide's troubleshooting section
- Ensure all 5 normalization layers are implemented
Last Updated: January 2025
Version: 1.0.0