ThirdEye is a multimodal AI assistant integrated into WhatsApp. Unlike basic chatbots, ThirdEye has "Long-Term Memory," acts as a "Vision Expert," searches the "Live Internet," and can reply with audio.
It solves the problem of stateless AI by using MongoDB Cloud to remember user details (names, faces, context) permanently, even after server restarts.
- The bot remembers who you are.
- Stores user details and context in a cloud database.
- Example: If you send a photo of a person and say "This is Rahul," the bot will remember Rahul forever.
- Connected to the real world using DuckDuckGo Search.
- Can answer queries like "Current Gold Rate," "Bangalore Weather," or "Latest News."
- Send any image, and the bot analyzes it.
- Can describe scenes, read handwritten text, or identify objects.
- Send a voice note, and the bot listens and replies in text.
- The bot can also reply with audio (TTS) in the same language and tone.
- Send a PDF (Resume, Invoice, Book).
- Ask questions like "What is the total amount in this bill?" and get instant answers.
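
The long-term memory feature in the list above can be pictured with a minimal sketch. This is not ThirdEye's actual code: `MemoryStore`, `remember`, and `recall` are hypothetical names, and a plain dictionary stands in for the MongoDB collection, so unlike the real bot this version forgets everything when the process exits.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """In-memory stand-in for a per-user memory collection (hypothetical)."""

    _facts: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, user_id: str, fact: str) -> None:
        # Keep one list of facts per user and skip exact duplicates.
        facts = self._facts.setdefault(user_id, [])
        if fact not in facts:
            facts.append(fact)

    def recall(self, user_id: str) -> list[str]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._facts.get(user_id, []))


# The user id format is an assumption; a WhatsApp sender number is one option.
store = MemoryStore()
store.remember("whatsapp:+10000000000", "The person in the photo is Rahul")
print(store.recall("whatsapp:+10000000000"))
```

Keying facts by sender keeps each user's memories separate. In a deployed version, the same two operations would be backed by the cloud database rather than a dictionary.
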
- Brain: Google Gemini Pro & Flash (Generative AI)
- Backend: Python (FastAPI)
- Database: MongoDB Atlas (Cloud NoSQL)
- Messaging: Twilio API (WhatsApp)
- Hosting: Render (Cloud Server)
- Tools: gTTS (Text-to-Speech), PyPDF (PDF Parsing), DuckDuckGo (Search)
- User sends a message (Text, Image, or Audio) on WhatsApp.
- Twilio receives the message and forwards it to the Render Server.
- FastAPI processes the input:
  - Checks MongoDB for past memories.
  - If needed, searches the Internet.
  - Sends context to Google Gemini AI.
- Response is generated (Text or Audio) and sent back to WhatsApp.
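
The processing steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the project's implementation: `handle_message`, `needs_search`, and `SEARCH_HINTS` are hypothetical, the keyword check is a naive placeholder for however the bot decides to search, and the `search` and `generate` callables stand in for DuckDuckGo and Gemini so the example runs without network access or API keys.

```python
from typing import Callable

# Hypothetical keywords that suggest a query needs fresh information.
SEARCH_HINTS = ("current", "latest", "today", "weather", "news", "rate")


def needs_search(text: str) -> bool:
    """Naive heuristic: search only when the message hints at live data."""
    lowered = text.lower()
    return any(hint in lowered for hint in SEARCH_HINTS)


def handle_message(
    user_id: str,
    text: str,
    memories: dict[str, list[str]],
    search: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    # Step 1: look up past memories for this user.
    past = memories.get(user_id, [])
    # Step 2: search the internet only if the message seems to need it.
    results = search(text) if needs_search(text) else []
    # Step 3: send the combined context to the language model.
    prompt = "\n".join(
        [
            "Known about user: " + "; ".join(past),
            "Search results: " + "; ".join(results),
            "User: " + text,
        ]
    )
    reply = generate(prompt)
    # Store the message so later turns can use it as memory.
    memories.setdefault(user_id, []).append(text)
    return reply


# Stubs standing in for the real search and model calls.
def fake_search(query: str) -> list[str]:
    return ["stub result for: " + query]


def fake_generate(prompt: str) -> str:
    return "stub reply based on " + str(len(prompt)) + " characters of context"


memories: dict[str, list[str]] = {}
print(handle_message("user-1", "Bangalore weather", memories, fake_search, fake_generate))
```

Passing the search and model functions in as parameters keeps the orchestration logic testable on its own; the real services can be swapped in without changing `handle_message`.
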
- Reminder/Alarm features.
- Google Calendar Integration.
- Multi-user group chat analysis.
Created with ❤️ by Rakesh Raushan