A high-performance, hands-free, bilingual AI voice assistant powered by Google's Gemini AI and Bytez Cloud ASR. Designed to be fast, responsive, and intelligent—just like JARVIS.
- Hands-Free VAD: No keys to press. Nova automatically detects when you start and stop speaking.
- 🇮🇳 Bilingual Support: Understands and speaks both Hindi and English.
- Turbo Brain: Powered by Google Gemini 2.5 Flash for near-instant thinking.
- Cloud ASR: Uses Bytez SDK with Whisper-Large-V3 for superior speech recognition accuracy.
- Premium Neural Voice: Uses edge-tts (Swara for Hindi, Ava for English) for human-like speech.
- Parallel Execution: Nova performs tasks (opening apps, etc.) while she is speaking to you.
- Direct-to-Memory: No slow temporary files (.wav or .mp3). Everything is processed in RAM for maximum speed.
- Vision: Capture photos instantly from your webcam on command.
- Turbo Typing: Near-instant text automation with support for Hindi characters.
- Smart App Control: Opens any software using Windows Start Menu search integration.
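
The hands-free detection described above can be illustrated with a simple energy threshold. This is a minimal sketch and not Nova's actual listener: the function names, the `threshold` value, and the `max_silent_frames` cutoff are all assumptions chosen for illustration, and it works on plain lists of samples so it runs without a microphone.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in -1.0..1.0)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_utterance(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,     # assumed energy threshold; tune per microphone
    max_silent_frames: int = 3,  # assumed trailing silence that ends an utterance
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first utterance, or None.

    `end` is exclusive: it points just past the last loud frame.
    """
    start = None
    last_loud = None
    silent_run = 0
    for i, frame in enumerate(frames):
        loud = rms(frame) >= threshold
        if start is None:
            if loud:  # first loud frame: speech has started
                start = i
                last_loud = i
        elif loud:  # still speaking: reset the silence counter
            last_loud = i
            silent_run = 0
        else:  # silence after speech: count toward the cutoff
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    if start is None:
        return None
    return start, last_loud + 1
```

In a real loop the frames would come from an audio input stream (the dependency list names sounddevice for this role), and the frames between `start` and `end` would be sent to speech recognition. Short pauses inside a sentence do not end the utterance as long as they stay under `max_silent_frames`.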
- Python 3.13+
- uv (Fast Python package manager)
- Microphone & Webcam
- Google Gemini API Key
- Bytez API Key
- Clone the repository:

      git clone https://github.com/Pratham-Prog861/jarvis-clone.git
      cd jarvis-clone

- Install dependencies:

      uv sync

- Set up environment variables. Create a .env file in the root directory:

      GEMINI_API_KEY=your_gemini_key
      BYTEZ_API_KEY=your_bytez_key

- Run Nova:

      uv run main.py
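
If a key is missing, the failure may only surface on the first API call, so it can help to validate the configuration at startup. The sketch below is one way to do that using only the standard library; `parse_env_text`, `missing_keys`, and `load_settings` are hypothetical helpers, not functions from this repository, and the project may load its .env file differently.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping

# The two variables named in the setup step above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "BYTEZ_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def load_settings(env_path: str = ".env") -> Dict[str, str]:
    """Merge the .env file with the process environment and validate it."""
    path = Path(env_path)
    file_values = parse_env_text(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = {**file_values, **os.environ}  # real environment wins over the file
    missing = missing_keys(merged)
    if missing:
        raise RuntimeError("Missing required keys: " + ", ".join(missing))
    return {key: merged[key] for key in REQUIRED_KEYS}
```

Calling `load_settings()` before starting the voice loop turns a missing key into an immediate, readable error.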
- Start Nova: Run the script and wait for "Nova is online".
- Just Talk: Simply start speaking in Hindi or English.
- Hands-Free: Nova will detect your voice, process the request, and respond automatically.
- Bilingual: "नमस्ते नोवा, तुम कैसी हो?" (Namaste Nova, how are you?)
- App Control: "Open Visual Studio Code" or "Chrome kholo"
- Automation: "Type 'Hello World' in Hindi" or "Likho 'Namaste Bharat'"
- Web: "Search for the latest AI news" or "Open youtube.com"
- Camera: "Capture a photo" or "Meri photo khicho"
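
Because replies may be in either language, the speaker needs some way to choose between the Hindi and English voices. The sketch below shows one possible heuristic: pick the Hindi voice when the text contains Devanagari characters. How Nova actually makes this choice is not documented here, and the voice IDs are assumptions based on the names "Swara" and "Ava" mentioned in the feature list.

```python
HINDI_VOICE = "hi-IN-SwaraNeural"  # assumed edge-tts voice ID for "Swara"
ENGLISH_VOICE = "en-US-AvaNeural"  # assumed edge-tts voice ID for "Ava"


def contains_devanagari(text: str) -> bool:
    """True if any character falls in the Devanagari block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097F" for ch in text)


def pick_voice(text: str) -> str:
    """Choose a voice for the reply text based on its script."""
    return HINDI_VOICE if contains_devanagari(text) else ENGLISH_VOICE
```

A script check like this treats romanized Hindi such as "Chrome kholo" as English, so a real implementation would likely need an additional signal, for example a language tag returned alongside the model's reply.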
jarvis-clone/
├── core/
│ ├── agent.py # Brain (Gemini 2.5 Flash)
│ ├── emotion.py # Sentiment Analysis
│ ├── memory.py # Conversation History
│ └── router.py # Action Orchestrator
├── voice/
│ ├── listener.py # Cloud ASR (Bytez + Whisper V3)
│ └── speaker.py # Neural TTS (Edge-TTS)
├── tools/
│ ├── browser.py # Web & Search Tools
│ ├── camera.py # OpenCV Photo Capture
│ ├── system.py # Start Menu App Launcher
│ └── writer.py # Clipboard-based Turbo Typing
├── prompts/
│ └── nova_system.txt # Personality & Logic Rules
├── main.py # Entry Point (Parallel Loop)
└── pyproject.toml # Modern Dependency Management

- google-genai: AI Brain
- bytez: Cloud Speech Recognition
- edge-tts: Premium Neural Voice
- opencv-python: Camera functionality
- pyautogui & pyperclip: System automation
- pygame: Audio playback (In-memory)
- sounddevice: Voice Activity Detection
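
The "Parallel Loop" noted for main.py, where Nova acts while she is still speaking, can be sketched with a background thread. This is an illustration under assumptions rather than the project's code: `run_in_parallel` and `demo` are hypothetical names, and the sleeps stand in for TTS playback and a tool call such as launching an app.

```python
import threading
import time
from typing import Callable, List


def run_in_parallel(speak: Callable[[], None], act: Callable[[], None]) -> None:
    """Run the action on a background thread while speaking on the current one."""
    worker = threading.Thread(target=act, daemon=True)
    worker.start()  # the action begins immediately
    speak()         # speech plays while the action runs
    worker.join()   # wait for the action before handling the next request


def demo() -> List[str]:
    """Record the order of events to show that the two tasks overlap."""
    events: List[str] = []
    lock = threading.Lock()

    def log(message: str) -> None:
        with lock:
            events.append(message)

    def speak() -> None:
        log("speak:start")
        time.sleep(0.2)   # stand-in for audio playback
        log("speak:end")

    def act() -> None:
        log("act:start")
        time.sleep(0.05)  # stand-in for opening an app
        log("act:end")

    run_in_parallel(speak, act)
    return events
```

In `demo()` the action finishes before the speech does, which is the effect the feature list describes: the app is already open by the time Nova stops talking. An asyncio-based design would work as well; threads are used here only because they keep the sketch short.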
MIT License - Feel free to use and modify!
Made by Pratham