A real-time computer vision application that combines object detection (YOLOv8), face analysis, and hand tracking with generative AI (Google Gemma) to understand and describe the scene.
- Object Detection: Uses YOLOv8 to detect objects in real-time.
- Face Analysis: Detects faces and analyzes attributes.
- Hand Tracking: Tracks hand movements.
- Scene Understanding: Uses Google's Gemma model to generate creative, sci-fi inspired descriptions of the detected scene.
- Temporal Smoothing: Smooths detection results over time for a stable visualization.
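As a rough illustration of what temporal smoothing can look like, here is a minimal sketch that applies an exponential moving average to one tracked bounding box. The `BoxSmoother` class, the `alpha` parameter, and the `(x1, y1, x2, y2)` box layout are assumptions made for this example; the code in `temporal/` may use a different method.

```python
from typing import Optional, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, ...]


class BoxSmoother:
    """Exponential moving average over one tracked box (hypothetical helper)."""

    def __init__(self, alpha: float = 0.5) -> None:
        # alpha near 1.0 follows new detections closely; near 0.0 smooths heavily.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Optional[Box] = None

    def update(self, box: Sequence[float]) -> Box:
        """Blend the newest detection into the running estimate and return it."""
        if self.state is None:
            # First observation: nothing to blend with yet.
            self.state = tuple(float(v) for v in box)
        else:
            a = self.alpha
            self.state = tuple(
                a * new + (1.0 - a) * old for new, old in zip(box, self.state)
            )
        return self.state
```

With `alpha=0.5`, a box that jumps from `(0, 0, 10, 10)` to `(10, 10, 20, 20)` is drawn halfway between the two on the next frame, which is what reduces visible jitter.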
- Python 3.8+
- Webcam
- Clone the repository (or download the files).
- Create a virtual environment (recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Set up the API key:
  - Get a Google AI API key from Google AI Studio.
  - Create a .env file in the root directory.
  - Add your key:

        GOOGLE_API_KEY=your_api_key_here
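For reference, the key can be read at startup along these lines. This is a standard-library sketch only: `load_env_file` and `get_api_key` are hypothetical helpers, not functions from this repository, and the application may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear error when the key is missing is usually easier to debug than a rejected API request later on.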
Run the main application:
    python main.py

Keyboard controls:
- q: Quit the application.
- e: Ask Gemma to explain the current scene.
- s: Toggle temporal smoothing.
- r: Register the current face (saves to face_db/).
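The key bindings above can be thought of as a small lookup table. The sketch below assumes key presses arrive as integer codes, as they do from OpenCV's `cv2.waitKey` (which returns -1 when no key is pressed). The action names and `action_for_key` are made up for illustration and are not taken from `main.py`.

```python
from typing import Optional

# Hypothetical action names; only the keys themselves come from the list above.
KEY_ACTIONS = {
    "q": "quit",
    "e": "explain_scene",
    "s": "toggle_smoothing",
    "r": "register_face",
}


def action_for_key(key_code: int) -> Optional[str]:
    """Map a raw key code to an action name, or None if the key is unbound."""
    if key_code < 0:
        # No key was pressed during the polling interval.
        return None
    # Keep only the low byte, since some platforms set higher bits.
    return KEY_ACTIONS.get(chr(key_code & 0xFF))
```

In a capture loop this would typically be called once per frame, for example `action_for_key(cv2.waitKey(1))`, with the main loop acting on the returned name.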
- main.py: Entry point of the application.
- detection/: Contains modules for object detection, face analysis, and hand tracking.
- temporal/: Logic for smoothing detections over time.
- utils/: Configuration and drawing utilities.
- llm_helper.py: Interface for interacting with the Google Gemma API.