snowholt/smart-vision-gemma

Smart Vision Gemma

A real-time computer vision application that combines object detection (YOLOv8), face analysis, and hand tracking with generative AI (Google Gemma) to understand and describe the scene.

Features

  • Object Detection: Uses YOLOv8 to detect objects in real-time.
  • Face Analysis: Detects faces and analyzes attributes.
  • Hand Tracking: Tracks hand movements.
  • Scene Understanding: Uses Google's Gemma model to generate creative, sci-fi inspired descriptions of the detected scene.
  • Temporal Smoothing: Smooths detection results over time for a stable visualization.
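
As an illustration of the temporal smoothing idea, the sketch below blends each new bounding box with the previous smoothed box using an exponential moving average. This is only one plausible approach; the README does not say which method `temporal/` uses, and the `BoxSmoother` class and its `alpha` parameter are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class BoxSmoother:
    """Exponential moving average over one tracked box (hypothetical helper)."""

    alpha: float = 0.5  # weight of the newest detection; 1.0 disables smoothing
    state: Optional[Box] = None

    def update(self, box: Box) -> Box:
        if self.state is None:
            # First observation: nothing to blend with yet.
            self.state = box
        else:
            # Blend each coordinate of the new box with the previous estimate.
            self.state = tuple(
                self.alpha * new + (1.0 - self.alpha) * old
                for new, old in zip(box, self.state)
            )
        return self.state


smoother = BoxSmoother(alpha=0.5)
smoother.update((100.0, 100.0, 200.0, 200.0))
smoothed = smoother.update((110.0, 90.0, 210.0, 190.0))
print(smoothed)  # (105.0, 95.0, 205.0, 195.0)
```

A lower `alpha` gives steadier boxes at the cost of more lag behind fast-moving objects.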

Prerequisites

  • Python 3.8+
  • Webcam

Installation

  1. Clone the repository (or download the files).

  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt

  4. Set up API Key:

    • Get a Google AI API key from Google AI Studio.
    • Create a .env file in the root directory.
    • Add your key:
      GOOGLE_API_KEY=your_api_key_here
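
To check that the key is picked up, a minimal standard-library sketch like the one below can read the `.env` file and fail early with a clear message. The function names `parse_env` and `load_api_key` are hypothetical; the project itself may load the file differently, for example with a package such as python-dotenv.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_api_key(path: str = ".env") -> str:
    """Return GOOGLE_API_KEY from the environment or the .env file."""
    env_file = Path(path)
    file_values = (
        parse_env(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    )
    # A variable already exported in the shell takes precedence over the file.
    key = os.environ.get("GOOGLE_API_KEY") or file_values.get("GOOGLE_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is missing; add it to the .env file")
    return key
```

Rejecting the placeholder value `your_api_key_here` catches the common mistake of copying the template line without editing it.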

Usage

Run the main application:

    python main.py

Controls

  • q: Quit the application.
  • e: Ask Gemma to explain the current scene.
  • s: Toggle temporal smoothing.
  • r: Register the current face (saves to face_db/).

Project Structure

  • main.py: Entry point of the application.
  • detection/: Contains modules for object detection, face analysis, and hand tracking.
  • temporal/: Logic for smoothing detections over time.
  • utils/: Configuration and drawing utilities.
  • llm_helper.py: Interface for interacting with the Google Gemma API.
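
To show how the detection modules and `llm_helper.py` might fit together, the sketch below turns a list of detected labels into a text prompt for the model. The function `build_scene_prompt`, its parameters, and the prompt wording are all assumptions made for illustration; the actual prompt and API call in `llm_helper.py` are not shown in this README.

```python
from collections import Counter
from typing import Iterable


def build_scene_prompt(labels: Iterable[str], faces: int = 0, hands: int = 0) -> str:
    """Summarize detections as a prompt string (hypothetical helper)."""
    counts = Counter(labels)
    # Sort by label so the same detections always give the same prompt.
    objects = ", ".join(
        f"{n} x {name}" for name, n in sorted(counts.items())
    ) or "no objects"
    return (
        "Describe this scene in two sentences with a sci-fi tone. "
        f"Objects: {objects}. Faces: {faces}. Hands: {hands}."
    )


prompt = build_scene_prompt(["person", "cup", "person"], faces=1)
print(prompt)
```

Sending a compact text summary rather than raw frames keeps each request small, though it also limits the description to what the detectors report.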
