# Sign Language Assistant

The Sign Language Assistant is a Python-based application aimed at bridging the communication gap for individuals with hearing impairments. It provides two core functionalities:
- Voice to Sign: Converts spoken language to visual sign language using GIFs or alphabet signs.
- Sign Detection: Detects and interprets real-time sign gestures from webcam input using a trained ML model and MediaPipe.
The app supports 35 classes focused on Indian Sign Language (ISL) using techniques from Computer Vision, Machine Learning, and Natural Language Processing.
## Features

- Data Collection: Captures 100 images per class for 35 signs using a webcam (in 10-image batches)
- Data Augmentation: Enhances the dataset with brightness/contrast changes, blur, and flips (3x augmentations)
- Feature Extraction: Uses MediaPipe Holistic landmarks (weighted hands/face/pose) as model input
- Model Training: Trains a `RandomForestClassifier` with hyperparameter tuning
- Voice to Sign: Google Speech Recognition mapped to ISL GIFs or letters
- Sign Detection: Real-time prediction with buffer-based voting and Gemini API interpretation
- UI: Built with Tkinter, interactive and easy to use
## Requirements

- Python: Version 3.8+
- Webcam: Required for real-time tasks
- Gemini API: Must be running at `http://localhost:8000/gemini` (or update the URL in the code)
- Image/GIF assets:
  - `logo.png`
  - `letters/` folder with `a.jpg` ... `z.jpg` and `empty.jpg`
  - `ISL_Gifs/` with predefined phrases (e.g., `good morning.gif`)
## Installation

```shell
pip install -r requirements.txt
```

## Project Structure

```
sign-language-assistant/
├── data/                     # Collected raw sign images
├── augmented_data/           # Augmented images
├── letters/                  # Alphabet signs (JPGs)
├── ISL_Gifs/                 # Phrase signs (GIFs)
├── logo.png
├── data.pickle               # Extracted features
├── model.p                   # Trained model
├── confusion_matrix.png
├── feature_importances.png
├── image_gallery.html
├── *.py                      # Python scripts
└── requirements.txt
```
## Usage

Run the main app:

```shell
python main.py
```

It provides 3 GUI options: Voice To Sign, Sign Detection, Exit.

### Voice to Sign

- Converts spoken phrases to ISL GIFs or letter signs
- Displays an image gallery if no phrase matches

### Sign Detection

- Live gesture prediction via webcam
- Buffered voting (15 frames, 60% confidence)
- Key controls: `G` interprets the buffered sequence via the Gemini API, `B` goes back, `Q` quits
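The 15-frame, 60% buffered voting could be implemented along these lines. This is a minimal sketch; the function and variable names are illustrative, not the project's actual code:

```python
from collections import Counter, deque

BUFFER_SIZE = 15      # frames kept in the rolling buffer
VOTE_THRESHOLD = 0.6  # fraction of frames that must agree

buffer = deque(maxlen=BUFFER_SIZE)

def vote(prediction):
    """Add one per-frame prediction; return a label once it wins >= 60% of the buffer."""
    buffer.append(prediction)
    if len(buffer) < BUFFER_SIZE:
        return None  # not enough frames yet
    label, count = Counter(buffer).most_common(1)[0]
    return label if count / BUFFER_SIZE >= VOTE_THRESHOLD else None

# Feed 15 frames: 10 x "A", 5 x "B"; "A" wins with 10/15 of the votes
results = [vote(p) for p in ["A"] * 10 + ["B"] * 5]
```

Because the buffer is a fixed-size `deque`, old frames slide out automatically, so a stable gesture must dominate the most recent 15 frames before it is emitted.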
## Pipeline Scripts

- `python data_collection.py`: captures 100 images per sign (in 10-image batches)
- `python data_augmentation.py`: generates 3 augmented images per original
- `python feature_extraction.py`: extracts MediaPipe Holistic landmark features and saves them as `data.pickle`
- `python model_training.py`: trains a RandomForest with `GridSearchCV` and outputs `model.p` plus evaluation plots
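The 3x augmentation step (`data_augmentation.py`) might look roughly like this. The sketch below uses NumPy stand-ins for the brightness/contrast, blur, and flip operations; the project itself presumably uses OpenCV, so the exact transforms and parameter ranges are assumptions:

```python
import numpy as np

def augment(image, rng):
    """Produce 3 augmented copies of an (H, W, 3) image, as listed in the README:
    brightness/contrast jitter, a simple blur, and a horizontal flip.
    (NumPy stand-ins; parameter ranges are illustrative assumptions.)"""
    img = image.astype(np.float32)

    # 1. Random brightness/contrast: out = alpha * img + beta
    alpha = rng.uniform(0.8, 1.2)
    beta = rng.uniform(-20, 20)
    bright = np.clip(alpha * img + beta, 0, 255)

    # 2. 3x3 box blur via shifted-sum averaging over edge-padded copies
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(padded[i:i + img.shape[0], j:j + img.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0

    # 3. Horizontal flip
    flipped = img[:, ::-1]

    return [a.astype(np.uint8) for a in (bright, blurred, flipped)]

rng = np.random.default_rng(0)
outs = augment(np.full((8, 8, 3), 128, np.uint8), rng)
```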
## Technical Details

### Dataset

- Stored in `./data/<class>` or `./augmented_data/<class>`
- JPG format
### Feature Extraction

- Landmark weights: hand = 1.0, face = 0.1, pose = 0.3
- Normalized using min(x, y)
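One way to read the weighting and normalization above is as follows: shift each landmark group by its minimum x and y so the features are position-invariant, then scale by the group weight. This is a sketch under that assumption; the helper name and exact normalization are hypothetical, not the project's code:

```python
import numpy as np

# Weights as listed in the README
WEIGHTS = {"hand": 1.0, "face": 0.1, "pose": 0.3}

def landmarks_to_features(landmarks, group):
    """Flatten (x, y) landmarks into a weighted, translation-normalized vector.

    `landmarks` is an (N, 2) array of x/y coordinates. Each coordinate is
    shifted by the group's minimum x and minimum y (one reading of
    "normalized using min(x, y)"), then scaled by the group weight.
    """
    pts = np.asarray(landmarks, dtype=float)
    normalized = pts - pts.min(axis=0)          # subtract per-axis minima
    return (normalized * WEIGHTS[group]).ravel()

feat = landmarks_to_features([(0.5, 0.6), (0.7, 0.9)], "hand")
```

Concatenating the hand, face, and pose vectors produced this way would yield the per-frame feature row stored in `data.pickle`.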
### Model

- `RandomForestClassifier` + `StandardScaler`
- Tuned parameters: `n_estimators`, `max_depth`, `min_samples_leaf`, `max_features`
- Outputs: `confusion_matrix.png`, `feature_importances.png`
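The scaler-plus-forest setup with grid search could be wired up as in the sketch below. The toy data and the small parameter grid are placeholders; the project's real grid presumably covers all four parameters listed above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the features in data.pickle: 40 samples, 8 features, 2 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = (X[:, 0] > 0).astype(int)

# Pipeline mirroring the README: StandardScaler feeding a RandomForestClassifier,
# tuned with GridSearchCV (only a small illustrative grid here)
pipe = Pipeline([("scaler", StandardScaler()),
                 ("rf", RandomForestClassifier(random_state=0))])
grid = GridSearchCV(pipe, {"rf__n_estimators": [10, 25],
                           "rf__max_depth": [None, 4]}, cv=3)
grid.fit(X, y)
```

After fitting, `grid.best_estimator_` is the model that would be pickled to `model.p`, and its feature importances drive `feature_importances.png`.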
### Voice to Sign

- Uses `speech_recognition` with the Google API
- Levenshtein threshold = 0.4 for phrase matching
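Levenshtein-based matching with a 0.4 threshold can be sketched as below: compute the edit distance between the transcript and each known phrase, normalize by the longer length, and accept the closest phrase only if it is within the threshold. The helper names are illustrative:

```python
def levenshtein(a, b):
    """Classic edit distance via dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def best_match(spoken, phrases, threshold=0.4):
    """Return the closest known phrase if its normalized distance <= threshold."""
    scored = [(levenshtein(spoken, p) / max(len(spoken), len(p)), p)
              for p in phrases]
    dist, phrase = min(scored)
    return phrase if dist <= threshold else None

phrases = ["good morning", "good night", "thank you"]
```

A noisy transcript such as `"gud mornin"` still maps to `good morning.gif`, while anything too far from every phrase falls back to the letter-by-letter gallery.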
### Sign Detection

- Real-time with OpenCV & MediaPipe
- 15-frame buffer, 60% voting threshold
- Gemini integration for phrase interpretation
## Limitations

- Only 35 classes supported currently
- Speech recognition requires an internet connection
- No dynamic sequence classification yet
- Real-time inference may lag on low-end systems
## Future Work

- Expand sign vocabulary & phrases
- Add offline speech recognition
- Optimize inference for low-resource devices
- Dynamic sign sequences and sentence formation
- Improved accessibility & GUI UX
## Contributing

- Fork the repo
- Create a branch: `git checkout -b feature-name`
- Commit: `git commit -m 'Add feature'`
- Push: `git push origin feature-name`
- Open a Pull Request
## License

MIT License (see `LICENSE`).
