A multimodal emotion recognition system that detects human emotions from both text and pre-recorded audio using classical machine learning techniques, with an interactive Streamlit web interface.
EmotionSense analyzes input from two modalities and predicts one of four emotions:
| Emotion | 😊 Happy | 😢 Sad | 😠 Angry | 😐 Neutral |
|---|---|---|---|---|
- Text input: uses sentence embeddings to capture semantic meaning
- Audio input: extracts MFCC (Mel-Frequency Cepstral Coefficients) features from uploaded `.wav` files
- Both modalities feed into trained classifiers (Logistic Regression, SVM) to predict the emotion
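The audio path can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `pool_mfcc` and `extract_audio_features`, the choice of 13 coefficients, and the mean/standard-deviation pooling are all assumptions. Only the use of MFCC features via librosa comes from this README.

```python
import numpy as np


def pool_mfcc(mfcc: np.ndarray) -> np.ndarray:
    """Collapse an (n_mfcc, n_frames) matrix into one fixed-length vector.

    Mean and standard deviation over time are concatenated, so clips of
    different durations all map to a vector of length 2 * n_mfcc.
    (Assumed pooling scheme, chosen for illustration.)
    """
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def extract_audio_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Load a .wav file and return pooled MFCC features (hypothetical helper)."""
    import librosa  # imported lazily so pool_mfcc works without librosa installed

    signal, sample_rate = librosa.load(wav_path, sr=None)  # sr=None keeps the native rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return pool_mfcc(mfcc)


# Shape check on a stand-in matrix: 13 coefficients over 200 frames.
fake_mfcc = np.random.default_rng(0).normal(size=(13, 200))
print(pool_mfcc(fake_mfcc).shape)  # (26,)
```

Pooling over time is one simple way to give a classical classifier a fixed-size input regardless of clip length. Other summaries (for example, adding delta coefficients) would fit the same structure.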
The models are trained on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, a validated, widely used benchmark for emotion recognition research in which 24 professional actors express emotions under controlled conditions.
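RAVDESS encodes each clip's metadata in its filename as seven two-digit fields, with the emotion code in the third position. The sketch below shows one way to turn filenames into labels. Keeping only happy, sad, angry, and neutral is an assumption based on the four classes listed above, and `label_from_filename` and the example paths are hypothetical rather than taken from this repository.

```python
from pathlib import Path
from typing import Optional

# RAVDESS filenames contain seven two-digit fields:
# modality-channel-emotion-intensity-statement-repetition-actor
# The third field is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}

# Assumption: only these four RAVDESS classes are kept for training.
TARGET_EMOTIONS = {"happy", "sad", "angry", "neutral"}


def label_from_filename(path: str) -> Optional[str]:
    """Return the target emotion for a RAVDESS file, or None if it is skipped."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        return None  # not a RAVDESS-style name
    emotion = RAVDESS_EMOTIONS.get(fields[2])
    return emotion if emotion in TARGET_EMOTIONS else None


print(label_from_filename("data/Actor_12/03-01-05-01-02-01-12.wav"))  # angry
print(label_from_filename("data/Actor_12/03-01-06-01-02-01-12.wav"))  # None (fearful is skipped)
```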
| Component | Technology |
|---|---|
| Language | Python |
| Audio features | MFCC (via librosa) |
| Text features | Sentence embeddings |
| Classifiers | Logistic Regression, SVM |
| Interface | Streamlit |
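To illustrate the classifier row above, here is a rough sketch of fitting both model types on the same feature vectors with scikit-learn. The library choice, the scaling step, the hyperparameters, and the synthetic data are all assumptions made for the example. The README only states that Logistic Regression and SVM are used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["happy", "sad", "angry", "neutral"]


def make_synthetic_features(n_per_class: int = 50, n_features: int = 26, seed: int = 0):
    """Generate well-separated clusters as a stand-in for real feature vectors."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for index, emotion in enumerate(EMOTIONS):
        center = np.zeros(n_features)
        center[index] = 10.0  # each class is shifted along its own axis
        X.append(rng.normal(loc=center, scale=1.0, size=(n_per_class, n_features)))
        y.extend([emotion] * n_per_class)
    return np.vstack(X), np.array(y)


def train_classifiers(X, y):
    """Fit both classifier types on the same features (hypothetical helper)."""
    models = {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }
    for model in models.values():
        model.fit(X, y)
    return models


X, y = make_synthetic_features()
models = train_classifiers(X, y)
for name, model in models.items():
    print(name, model.predict(X[:1])[0])
```

Wrapping each classifier in a pipeline with `StandardScaler` keeps the scaling fitted on training data only, which matters for SVMs and for regularized logistic regression alike.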
EmotionSense/
├── data/ # RAVDESS dataset files
├── models/ # Trained classifier models
├── src/ # Feature extraction and model training scripts
├── app.py # Streamlit web interface
├── requirements.txt # Dependencies
└── .gitignore
1. Clone the repo

       git clone https://github.com/inx-sha/EmotionSense.git
       cd EmotionSense

2. Install dependencies

       pip install -r requirements.txt

3. Launch the Streamlit app

       streamlit run app.py

4. Use the app
   - Type text into the text box or upload a `.wav` audio file
   - Click Predict to get the detected emotion

Inshaf Ahamed
BS Computer Engineering — PAF-IAST, Pakistan