Email Spam Shield 🛡️

A machine learning application that detects spam emails using hybrid features and multiple classifiers.

📋 Overview

This project implements a spam email classification system using multiple machine learning models and feature extraction techniques. The application provides a user-friendly interface built with Streamlit for analyzing emails and determining whether they are spam or legitimate.

Features

- Multiple Feature Extraction Techniques:
  - TF-IDF Vectorization
  - Word2Vec Embeddings
  - Hybrid feature combination
- Multiple Classifiers:
  - Logistic Regression
  - Naive Bayes
  - Support Vector Machine (SVM)
  - Random Forest
- Interactive UI:
  - Real-time email analysis
  - Visual representation of results
  - Keyword analysis
  - Performance metrics visualization

🚀 Getting Started

Prerequisites

- Python 3.9+
- pip

Installation

Clone the repository:

```
git clone https://github.com/your-username/email-spam-shield.git
cd email-spam-shield
```

Create a virtual environment:

```
python3 -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

To run the Streamlit application:
```
python3 -m streamlit run app.py
```
To retrain the models using the Jupyter notebook:
```
jupyter notebook spamemailclassification.ipynb
```

🧠 How It Works

The spam classification system works in three main steps:

Text Preprocessing:
- Converting to lowercase
- Removing special characters
- Tokenizing into words
- Removing stop words
- Lemmatizing words
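
The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code in app.py or the notebook: `preprocess` and `STOP_WORDS` are hypothetical names, the stop-word set is a small stand-in (NLTK's English list is much larger), and lemmatization is passed in as a function so a real lemmatizer can be plugged in.

```python
import re
from typing import Callable, List

# Illustrative stop-word subset (assumption); NLTK's English list is far larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "or",
              "in", "on", "for", "you", "your", "this", "that", "it"}


def preprocess(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> List[str]:
    """Hypothetical helper: lowercase, strip special characters, tokenize,
    remove stop words, then lemmatize each remaining token."""
    text = text.lower()                                  # 1. lowercase
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # 2. special characters -> spaces
    tokens = text.split()                                # 3. simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 4. drop stop words
    return [lemmatize(t) for t in tokens]                # 5. lemmatize (identity by default)


print(preprocess("WIN a FREE iPhone!!! Click here"))
# ['win', 'free', 'iphone', 'click', 'here']
```

To use NLTK's WordNet lemmatizer instead of the identity default, run `nltk.download("wordnet")` once and pass `WordNetLemmatizer().lemmatize` (from `nltk.stem`). It treats words as nouns unless a part-of-speech tag is given, so a word like "running" is left unchanged.
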
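
Feature Extraction:
- TF-IDF (term frequency-inverse document frequency): weights each word by how often it appears in an email, discounted by how many emails in the corpus contain it, so very common words count for less
- Word2Vec: a shallow neural network model that learns dense vector representations (embeddings) of words from the contexts they appear in, so words used in similar ways get similar vectors
- Hybrid Features: combines the TF-IDF and Word2Vec representations into a single feature vector, aiming to perform better than either alone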
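
The three feature types can be illustrated on a toy corpus in plain Python. This sketch rests on assumptions rather than on the notebook's code: the TF-IDF formula mirrors scikit-learn's `TfidfVectorizer` defaults (smoothed IDF, L2 normalization), an email's Word2Vec vector is assumed to be the mean of its word vectors, and "hybrid" is assumed to mean concatenation. `toy_wv` is a hypothetical stand-in for a trained model's word vectors (gensim's `model.wv` supports the same `word in wv` and `wv[word]` lookups). The feature scaler the project saves (scaler.pkl) is omitted here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Doc = List[str]  # a preprocessed email: list of tokens


def fit_idf(docs: Sequence[Doc]) -> Dict[str, float]:
    """Smoothed IDF, the formula scikit-learn's TfidfVectorizer uses by default:
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, where df(t) counts documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf_vector(doc: Doc, idf: Dict[str, float]) -> List[float]:
    """Raw term count times IDF over a sorted vocabulary, then L2-normalized."""
    counts = Counter(doc)
    vec = [counts[t] * idf[t] for t in sorted(idf)]  # words outside the vocabulary are ignored
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def mean_embedding(doc: Doc, wv: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; all zeros if none are known."""
    known = [wv[t] for t in doc if t in wv]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def hybrid_features(doc: Doc, idf: Dict[str, float],
                    wv: Dict[str, List[float]], dim: int) -> List[float]:
    """Hybrid vector (assumed concatenation): TF-IDF part followed by the embedding part."""
    return tfidf_vector(doc, idf) + mean_embedding(doc, wv, dim)


corpus = [["free", "money"], ["meeting", "tomorrow"], ["free", "meeting"]]
idf = fit_idf(corpus)
toy_wv = {"free": [1.0, 0.0], "money": [0.0, 1.0]}  # hypothetical 2-d word vectors
print(hybrid_features(["free", "money"], idf, toy_wv, dim=2))
```

In this toy corpus "money" appears in one email and "free" in two, so "money" receives the larger TF-IDF weight, which is the discounting of common words described above.
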
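
Classification:
- Several machine learning models (Logistic Regression, Naive Bayes, SVM, Random Forest) are trained on the extracted features
- The model with the best F1-score is selected as the final classifier and saved as best_spam_classifier.pkl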
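
Selecting the final classifier by F1-score can be sketched as below. The label encoding (1 = spam) and the validation predictions are invented toy values for illustration, not the project's results; `f1` and `select_best` are hypothetical helpers.

```python
from typing import Dict, Sequence


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the positive class (assumed here: 1 = spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(predictions: Dict[str, Sequence[int]], y_true: Sequence[int]) -> str:
    """Return the name of the model whose predictions score the highest F1."""
    return max(predictions, key=lambda name: f1(y_true, predictions[name]))


y_val = [1, 1, 1, 0, 0, 0]  # toy labels, not project data
preds = {
    "Logistic Regression": [1, 0, 0, 0, 0, 0],  # F1 = 0.50
    "Naive Bayes": [1, 1, 1, 1, 1, 0],  # F1 = 0.75
    "SVM": [1, 1, 0, 0, 0, 0],  # F1 = 0.80
}
print(select_best(preds, y_val))  # SVM
```

In a scikit-learn workflow, `sklearn.metrics.f1_score(y_true, y_pred)` computes the same binary F1 (its default `pos_label` is 1). F1 is a sensible selection criterion here because it balances precision (legitimate mail not flagged as spam) against recall (spam actually caught), which plain accuracy can hide when one class dominates.
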

📊 Performance

The system has been tested on the Enron email dataset, with the following performance metrics:

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | ~92% | ~89% | ~85% | ~87% |
| Naive Bayes | ~88% | ~82% | ~91% | ~86% |
| SVM | ~93% | ~90% | ~87% | ~88% |
| Random Forest | ~91% | ~88% | ~84% | ~86% |

Note: These figures are approximate and may differ from the results of the most recent training run.
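
F1 is the harmonic mean of precision P and recall R, which gives a quick consistency check on the table. Using the approximate SVM figures:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.90 \times 0.87}{0.90 + 0.87} = \frac{1.566}{1.77} \approx 0.885
```

This agrees with the ~88% F1 listed; the other rows check out the same way (for example, Naive Bayes: 2 × 0.82 × 0.91 / 1.73 ≈ 0.863, matching ~86%).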

📚 Project Structure

```
email-spam-shield/
├── app.py                       # Streamlit application
├── spamemailclassification.ipynb # Training notebook
├── requirements.txt             # Dependencies
├── best_spam_classifier.pkl     # Best model
├── tfidf_vectorizer.pkl         # TF-IDF model
├── scaler.pkl                   # Feature scaler
├── w2v_model.pkl                # Word2Vec model
└── enrondataset.csv             # Dataset
```
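
A minimal sketch of loading the saved artifacts, for readers adapting app.py. It assumes the .pkl files were written with Python's pickle module (if the notebook used `joblib.dump`, load them with `joblib.load` instead); `ARTIFACTS` and `load_artifacts` are hypothetical names, not taken from app.py.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

# File names from the project structure; the dictionary keys are hypothetical.
ARTIFACTS = {
    "classifier": "best_spam_classifier.pkl",
    "tfidf": "tfidf_vectorizer.pkl",
    "scaler": "scaler.pkl",
    "w2v": "w2v_model.pkl",
}


def load_artifacts(directory: str = ".") -> Dict[str, Any]:
    """Unpickle each saved artifact into a dict keyed by role.
    Only load files you trust: unpickling can execute arbitrary code."""
    loaded = {}
    for key, filename in ARTIFACTS.items():
        with open(Path(directory) / filename, "rb") as fh:
            loaded[key] = pickle.load(fh)
    return loaded
```

Unpickling also requires the libraries that created each object (for example scikit-learn) to be installed, ideally at the versions pinned in requirements.txt.
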

🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

- Enron Email Dataset
- Streamlit for the interactive UI framework
- NLTK for natural language processing tools
- scikit-learn for machine learning models
