This project performs sentiment analysis on comments scraped from Hespress articles. It uses a big data pipeline consisting of Apache Kafka, Apache Spark, HDFS, and MongoDB to process and store the data.
The project follows a hybrid batch and real-time processing architecture:
- Data Source (Hespress): Comments are scraped from Hespress articles using a custom scraper.
- Data Ingestion (Kafka): Scraped comments are streamed into a Kafka topic.
- Batch Processing (Spark):
  - Spark reads comments from the Kafka topic in batches.
  - Preprocessing steps (cleaning, normalization) are applied.
  - Sentiment is predicted using a pre-trained deep learning model.
  - Processed comments, including sentiment, are stored in MongoDB.
- Storage (MongoDB): MongoDB stores both batch- and real-time-processed comments.
- Persistent Storage (HDFS): Raw comments ingested from Kafka are stored on HDFS for durability and potential replay/reprocessing.
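The README does not spell out what "cleaning" and "normalization" involve; the sketch below shows one plausible preprocessing pass for Arabic-language comments (URL stripping, diacritic and tatweel removal, whitespace collapsing) using only the standard library. The function name `clean_comment` is illustrative and is not taken from the project code.

```python
import re

# Arabic diacritics (tashkeel) occupy the Unicode range U+064B..U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
URLS = re.compile(r"https?://\S+")
TATWEEL = "\u0640"  # elongation character, carries no meaning

def clean_comment(text: str) -> str:
    """Normalize a raw comment before tokenization."""
    text = URLS.sub(" ", text)          # drop links
    text = DIACRITICS.sub("", text)     # strip vowel marks
    text = text.replace(TATWEEL, "")    # strip elongation
    text = re.sub(r"\s+", " ", text)    # collapse whitespace
    return text.strip()

if __name__ == "__main__":
    print(clean_comment("رائـــع   جدا https://example.com"))
```

The real pipeline would apply a function like this inside the Spark batch job, before feeding the text to the tokenizer.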
```
hespress-comments-analysis/
├── config/                    # Configuration files
│   ├── kafka_config.py
│   └── mongodb_config.py
├── models/                    # Data models
│   ├── comment.py
│   ├── sentiment_model.h5
│   ├── tokenizer.json
│   └── label_encoder.pkl
├── processors/                # Data processing logic
│   ├── batch_processor.py
│   └── spark_processor.py
├── storage/                   # Data storage handlers
│   ├── hdfs_handler.py
│   ├── kafka_handler.py
│   └── mongodb_handler.py
├── utils/                     # Utility functions
│   ├── scrapper.py
│   └── sentiments_processor.py
├── dashboard/                 # Flask dashboard
├── main.py                    # Main application entry point
├── requirements.txt           # Project dependencies
└── README.md                  # This file
```
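Since comments travel from the scraper through Kafka to Spark and MongoDB as serialized records, a small data model keeps all stages in sync. The sketch below shows one plausible shape for such a record; the field names are assumptions for illustration, not copied from `models/comment.py`.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Comment:
    """Illustrative shape of a scraped comment record.

    Field names are hypothetical, not taken from models/comment.py.
    """
    article_url: str
    author: str
    body: str
    likes: int = 0
    sentiment: Optional[str] = None  # filled in later by the Spark batch job

    def to_json(self) -> str:
        # Kafka values are bytes on the wire; JSON keeps the record
        # language-agnostic and easy to insert into MongoDB.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, payload: str) -> "Comment":
        return cls(**json.loads(payload))

if __name__ == "__main__":
    c = Comment(article_url="https://www.hespress.com/...", author="user1", body="تعليق")
    print(Comment.from_json(c.to_json()) == c)
```

A JSON round trip like this is what the Kafka producer (scraper side) and consumer (Spark side) would agree on.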
- Docker: Ensure you have Docker installed on your system (see the official Docker installation guide).
- Docker Compose: Ensure you have Docker Compose installed (see the official Docker Compose installation guide).
- Clone the repository:

  ```
  git clone https://github.com/abdellatif-laghjaj/hespress-comments-analysis
  cd hespress-comments-analysis
  ```
- Generate the model files:
  - Before building the Docker image, you need to generate the sentiment analysis model files.
  - Run the notebook: execute `model_training_notebook.ipynb`, which trains and saves the sentiment analysis model, tokenizer, and label encoder. Use the attached CSV file as the dataset.
  - Place the model files: ensure that the generated files (`sentiment_model.h5`, `tokenizer.json`, `label_encoder.pkl`) are placed in the `models/` directory.
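The notebook itself is the source of truth for how the three artifacts are written; typically the Keras model goes through `model.save(...)`, while the tokenizer and label encoder are serialized with JSON and pickle. The sketch below round-trips stand-in versions of the two lightweight artifacts with only the standard library, just to show the expected file layout; the contents (`num_words`, the label names) are placeholders, not values from this project.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-ins for what the notebook produces; the real
# tokenizer.json comes from Keras and label_encoder.pkl from scikit-learn.
tokenizer_config = {"num_words": 10000, "oov_token": "<OOV>"}
label_classes = ["negative", "neutral", "positive"]

models_dir = Path(tempfile.mkdtemp())  # stands in for the models/ directory

# Save step (done once by the training notebook).
(models_dir / "tokenizer.json").write_text(
    json.dumps(tokenizer_config), encoding="utf-8")
with open(models_dir / "label_encoder.pkl", "wb") as f:
    pickle.dump(label_classes, f)

# Load step (done by the batch job at inference time).
cfg = json.loads((models_dir / "tokenizer.json").read_text(encoding="utf-8"))
with open(models_dir / "label_encoder.pkl", "rb") as f:
    classes = pickle.load(f)
print(cfg == tokenizer_config, classes == label_classes)
```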
- Build and run with Docker Compose:
  - Navigate to the project root directory in your terminal (where `docker-compose.yml` is located).
  - Build the Docker image:

    ```
    docker-compose build
    ```

  - Run the entire application:

    ```
    docker-compose up
    ```

- Accessing the Dashboard: once the containers are running, open http://localhost:5001/ in your browser.

Contributions are welcome! Please open an issue or submit a pull request.