Comparative Analysis of the Online Behavior of Pro-Palestinians vs. Pro-Israelis on Reddit, Regarding the Israel-Palestine War (OCT 2023-MAY 2025)
This project provides a dashboard for analyzing and comparing Pro-Palestinian and Pro-Israel online content and comments using NLP metrics such as Toxicity Score and Sentiment Distribution, among others, broken down by topic and speech type. The dashboard is built with Streamlit and Plotly for interactive visualizations and supports between-group and within-group comparisons on various speech-derived features.
The original dataset is available at this link.
The dashboard is available at this link (the app may go to sleep if it has not been used for a while).
The processed dataset, ready for analysis, and the original dataset snapshot used for this research are available at this link. You will also find there the precomputed visualizations that the app loads; they are precomputed because of Streamlit's resource constraints.
All data processing code files are available in this repository.
├── app.py # Streamlit dashboard app (based on processed data)
├── local_main.py # For developers: creates the local visualization cache
├── DatasetProcess.ipynb # Initial data cleaning and preparation
├── NLPFeatureExtraction.ipynb # Advanced NLP feature extraction (toxicity, sentiment, etc.)
├── Images/ # Dashboard screenshots for documentation
├── requirements.txt # Python dependencies
├── LICENSE # License information
└── README.md # Project overview and instructions
- Sentiment Analysis and Emotional Speech: Analyze the sentiment distribution for different subtopics within Pro-Palestinian and Pro-Israel content.
- Toxicity and Profanity: Compare the Toxicity Score of Pro-Palestinian and Pro-Israel content across different conflict-related subtopics.
- Content Representation: Visualize the proportion of Pro-Palestinian vs. Pro-Israel comments and their average scores (positive / negative responses).
- Factual vs. Emotional Speech: Compare the factual and emotional speech patterns of both groups using a heatmap.
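The between-group comparisons listed above boil down to grouping comments by stance and topic and then aggregating a metric. The following is a minimal sketch of that idea using only the standard library. The sample rows, the field names (`stance`, `topic`, `toxicity`), and the helper `mean_toxicity_by_group` are all hypothetical and are not taken from the project's actual schema or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the field names are assumptions, not the project's schema.
comments = [
    {"stance": "Pro-Israel", "topic": "ceasefire", "toxicity": 0.20},
    {"stance": "Pro-Israel", "topic": "ceasefire", "toxicity": 0.40},
    {"stance": "Pro-Palestinian", "topic": "ceasefire", "toxicity": 0.10},
    {"stance": "Pro-Palestinian", "topic": "ceasefire", "toxicity": 0.30},
    {"stance": "Pro-Palestinian", "topic": "hostages", "toxicity": 0.50},
]


def mean_toxicity_by_group(rows):
    """Return {(stance, topic): mean toxicity} for the given rows."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["stance"], row["topic"])].append(row["toxicity"])
    return {key: mean(values) for key, values in buckets.items()}


summary = mean_toxicity_by_group(comments)
for (stance, topic), score in sorted(summary.items()):
    print(f"{stance:16} {topic:10} {score:.2f}")
```

A table like `summary` is the kind of input a grouped bar chart or heatmap would need; on the real dataset the same aggregation would more likely be done with a dataframe library.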
Ensure you have Python installed on your machine. You will also need to install the required Python packages. You can do this by creating and activating a virtual environment and then running:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
In addition, be sure to create a local folder that holds the data zip file together with app.py and requirements.txt.
Commands are written for PowerShell but can easily be adapted to other terminals.
streamlit run app.py
After running the app, Streamlit will start a local web server and open a new tab in your default web browser, displaying the dashboard. If it doesn't open automatically, you can manually navigate to the URL shown in the terminal (usually http://localhost:8501).
NOTE: The local_main.py module creates the visualizations as a file that can be cached in Google Drive and then loaded by app.py, which avoids the heavy computation Streamlit would otherwise have to perform on the full dataset. Simply call:
python local_main.py
Don't forget to upload the zip file to Google Drive and update its VIS_ZIP_GDRIVE_ID in app.py. Ensure its permissions are public.
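To illustrate the caching idea, here is a minimal sketch of how a zip of precomputed visualizations could be unpacked into a local cache folder once and then reused. It is not the code in app.py. The helpers `drive_download_url` and `unpack_visualizations` are hypothetical, the file ID is a placeholder, and the URL pattern is the commonly used form for publicly shared Drive files, assumed here rather than taken from the project (large files may additionally trigger a confirmation page).

```python
import zipfile
from pathlib import Path

# Placeholder; the real value is whatever you set as VIS_ZIP_GDRIVE_ID in app.py.
VIS_ZIP_GDRIVE_ID = "YOUR_FILE_ID"


def drive_download_url(file_id: str) -> str:
    """Build a download URL for a publicly shared Drive file (assumed URL form)."""
    return f"https://drive.google.com/uc?export=download&id={file_id}"


def unpack_visualizations(zip_path, cache_dir):
    """Extract the zip into cache_dir unless it is already populated.

    Returns the sorted names of the entries found in cache_dir.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.exists() or not any(cache_dir.iterdir()):
        cache_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(cache_dir)
    return sorted(p.name for p in cache_dir.iterdir())
```

Skipping extraction when the cache folder is already populated keeps repeated app starts cheap, which is the point of precomputing the visualizations in the first place.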
For users interested in classifying the political affiliation of social media comments, I recommend my related project: Israel-Palestine Political Affiliation Text Classification. That study builds a scalable Deep Learning and Machine Learning pipeline to classify comments into Pro-Israel, Pro-Palestinian, and Undefined categories, starting from the unlabeled raw dataset from Kaggle. The classifier leverages contextual embeddings from a fine-tuned DistilBERT, automated tagging, and optimized classification models such as SVM and XGBoost. It serves as a complementary tool for deeper classification and benchmarking in ideological discourse analysis. The pipeline developed there is also used here to classify each comment's stance, which is the key to this dashboard's comparative analysis between the two groups (Pro-Israel vs. Pro-Palestine).




