
Arabic-Tweet-Classification-MARBERT-k5-fold

This repository contains a Jupyter Notebook for Arabic text classification using the MARBERT model. K-fold cross-validation is used to ensure robust performance evaluation. MARBERT is a state-of-the-art transformer-based language model developed for Arabic natural language processing (NLP) tasks.
You can check my paper for more details on the work and the results obtained.

Dataset

The dataset used to train the model can be obtained from the following link: https://www.sciencedirect.com/science/article/pii/S2352340923009472#bib0001. It was collected from Twitter and contains two classes, Spam and Ham.
I have also added the dataset to this repository for ease of use.

Prerequisites

Ensure you have the following dependencies installed before running the notebook:

Python 3.7+
Jupyter Notebook
Hugging Face Transformers
PyTorch
scikit-learn
pandas
numpy
matplotlib

Notes for execution

During execution, the notebook reads the dataset from Google Drive and saves the trained model back to Google Drive. To run the project correctly, update the path variable to match your own Google Drive directory.

For example: path = '/content/drive/MyDrive/Colab/AR/'

Replace '/content/drive/MyDrive/Colab/AR/' with the directory in your Google Drive where the dataset and the trained model will be stored.
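A minimal sketch of the mount-and-path setup described above, assuming a Google Colab runtime; the dataset filename here is hypothetical and stands in for whatever file the notebook actually loads:

```python
import os

# Update this to your own Google Drive folder (illustrative value).
path = '/content/drive/MyDrive/Colab/AR/'

# drive.mount only exists inside Colab; guard it so the cell also runs
# (as a no-op) outside Colab.
try:
    from google.colab import drive
    drive.mount('/content/drive')
except ImportError:
    pass

# Hypothetical filename -- replace with the actual dataset file name.
dataset_file = os.path.join(path, 'dataset.csv')
```

After mounting, everything read from or written to `path` persists in your Drive between Colab sessions.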

Model Evaluation

The performance of the trained MARBERT model was evaluated using 5-fold cross-validation to ensure robust and unbiased results. During cross-validation:
  • The dataset was split into 5 folds, with each fold used once as a validation set while the remaining folds were used for training.
  • Precision, recall, and F1-score were calculated for each fold.
  • At the end of the evaluation, the average results for both class 0 (Ham) and class 1 (Spam) were obtained.
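The evaluation loop above can be sketched with scikit-learn. This is not the notebook's MARBERT code: a generic `fit_predict` callback stands in for model fine-tuning and inference, and the function name and seed are illustrative. Labels follow the convention 0 = Ham, 1 = Spam.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import precision_recall_fscore_support

def evaluate_kfold(X, y, fit_predict, n_splits=5, seed=42):
    """Run stratified k-fold CV and return per-class precision, recall,
    and F1 averaged over folds, as an array of shape (3, 2)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    per_fold = []
    for train_idx, val_idx in skf.split(X, y):
        # fit_predict trains on the fold's training split and returns
        # predictions for the validation split (here it would wrap the
        # MARBERT fine-tuning and inference steps).
        y_pred = fit_predict(X[train_idx], y[train_idx], X[val_idx])
        p, r, f1, _ = precision_recall_fscore_support(
            y[val_idx], y_pred, labels=[0, 1], zero_division=0)
        per_fold.append(np.stack([p, r, f1]))  # rows: P, R, F1; cols: Ham, Spam
    return np.mean(per_fold, axis=0)
```

Each fold serves as the validation set exactly once, and averaging the per-fold metrics yields the per-class results reported below.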

Final 5-Fold Cross-Validation Results

Class 0 (Ham):
  • Precision: 0.9943
  • Recall: 0.9950
  • F1-score: 0.9947
Class 1 (Spam):
  • Precision: 0.9963
  • Recall: 0.9957
  • F1-score: 0.9960

Overall Metrics
Confusion Matrix (rows: actual class, columns: predicted class; order Ham, Spam):
[[11189    56]
 [   64 14851]]

Overall Accuracy: 0.9954
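As a sanity check, the overall accuracy can be recomputed from the confusion matrix above: correct predictions sit on the diagonal, so accuracy is the trace divided by the total count.

```python
import numpy as np

# Confusion matrix from the 5-fold evaluation:
# rows = actual (Ham, Spam), columns = predicted (Ham, Spam).
cm = np.array([[11189,    56],
               [   64, 14851]])

accuracy = np.trace(cm) / cm.sum()  # (11189 + 14851) / 26160
print(round(accuracy, 4))  # → 0.9954
```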

These results demonstrate the model's excellent performance in classifying both Ham and Spam tweets, with near-perfect accuracy and strong F1-scores for both classes.
