This repository contains a Jupyter Notebook for Arabic text classification using the MARBERT model. K-fold cross-validation is used to ensure robust performance evaluation.
MARBERT is a state-of-the-art transformer-based model fine-tuned for tasks involving Arabic natural language processing (NLP).
You can check my paper for more details about the work and the results obtained.
I have also added the dataset to this project for ease of use.
Ensure you have the following dependencies installed before running the notebook:
- Python 3.7+
- Jupyter Notebook
- Hugging Face Transformers
- PyTorch
- scikit-learn
- pandas
- numpy
- matplotlib
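Assuming a pip-based environment, the dependencies above can be installed with a command along these lines (exact package names and versions may vary by platform):

```shell
# Install the libraries used by the notebook
pip install jupyter transformers torch scikit-learn pandas numpy matplotlib
```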
For example: path = '/content/drive/MyDrive/Colab/AR/'
Replace '/content/drive/MyDrive/Colab/AR/' with the path where your dataset and model will be stored in your Google Drive.
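A minimal sketch of how the path is typically used inside the notebook. The file and folder names below (`spam_dataset.csv`, `marbert_model`) are hypothetical placeholders, not names taken from this repository:

```python
import os

# Base path in Google Drive; replace with the folder that holds your
# dataset and where the fine-tuned model will be saved.
path = '/content/drive/MyDrive/Colab/AR/'

# Hypothetical file/folder names used for illustration only.
dataset_file = os.path.join(path, 'spam_dataset.csv')
model_dir = os.path.join(path, 'marbert_model')
```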
The performance of the trained MARBERT model was evaluated using 5-fold cross-validation to ensure robust and unbiased results. During cross-validation:
- The dataset was split into 5 folds, with each fold used once as a validation set while the remaining folds were used for training.
- Precision, recall, and F1-score were calculated for each fold.
- At the end of the evaluation, the average results for both class 0 (Ham) and class 1 (Spam) were obtained.
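The fold loop above can be sketched as follows. This is a minimal illustration using toy data and a `LogisticRegression` placeholder where the notebook fine-tunes MARBERT; it shows the cross-validation and per-fold metric structure, not the actual model:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

# Toy stand-in data; in the notebook these would be tweet features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)  # 0 = Ham, 1 = Spam

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
per_fold = []
for train_idx, val_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000)  # placeholder for MARBERT fine-tuning
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[val_idx])
    # Per-class precision, recall, F1 for this fold (class 0 and class 1)
    p, r, f1, _ = precision_recall_fscore_support(
        y[val_idx], y_pred, labels=[0, 1], zero_division=0
    )
    per_fold.append((p, r, f1))

# Average metrics across the 5 folds: rows = precision/recall/F1, cols = class 0/1
avg = np.mean(per_fold, axis=0)
```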
Class 0 (Ham):
- Precision: 0.9943
- Recall: 0.9950
- F1-score: 0.9947

Class 1 (Spam):
- Precision: 0.9963
- Recall: 0.9957
- F1-score: 0.9960
Overall Metrics
Confusion Matrix:
[[11189    56]
 [   64 14851]]
Overall Accuracy: 0.9954
These results demonstrate the model's excellent performance in classifying both Ham and Spam tweets, with near-perfect accuracy and strong F1-scores for both classes.
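As a sanity check, the overall accuracy can be recomputed from the confusion matrix above, assuming the usual convention of rows as true labels and columns as predictions, with Ham first. (The per-class figures reported earlier are averages over the 5 folds, so they need not match metrics derived from the aggregated matrix to the last digit.)

```python
import numpy as np

# Confusion matrix reported above: rows = true class, cols = predicted class
# (assumed: index 0 = Ham, index 1 = Spam).
cm = np.array([[11189,    56],
               [   64, 14851]])

# Overall accuracy = correct predictions / all predictions
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 4))  # 0.9954
```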