An Artificial Intelligence (AI) project for course CS5100 at Northeastern University
In this project we developed machine learning models that classify the sentiment of user-written movie reviews.
- Extract `dataset/raw_reviews.zip` and `dataset/dataset.zip` in the `main` directory.
- Execute `python NBtrain.py '../main/train'` and then `python NBtest.py '../main/test'` for the main implementation.
- Execute `python NaiveBayes_bigrams.py` and `python NaiveBayes_TFIDF.py` for the bigram and TF-IDF variants, respectively.
- Execute `python review_polarity.py` and `python review_polarity.py`
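
The script names above suggest a Naive Bayes classifier. The following is a minimal sketch of multinomial Naive Bayes with add-one smoothing, not the contents of `NBtrain.py` or `NBtest.py`. The function names `train_nb` and `predict_nb`, the whitespace tokenization, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():  # assumption: whitespace tokenization
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior of the class
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothed log likelihood
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, not taken from the project's dataset
toy_train = [
    ("a great and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull and boring film", "neg"),
    ("boring plot and terrible acting", "neg"),
]
model = train_nb(toy_train)
print(predict_nb(model, "great story"))
print(predict_nb(model, "terrible and boring"))
```

Working in log space avoids numeric underflow when reviews are long, and the smoothing term keeps a single unseen word-class pair from zeroing out a class.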
We built our custom datasets with a DFS crawler that we implemented. Refer to `dataset_generation/` for the code and `dataset/` for the extracted dataset.
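
As a rough illustration of depth-first crawling, the sketch below walks a link graph with an explicit stack. It is not the code in `dataset_generation/`. The function `dfs_crawl`, the `get_links` callback, the `max_pages` limit, and the in-memory `toy_site` are hypothetical stand-ins for real page fetching and link extraction.

```python
def dfs_crawl(start, get_links, max_pages=100):
    """Visit pages depth-first from `start`, returning them in visit order."""
    visited = []
    seen = set()
    stack = [start]
    while stack and len(visited) < max_pages:
        page = stack.pop()
        if page in seen:
            continue  # already visited via another path
        seen.add(page)
        visited.append(page)
        # Push links in reverse so the first link on the page is explored first
        for link in reversed(get_links(page)):
            if link not in seen:
                stack.append(link)
    return visited


# Hypothetical site structure used in place of real HTTP requests
toy_site = {
    "/movies": ["/movies/a", "/movies/b"],
    "/movies/a": ["/movies/a/reviews", "/movies/b"],
    "/movies/b": ["/movies/b/reviews"],
    "/movies/a/reviews": [],
    "/movies/b/reviews": [],
}
order = dfs_crawl("/movies", lambda page: toy_site.get(page, []))
print(order)
```

A real crawler would replace `get_links` with a function that downloads a page and parses its links, and would typically add rate limiting and a domain filter.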
| n = 12000 | Predicted: Positive | Predicted: Negative |
|---|---|---|
| Correctly classified | 4803 | 5348 |
| Misclassified | 1197 | 652 |

Correct predictions: 10,151 of 12,000 | F1 Score: 0.8242 | Accuracy: 84.59%
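
The accuracy figure follows directly from the reported totals: 10,151 of the n = 12000 reviews, or 84.59%. The sketch below checks that arithmetic and shows the standard formulas for precision, recall, and F1. The function name `classification_metrics` and the toy counts are hypothetical. Because the README does not say how its F1 score was computed (per class or averaged), the sketch does not try to reproduce 0.8242.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Accuracy check using the totals reported above
reported_accuracy = 10151 / 12000
print(f"{reported_accuracy:.2%}")

# Hypothetical toy counts, used only to exercise the formulas
toy = classification_metrics(tp=8, fp=2, tn=7, fn=3)
print(toy)
```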
Additional Datasets compatible with the project: Large Movie Review Dataset