This folder contains all of the software code and data. The directions below explain how to navigate the code and data files.
This folder contains all of the code files used to implement the sentiment analysis models.
- Baseline_Models_and_Distilbert.ipynb: Includes exploratory data analysis (EDA) of the dataset, data preprocessing with TfidfVectorizer, the baseline models (Logistic Regression and Multinomial Naive Bayes), and a DistilBERT transformer model.
- Distilroberta Model.ipynb: Includes the DistilRoBERTa model, a distilled version of the RoBERTa-base model.
- Multi-head attention.ipynb: Includes the multi-head attention transformer model built with the Keras package. Since this is our best model, SHAP value analysis is also conducted on its results.
- Roberta_Model.ipynb: Includes the RoBERTa model.
- Tokenizer as Feature Extractor.ipynb: Includes data preprocessing with the DistilBERT tokenizer and a logistic regression model.
- distilbertroberta_tokenizer.py: Includes the dedicated tokenizer for the DistilRoBERTa model.
- roberta_tokenizer.py: Includes the dedicated tokenizer for the RoBERTa model.
- tokenizer_class.py: Includes the tokenizer function used for text preprocessing.
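The baseline approach listed for Baseline_Models_and_Distilbert.ipynb (TF-IDF features fed to Logistic Regression and Multinomial Naive Bayes) can be sketched with scikit-learn as below. This is a minimal illustration, not the notebook's code: the toy review texts, the binary labels (1 = positive, 0 = negative), and the default vectorizer and model settings are all assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews; labels are assumed binary (1 = positive, 0 = negative)
texts = [
    "great tasty snack",
    "love this tasty coffee",
    "great coffee love it",
    "awful stale snack",
    "hate this stale coffee",
    "awful coffee hate it",
]
labels = [1, 1, 1, 0, 0, 0]

# Each baseline gets its own TF-IDF vectorizer so the two pipelines stay independent
baselines = {
    "logreg": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    "nb": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

preds = {}
for name, model in baselines.items():
    # fit() learns the vocabulary and IDF weights, then trains the classifier
    model.fit(texts, labels)
    preds[name] = model.predict(["tasty coffee", "stale snack"]).tolist()

print(preds)
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned only from the training texts, so the same object can be applied directly to held-out reviews.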
This folder contains the code for the aspect-based sentiment analysis.
- PyABSA.ipynb: Includes the aspect-based sentiment analysis performed with the PyABSA package.
- Spacy_Aspect_Classifier.ipynb: Includes the aspect-based sentiment analysis performed with the spaCy aspect classifier.
- Reviews.csv: Raw dataset (reviews up to 2011).
- reviews_df.csv: Processed dataset used for the overall sentiment analysis.
- kpc analysis by gpt.csv: KPC result generated by ChatGPT (zero code).
- sentiment result by gpt with TextBlob.xlsx: Sentiment results generated by GPT with TextBlob.
- shap_values_3.npy: SHAP values computed from the results of the multi-head attention model.
- unseen.csv: Dataset of 2012 reviews, used for model testing.
- top_product_1.csv: Dataset for the most popular product, including the overall sentiment prediction results (used for aspect-based sentiment analysis).
- top_product_2.csv: Dataset for the second most popular product, including the overall sentiment prediction results (used for aspect-based sentiment analysis).
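The relationship between the raw data, unseen.csv, and the top_product files can be illustrated with pandas. This is a hypothetical sketch, not the project's preprocessing: it assumes the reviews have a ProductId column and a Time column holding Unix timestamps in seconds, and that "most popular" means "most reviewed". The helper names split_by_year and top_products, and the toy rows, are illustrative only.

```python
import pandas as pd

def split_by_year(df, cutoff_year=2012, time_col="Time"):
    """Hypothetical helper: rows before cutoff_year vs. rows in cutoff_year.

    Assumes time_col holds Unix timestamps in seconds.
    """
    years = pd.to_datetime(df[time_col], unit="s").dt.year
    return df[years < cutoff_year], df[years == cutoff_year]

def top_products(df, n=2, product_col="ProductId"):
    """Hypothetical helper: the n product IDs with the most reviews."""
    return df[product_col].value_counts().head(n).index.tolist()

# Toy rows (assumed schema); 1262304000 = 2010-01-01, 1293840000 = 2011-01-01,
# 1325376000 = 2012-01-01, 1338508800 = 2012-06-01 (all UTC)
df = pd.DataFrame({
    "ProductId": ["A", "A", "A", "A", "B", "B", "B", "C"],
    "Time": [1262304000, 1293840000, 1293840000, 1325376000,
             1262304000, 1293840000, 1338508800, 1293840000],
})

train, unseen = split_by_year(df)            # pre-2012 rows vs. 2012 rows
top = top_products(train, n=2)               # most-reviewed products first
top_product_1 = train[train["ProductId"] == top[0]]
print(len(train), len(unseen), top, len(top_product_1))
```

With the real files, the frame would come from pd.read_csv, and the column names may need to be adjusted to match the actual schema.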