janecww/bt5151_group2_amazon-food-reviews

Roadmap of Codespace for BT5151 Group 2 Project Proposal

All code and data for the project are included in this repository. The sections below describe how the code and data files are organized.

Code

Overall Sentiment Models

This folder contains the code used to implement the overall sentiment analysis models.

  • Baseline_Models_and_Distilbert.ipynb: Includes exploratory data analysis (EDA) of the dataset, data preprocessing with TfidfVectorizer, the baseline models (Logistic Regression and Multinomial Naive Bayes), and the DistilBERT transformer model
  • Distilroberta Model.ipynb: Includes the DistilRoBERTa model, a distilled version of the RoBERTa-base model
  • Multi-head attention.ipynb: Includes the multi-head attention transformer model built with the Keras package. Because this is our best-performing model, SHAP value analysis is also conducted on its results
  • Roberta_Model.ipynb: Includes the RoBERTa model
  • Tokenizer as Feature Extractor.ipynb: Includes data preprocessing with the DistilBERT tokenizer, followed by a logistic regression model
  • distilbertroberta_tokenizer.py: Includes the dedicated tokenizer for the DistilRoBERTa model
  • roberta_tokenizer.py: Includes the dedicated tokenizer for the RoBERTa model
  • tokenizer_class.py: Includes the tokenizer function used for text preprocessing
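
To give a rough idea of what a baseline such as Multinomial Naive Bayes does, here is a minimal, dependency-free sketch. It is not the code from the notebook: it uses raw word counts rather than TF-IDF weights, the class name `TinyMultinomialNB` is hypothetical, and the example reviews are made up rather than taken from Reviews.csv.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing.

    Hypothetical illustration only; not the class used in the notebooks.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up example reviews, not taken from Reviews.csv.
train_texts = [
    "great taste and fast delivery",
    "delicious coffee, will buy again",
    "stale and terrible flavor",
    "awful product, arrived damaged",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
prediction = model.predict("delicious taste")
print(prediction)
```

In practice a library implementation combined with TF-IDF features, as the notebook description suggests, would be the more natural choice; the sketch only makes the scoring rule visible.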

Aspect-based Sentiment Analysis

This folder contains the code for aspect-based sentiment analysis.

  • PyABSA.ipynb: Includes the aspect-based sentiment analysis performed with the PyABSA package
  • Spacy_Aspect_Classifier.ipynb: Includes the aspect-based sentiment analysis performed with the spaCy aspect classifier

Data

  • Reviews.csv: Raw dataset (reviews up to 2011)
  • reviews_df.csv: Processed dataset used for overall sentiment analysis
  • kpc analysis by gpt.csv: KPC results generated by ChatGPT (zero code)
  • sentiment result by gpt with TextBlob.xlsx: Sentiment results generated by GPT with TextBlob
  • shap_values_3.npy: SHAP values computed from the results of the multi-head attention model
  • unseen.csv: Reviews from 2012, used for model testing
  • top_product_1.csv: Dataset for the most popular product, including overall sentiment prediction results (used for aspect-based sentiment analysis)
  • top_product_2.csv: Dataset for the second most popular product, including overall sentiment prediction results (used for aspect-based sentiment analysis)
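
The split between data up to 2011 and the 2012 test set, and the selection of the most popular products, could be reproduced along the following lines. This is a sketch under stated assumptions, not the project's actual preprocessing: the column names `ProductId` and `Time` (a Unix timestamp in seconds), the use of review count as the measure of popularity, and the helper names `split_by_year` and `top_products` are all assumptions, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Made-up sample standing in for Reviews.csv; column names are assumptions.
SAMPLE_CSV = """ProductId,Time,Text
P1,1293840000,good tea
P1,1309478400,nice snack
P2,1320105600,too sweet
P1,1330560000,still great
P2,1341100800,not fresh
"""


def split_by_year(rows, cutoff_year=2011):
    """Return (rows up to cutoff_year, rows after cutoff_year)."""
    past, unseen = [], []
    for row in rows:
        # Interpret the timestamp as seconds since the epoch, in UTC.
        year = datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc).year
        (past if year <= cutoff_year else unseen).append(row)
    return past, unseen


def top_products(rows, n=2):
    """Return the n product IDs with the most reviews (assumed popularity)."""
    counts = Counter(row["ProductId"] for row in rows)
    return [product_id for product_id, _ in counts.most_common(n)]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
past, unseen = split_by_year(rows)
top = top_products(past)
print(len(past), len(unseen), top)
```

To run this against the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an open file handle, after checking the actual column names.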

About

The main objective of sentiment analysis here is to use natural language processing (NLP) techniques to gain insights from the sentiment expressed in customer reviews, in this case reviews of the fine food products listed on Amazon.
