rkechols/hotel-review-connotation: Code for analyzing connotation of certain words in the context of hotel reviews

This repository was archived by the owner on Oct 6, 2024. It is now read-only.

data
.gitignore
README.md
automated_final.py
final_analysis.py
manual_scoring.py
print_context.py
sample_random_lemmas.py
search_scores.py
simple_token.py
split_csv_to_txt.py
stats.py
tf_idf.py
tf_idf_slope.py
tokenize_english.py
util.py


Steps to run the full analysis

Part 1: Prep

split_csv_to_txt.py to split the original .csv file into 5 .txt files (one for each rating level).
tokenize_english.py to run spaCy tokenization and lemmatization.
sample_random_lemmas.py to select random lemmas for the control group. Any other specific lemmas to be studied should also be selected at this step.
(optional) stats.py to show token counts.
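The splitting and control-sampling steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the repository's code: the CSV column names Review and Rating, the output naming 1_stars.txt and so on, and every function name here are hypothetical. The spaCy tokenization step is left out.

```python
import csv
import io
import random
from collections import defaultdict
from pathlib import Path


def split_reviews_by_rating(csv_file, text_col="Review", rating_col="Rating"):
    """Group review texts by star rating.

    The column names are hypothetical defaults, not taken from the repository.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        rating = int(row[rating_col])
        # Collapse internal whitespace/newlines so each review fits on one line.
        groups[rating].append(" ".join(row[text_col].split()))
    return dict(groups)


def write_rating_files(groups, out_dir):
    """Write one .txt file per rating level, one review per line (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for rating, reviews in sorted(groups.items()):
        path = out / f"{rating}_stars.txt"
        path.write_text("\n".join(reviews) + "\n", encoding="utf-8")


def sample_control_lemmas(lemma_counts, k, exclude=(), min_count=1, seed=0):
    """Pick k random lemmas for a control group, skipping the study lemmas.

    Sorting the candidates first makes the sample reproducible for a given seed.
    """
    candidates = sorted(
        lemma for lemma, count in lemma_counts.items()
        if count >= min_count and lemma not in exclude
    )
    return random.Random(seed).sample(candidates, k)


if __name__ == "__main__":
    sample = io.StringIO(
        "Review,Rating\n"
        '"Great room, friendly staff",5\n'
        '"Dirty bathroom",1\n'
        '"Clean and quiet",5\n'
    )
    groups = split_reviews_by_rating(sample)
    print({r: len(v) for r, v in sorted(groups.items())})  # {1: 1, 5: 2}
```

Fixing the random seed is one way to keep the control group stable across reruns; the min_count filter is an assumed safeguard against sampling lemmas too rare to score.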

Part 2: Automated analysis

tf_idf.py to calculate all TF-IDF scores.
tf_idf_slope.py to calculate all slopes of TF-IDF scores relative to the number of stars.
automated_final.py to calculate all final automated scores.
search_scores.py to retrieve the scores for the targeted words.
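One way to compute the TF-IDF scores and their slopes against star rating is sketched below. It assumes, hypothetically, that each rating level is treated as a single document, uses smoothed IDF so that lemmas appearing at every level do not all score zero, and fits an ordinary least-squares slope per lemma. The README does not say how automated_final.py combines scores, so this sketch stops at the slopes and a simple lookup; all names are illustrative.

```python
import math
from collections import Counter


def tf_idf_by_rating(tokens_by_rating):
    """Score every lemma at every rating level.

    tokens_by_rating maps star level -> list of lemmas; each rating level is
    treated as one document (an assumption). Smoothed IDF keeps lemmas that
    occur at every rating level from all scoring zero.
    """
    n_docs = len(tokens_by_rating)
    counts = {r: Counter(toks) for r, toks in tokens_by_rating.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    scores = {}
    for rating, c in counts.items():
        total = sum(c.values())
        for lemma, n in c.items():
            tf = n / total
            idf = math.log((1 + n_docs) / (1 + doc_freq[lemma])) + 1
            scores.setdefault(lemma, {})[rating] = tf * idf
    return scores


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def tf_idf_slopes(scores, ratings=(1, 2, 3, 4, 5)):
    """Slope of each lemma's TF-IDF across star levels; a missing level counts as 0."""
    return {
        lemma: ols_slope(ratings, [by_rating.get(r, 0.0) for r in ratings])
        for lemma, by_rating in scores.items()
    }


def search_scores(slopes, targets):
    """Look up the scores of the targeted words (None if never seen)."""
    return {word: slopes.get(word) for word in targets}


if __name__ == "__main__":
    reviews = {
        1: ["dirty", "room"],
        2: ["room"],
        3: ["room", "bed"],
        4: ["room", "bed"],
        5: ["clean", "room"],
    }
    slopes = tf_idf_slopes(tf_idf_by_rating(reviews))
    # "clean" should get a positive slope and "dirty" a negative one.
    print(search_scores(slopes, ["clean", "dirty", "pool"]))
```

A positive slope means the lemma becomes more characteristic of reviews as the star rating rises, which is one plausible reading of "connotation" in this setting.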

Part 3: Manual analysis

print_context.py to find all instances of each lemma in context.
manual_scoring.py to manually assign scores to sampled context instances.
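The manual steps above amount to keyword-in-context extraction, sampling, and prompting a human for a score. The sketch below shows one way to do that; it assumes parallel lists of surface tokens and lemmas, and the -1/0/+1 scoring scale and all function names are hypothetical, not the repository's.

```python
import random


def contexts(tokens, lemmas, target, window=5):
    """Return a keyword-in-context line for every occurrence of the target lemma.

    tokens and lemmas are parallel lists (surface form and lemma per position);
    the matched token is wrapped in brackets.
    """
    lines = []
    for i, lemma in enumerate(lemmas):
        if lemma == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [f"[{tokens[i]}]"] + right))
    return lines


def sample_contexts(lines, k, seed=0):
    """Reproducibly draw up to k context lines for manual scoring."""
    return random.Random(seed).sample(lines, min(k, len(lines)))


def prompt_scores(lines, input_fn=input, allowed=(-1, 0, 1)):
    """Ask a human to score each context line; re-prompt until the answer is allowed."""
    scores = []
    for line in lines:
        while True:
            raw = input_fn(f"{line}\nScore {allowed}: ")
            try:
                value = int(raw)
            except ValueError:
                continue  # not an integer, ask again
            if value in allowed:
                scores.append(value)
                break
    return scores


if __name__ == "__main__":
    toks = "the staff were friendly and the room was clean".split()
    lems = "the staff be friendly and the room be clean".split()
    for line in contexts(toks, lems, "be", window=1):
        print(line)
    # staff [were] friendly
    # room [was] clean
```

Passing input_fn as a parameter keeps the interactive loop testable without a human at the keyboard.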

Part 4: Correlation between automated and manual scores

final_analysis.py to run a regression analysis on each of the groups.
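A simple linear regression of manual scores on automated scores, run separately for each group, could look like the sketch below. It assumes each group is a pair of equal-length score lists for the same lemmas; the group names and the toy numbers are invented for illustration and are not results from this project.

```python
import math


def simple_regression(x, y):
    """Fit y = slope * x + intercept by least squares; also return Pearson r.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def regress_groups(groups):
    """groups maps a group name to (automated_scores, manual_scores) for the same lemmas."""
    return {name: simple_regression(auto, manual) for name, (auto, manual) in groups.items()}


if __name__ == "__main__":
    # Made-up toy data, only to show the call shape.
    toy = {
        "study": ([0.1, 0.2, 0.3, 0.4], [0.3, 0.5, 0.7, 0.9]),
        "control": ([0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.3, 0.2]),
    }
    for name, (slope, intercept, r) in regress_groups(toy).items():
        print(f"{name}: slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

Comparing r between the study and control groups is one way to check whether the automated scores track human judgments better than chance.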



Languages

Python 100.0%