SentiMix-Sentiment-Analysis

The evolution of social media texts such as blogs, micro-blogs (e.g., Twitter), and chats (e.g., WhatsApp and Facebook messages) has created many new opportunities for information access and language technology, but it has also posed many new challenges making it one of the current prime research areas. Although current language technologies are primarily built for English, non-native English speakers combine English and other languages when they use social media. In fact, statistics show that half of the messages on Twitter are in a language other than English. This evidence suggests that other languages, including multilinguiality and code-mixing, need to be considered by the NLP community. Code-mixing poses several unseen difficulties to NLP tasks such as word-level language identification, part-of-speech tagging, dependency parsing, machine translation and semantic processing. Conventional NLP systems heavily rely on monolingual resources to address code-mixed text, which limit them to properly handle issues like English-based phonetic typing, word-level code-mixing, and others. The next two phrases are examples of code-mixing in Spanglish and Hinglish. For the Spanglish example, in addition to the code-mixing at the sentence level, the word pushes conjugates the English word push according to the grammar rules in Spanish, which shows that code-mixing can also happen at the word level. Better to add more details on the Hinglish example In the Hinglish example only one English word enjoy has been used, but more noticeably for the Hindi words - instead of using Devanagari script, English phonetic typing is a popular practice in India. The task is to predict the sentiment of a given code-mixed tweet. The sentiment labels are positive, negative, or neutral, and the code-mixed languages will be English-Hindi and English-Spanish. Besides the sentiment labels, we will also provide the language labels at the word level. The word-level language tags are en (English), spa (Spanish), hi (Hindi), mixed, and univ (e.g., symbols, @ mentions, hashtags). Efficiency will be measured in terms of Precision, Recall, and F-measure.

The task required data cleaning of all sorts and was completed with 67% accuracy using the SVM Machine Learning Model.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
SentiMix-Sentiment-Analysis.ipynb		SentiMix-Sentiment-Analysis.ipynb
data.csv		data.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SentiMix-Sentiment-Analysis

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SentiMix-Sentiment-Analysis

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages