GitHub - phoebeychen/26winter_LLMProject

Which models got it right? In our test case ("Roses are red, trucks are blue, and Seattle is grey right now"), the Sentence Transformer and OpenAI Large models correctly identified the core subject, ranking "Flowers" as the top category. OpenAI Small also performed well: it ranked "Colors" as a highly relevant category, reflecting the multiple color descriptors in the sentence.
Why did some fail? The GloVe 50d model failed by incorrectly prioritizing the "Food" category. The likely cause is its architecture: GloVe learns word vectors from global word co-occurrence statistics. In its training data (2 billion tweets), words like "red," "blue," or "Seattle" plausibly co-occur often in food-related contexts (e.g., "red wine," "blueberries," or Seattle restaurant reviews). Because the averaged sentence vector carries no information about sentence structure, these high-frequency associations add noise to the representation.
What does this reveal about word order? This experiment shows that sentence embeddings built by arithmetically averaging word vectors (as with GloVe here) are entirely "blind" to word order. They treat a sentence as a bag of words. Conversely, the success of the Sentence Transformer and OpenAI models suggests that capturing syntactic structure (knowing that "Roses" is the subject and "red" is merely an attribute) is essential for accurate semantic retrieval.
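The word-order point can be checked with a small sketch. The vectors below are made-up 3-dimensional toy values, not real GloVe embeddings, and the helper names `average_embedding` and `cosine_similarity` are hypothetical, not taken from miniproject_1_student.py. The sketch only illustrates that averaging gives the same sentence vector (up to floating-point rounding) no matter how the words are ordered.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real GloVe 50d vectors would be loaded from a pretrained file instead.
TOY_VECTORS = {
    "roses": [0.9, 0.1, 0.0],
    "are": [0.1, 0.1, 0.1],
    "red": [0.2, 0.8, 0.1],
    "trucks": [0.0, 0.2, 0.9],
    "blue": [0.1, 0.9, 0.2],
}


def average_embedding(sentence):
    """Embed a sentence as the arithmetic mean of its known word vectors."""
    words = [w for w in sentence.lower().split() if w in TOY_VECTORS]
    if not words:
        raise ValueError("no known words in sentence")
    dims = len(next(iter(TOY_VECTORS.values())))
    return [
        sum(TOY_VECTORS[w][i] for w in words) / len(words)
        for i in range(dims)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same words, different roles: the subject and the nouns are swapped around.
original = average_embedding("roses are red trucks are blue")
shuffled = average_embedding("trucks are red roses are blue")

# The two averaged vectors match up to rounding, so the similarity is ~1.0.
print(cosine_similarity(original, shuffled))
```

Because the mean is a sum divided by a count, any permutation of the same words yields the same result, which is why an averaged embedding cannot tell which noun a color word modifies.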

.devcontainer
.gitignore
README.md
README_STUDENT.md
glove_50d.png
miniproject_1_student.py
openai_large_3072.png
openai_small_1536.png
requirements.txt
sentence_transformers_384.png
