This repository summarizes the methods, datasets, and related surveys covered in our survey 'Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey' 🔥. For any problems, please contact shouyuntao@stu.xjtu.edu.cn. Contributions of other interesting papers or code are welcome. If you find this repository useful to your research or work, we would really appreciate a star ❤️.
- TinyZero - Clean, minimal, accessible reproduction of DeepSeek R1-Zero
- open-r1 - Fully open reproduction of DeepSeek-R1
- DeepSeek-R1 - First-generation reasoning models from DeepSeek.
- Qwen2.5-Max - Exploring the Intelligence of Large-scale MoE Model.
- OpenAI o3-mini - Pushing the frontier of cost-effective reasoning.
- DeepSeek-V3 - First open-sourced GPT-4o level model.
- Kimi-K2 - MoE language model with 32B active and 1T total parameters.
- DeepSeek-Math-7B
- DeepSeek-Coder-1.3|6.7|7|33B
- DeepSeek-VL-1.3|7B
- DeepSeek-MoE-16B
- DeepSeek-v2-236B-MoE
- DeepSeek-Coder-v2-16|236B-MOE
- DeepSeek-V2.5
- DeepSeek-R1
- DeepSeek-R1-Zero
- DeepSeek-R1-Distill-Llama-70B
- DeepSeek-R1-Distill-Llama-8B
- DeepSeek-R1-Distill-Qwen-7B
- DeepSeek-R1-Distill-Qwen-32B
- DeepSeek-R1-Distill-Qwen-14B
- DeepSeek-R1-Distill-Qwen-1.5B
- DeepSeek-R1-0528
- DeepSeek-R1-0528-Qwen3-8B
- DeepSeek-V2-Chat-0628
- DeepSeek-V2-Chat
- DeepSeek-V2
- DeepSeek-V2-Lite
- DeepSeek-V2-Lite-Chat
- DeepSeek-V2.5
- DeepSeek-V2.5-1210
- DeepSeek-V3-Base
- DeepSeek-V3
- DeepSeek-V3-0324
- DeepSeek-V3.1-Base
- DeepSeek-V3.1
- DeepSeek-V3.1-Terminus
- DeepSeek-V3.2-Exp
- DeepSeek-V3.2-Exp-Base
- DeepSeek-V3.2
- DeepSeek-V3.2-Speciale
- DeepSeek-VL2-Tiny
- DeepSeek-VL2-Small
- DeepSeek-VL2
- Janus-Pro-7B
- Janus-Pro-1B
- Janus-1.3B
- JanusFlow-1.3B
- Qwen-1.8B|7B|14B|72B
- Qwen1.5-0.5B|1.8B|4B|7B|14B|32B|72B|110B|MoE-A2.7B
- Qwen2-0.5B|1.5B|7B|57B-A14B-MoE|72B
- Qwen2.5-0.5B|1.5B|3B|7B|14B|32B|72B
- CodeQwen1.5-7B
- Qwen2.5-Coder-1.5B|7B|32B
- Qwen2-Math-1.5B|7B|72B
- Qwen2.5-Math-1.5B|7B|72B
- Qwen2.5-Omni-7B
- Qwen2.5-Omni-3B
- Qwen2.5-Omni-7B-GPTQ-Int4
- Qwen2.5-Omni-7B-AWQ
- Qwen-VL-7B
- Qwen2-VL-2B|7B|72B
- Qwen2-Audio-7B
- Qwen2.5-VL-3|7|72B
- Qwen2.5-1M-7|14B
- Qwen3-VL-235B-A22B-Instruct
- Qwen3-VL-235B-A22B-Thinking
- Qwen3-Omni-30B-A3B-Captioner
- Qwen3-Omni-30B-A3B-Instruct
- Qwen3-Omni-30B-A3B-Thinking
- Qwen3-Next-80B-A3B-Instruct
- Qwen3-Next-80B-A3B-Instruct-FP8
- Qwen3-Next-80B-A3B-Thinking-FP8
- Llama 3.2-1|3|11|90B
- Llama 3.1-8|70|405B
- Llama 3-8|70B
- Llama 2-7|13|70B
- Llama 1-7|13|33|65B
- OPT-1.3|6.7|13|30|66B
- Llama-4-Scout-17B-16E-Instruct
- Llama-4-Scout-17B-16E
- Llama-4-Maverick-17B-128E-Instruct
- Llama-4-Maverick-17B-128E-Instruct-FP8
- Llama-4-Maverick-17B-128E
- Llama-4-Scout-17B-16E-Instruct-Original
- Llama-4-Maverick-17B-128E-Instruct-FP8-Original
- Llama-4-Scout-17B-16E-Original
- Llama-4-Maverick-17B-128E-Instruct-Original
- Llama-4-Maverick-17B-128E-Original
- Llama-Prompt-Guard-2-22M
- Llama-Prompt-Guard-2-86M
- Llama-Guard-4-12B
- Codestral-7|22B
- Mistral-7B
- Mixtral-8x7B
- Mixtral-8x22B
- Ministral-3-14B-Instruct-2512
- Ministral-3-8B-Instruct-2512
- Ministral-3-3B-Instruct-2512
- Ministral-3-14B-Reasoning-2512
- Ministral-3-8B-Reasoning-2512
- Ministral-3-3B-Reasoning-2512
- Ministral-3-14B-Base-2512
- Ministral-3-8B-Base-2512
- Ministral-3-3B-Base-2512
- Mistral-Large-3-675B-Instruct-2512
- Mistral-Large-3-675B-Instruct-2512-NVFP4
- Mistral-Large-3-675B-Instruct-2512-Eagle
- Mistral-Large-3-675B-Base-2512
- RWKV-v4|5|6
- MiniCPM-2B
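
Most of the open-weight checkpoints listed above are distributed through Hugging Face and can be prompted directly for emotion recognition. Below is a minimal sketch with `transformers`, assuming the usual `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` repository name and enough GPU memory; the prompt and label set are illustrative, not a benchmark protocol.

```python
# Minimal sketch: zero-shot emotion classification with one of the listed
# open-weight checkpoints. The repo id and label set are assumptions for
# illustration; swap in any causal LM from the lists above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed HF repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = (
    "Classify the speaker's emotion as one of "
    "{anger, disgust, fear, joy, neutral, sadness, surprise}.\n"
    'Utterance: "I can\'t believe you remembered my birthday!"\n'
    "Emotion:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```
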
| Model | Supported Modality | Link |
|---|---|---|
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | Video, Text | GitHub |
| VideoChat: Chat-centric video understanding | Video, Text | GitHub |
| MVBench: A comprehensive multi-modal video understanding benchmark | Video, Text | GitHub |
| Video-LLaVA: Learning united visual representation by alignment before projection | Video, Text | GitHub |
| Video-LLaMA: An instruction-tuned audio-visual language model for video understanding | Video, Text | GitHub |
| Video-ChatGPT: Towards detailed video understanding via large vision and language models | Video, Text | GitHub |
| LLaMA-VID: An image is worth 2 tokens in large language models | Video, Text | GitHub |
| mPLUG-Owl: Modularization empowers large language models with multimodality | Video, Text | GitHub |
| Chat-UniVi: Unified visual representation empowers large language models with image and video understanding | Video, Text | GitHub |
| SALMONN: Towards generic hearing abilities for large language models | Audio, Text | GitHub |
| Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models | Audio, Text | GitHub |
| SECap: Speech emotion captioning with large language model | Audio, Text | GitHub |
| OneLLM: One framework to align all modalities with language | Audio, Video, Text | GitHub |
| PandaGPT: One model to instruction-follow them all | Audio, Video, Text | GitHub |
| Emotion-LLaMA: Multimodal emotion recognition and reasoning with instruction tuning | Audio, Video, Text | GitHub |
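
Several of the video models in this table (e.g., Video-LLaVA's "alignment before projection") share one architectural idea: features from a frozen vision or audio encoder are mapped into the LLM's token-embedding space by a small trainable projector, then concatenated with the embedded text. The PyTorch sketch below is a toy illustration of that pattern with made-up dimensions, not any listed model's actual implementation.

```python
# Toy sketch of the "modality projector" pattern shared by several models
# in the table above. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Projects frozen encoder features into the LLM embedding space."""

    def __init__(self, feat_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP (LLaVA-style); some models use a single linear layer.
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_frames_or_patches, feat_dim)
        return self.proj(features)

video_feats = torch.randn(2, 256, 1024)  # e.g. features from a frozen encoder
text_embeds = torch.randn(2, 32, 4096)   # embedded text tokens
projector = ModalityProjector()
# Projected visual tokens are prepended to the text sequence fed to the LLM.
llm_inputs = torch.cat([projector(video_feats), text_embeds], dim=1)
print(llm_inputs.shape)  # torch.Size([2, 288, 4096])
```
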
| Dataset | Modality | # Samples | Description? | # Emotions | Annotation Manner |
|---|---|---|---|---|---|
| RAF-DB | I | 29,672 | ✗ | 7 | Human |
| AffectNet | I | 450,000 | ✗ | 8 | Human |
| EmoDB | A | 535 | ✗ | 7 | Human |
| MSP-Podcast | A | 73,042 | ✗ | 8 | Human |
| DFEW | V | 11,697 | ✗ | 7 | Human |
| FERV39k | V | 38,935 | ✗ | 7 | Human |
| MER2023 | A,V,T | 5,030 | ✗ | 6 | Human |
| MELD | A,V,T | 13,708 | ✗ | 7 | Human |
| EmoViT | I | 51,200 | ✓ | 988 | Model |
| MERR-Coarse | A,V,T | 28,618 | ✓ | 113 | Model |
| MAFW | A,V,T | 10,045 | ✓ | 399 | Human |
| OV-MERD | A,V,T | 332 | ✓ | 236 | Human-led+Model-assisted |
| MERR-Fine | A,V,T | 4,487 | ✓ | 484 | Human-led+Model-assisted |
| MER-Caption | A,V,T | 115,595 | ✓ | 2,932 | Model-led+Human-assisted |
| MER-Caption+ | A,V,T | 31,327 | ✓ | 1,972 | Model-led+Human-assisted |
| Category | Dataset | Chosen Set | # Samples | Label Description |
|---|---|---|---|---|
| Fine-grained Emotion | OV-MERD+ | All | 532 | unfixed categories and diverse number of labels per sample |
| Basic Emotion | MER2023 | MER-MULTI | 411 | most likely label among six candidates |
| Basic Emotion | MER2024 | MER-SEMI | 1,169 | most likely label among six candidates |
| Basic Emotion | IEMOCAP | Session 5 | 1,241 | most likely label among four candidates |
| Basic Emotion | MELD | Test | 2,610 | most likely label among seven candidates |
| Sentiment Analysis | CMU-MOSI | Test | 686 | sentiment intensity in [-3, 3] |
| Sentiment Analysis | CMU-MOSEI | Test | 4,659 | sentiment intensity in [-3, 3] |
| Sentiment Analysis | CH-SIMS | Test | 457 | sentiment intensity in [-1, 1] |
| Sentiment Analysis | CH-SIMS v2 | Test | 1,034 | sentiment intensity in [-1, 1] |
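
For the sentiment-analysis rows, models predict a real-valued intensity rather than a discrete class; work on CMU-MOSI/CMU-MOSEI commonly reports MAE, Pearson correlation, and binary accuracy after thresholding the intensity at zero. A minimal sketch of that evaluation (the prediction and label arrays are made-up illustrations):

```python
# Minimal sketch of the usual intensity-based evaluation for the
# sentiment-analysis benchmarks above. Predictions/labels are invented.
import numpy as np

preds = np.array([1.8, -0.4, 2.6, -2.1, 0.3])   # model outputs in [-3, 3]
labels = np.array([2.0, -1.0, 3.0, -1.5, 0.0])  # ground-truth intensities

mae = np.mean(np.abs(preds - labels))
corr = np.corrcoef(preds, labels)[0, 1]
# Binary accuracy (Acc-2) over non-neutral samples, as is common practice.
nonzero = labels != 0
acc2 = np.mean((preds[nonzero] > 0) == (labels[nonzero] > 0))

print(f"MAE={mae:.3f}  Corr={corr:.3f}  Acc-2={acc2:.3f}")
```
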
| Dataset | Domain | Duration (hh:mm) | # Labels | Modality | Language | Emotion? | Ego? |
|---|---|---|---|---|---|---|---|
| Large Movie | movie | - | 25,000 | T | EN | ✗ | ✗ |
| SeMAINE | dialogue | 06:30 | 80 | V,A | EN | ✓ | ✗ |
| HUMAINE | diverse | 04:11 | 50 | V,A | various | ✓ | ✗ |
| YouTube | diverse | 00:29 | 300 | V,A,T | various | ✗ | ✗ |
| SST | movie | - | 11,855 | T | EN | ✗ | ✗ |
| ICT-MMMO | movie | 13:58 | 340 | V,A,T | EN | ✗ | ✗ |
| RECOLA | dialogue | 03:50 | 46 | V,A | FR | ✓ | ✓ |
| MOUD | review | 00:59 | 400 | V,A,T | ES | ✗ | ✗ |
| AFEW | movie | 02:28 | 1,645 | V,A | various | ✓ | ✓ |
| SEWA | adverts | 04:39 | 538 | V,A | EN,DE,EL | ✓ | ✗ |
| Disneyworld | disneyland | 42:00 | 15,000 | V,A,T | EN | ✗ | ✓ |
| EGTEA Gaze+ | diverse | 28:00 | - | V,A,T | various | ✓ | ✓ |
| BEOID | diverse | - | - | V,A,T | EN | ✗ | ✗ |
| Chorus-Ego | home | 34:00 | 30,000 | V,A,T | EN | ✗ | ✓ |
| EPIC | kitchen | 100:00 | 90,000 | V,A,T | EN | ✗ | ✓ |
| Ego-4D | diverse | 3025:00 | 74,000 | V,A,T | various | ✗ | ✓ |
| E^3 | diverse | 71:41 | 81,248 | V,A,T | various | ✓ | ✓ |
| Paper | URL | Source |
|---|---|---|
| MM-LLMs: Recent advances in multimodal large language models | [paper] | [source] |
| Efficient multimodal large language models: A survey | [paper] | [source] |
| Hallucination of multimodal large language models: A survey | [paper] | [source] |
| A survey on benchmarks of multimodal large language models | [paper] | [source] |
| A comprehensive survey of large language models and multimodal large language models in medicine | [paper] | - |
| Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning | [paper] | - |
| How to bridge the gap between modalities: A comprehensive survey on multimodal large language model | [paper] | - |
| A Comprehensive Overview of Large Language Models | [paper] | - |
| A review of multi-modal large language and vision models | [paper] | - |
| Large language models meet NLP: A survey | [paper] | - |
| Efficient large language models: A survey | [paper] | [source] |
If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝:
```bibtex
@article{shou2025multimodal,
  title={Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey},
  author={Shou, Yuntao and Meng, Tao and Ai, Wei and Li, Keqin},
  journal={arXiv preprint arXiv:2509.24322},
  year={2025}
}
```

Thanks to Awesome-LLM.
