# Large Language Models Meet Emotion Recognition: A Survey

This repository collects the methods, datasets, and related surveys mentioned in our survey 'Large Language Models Meet Emotion Recognition: A Survey' 🔥. For any problems, please contact shouyuntao@stu.xjtu.edu.cn. Other interesting papers or code are welcome. If you find this repository useful for your research or work, we would really appreciate a star ⭐ ❤️.


## Trending LLM Projects

- TinyZero - Clean, minimal, accessible reproduction of DeepSeek R1-Zero.
- open-r1 - Fully open reproduction of DeepSeek-R1.
- DeepSeek-R1 - First-generation reasoning models from DeepSeek.
- Qwen2.5-Max - Exploring the intelligence of a large-scale MoE model.
- OpenAI o3-mini - Pushing the frontier of cost-effective reasoning.
- DeepSeek-V3 - First open-sourced GPT-4o-level model.
- Kimi-K2 - MoE language model with 32B active and 1T total parameters.

## Milestone Papers

| Date | Keywords | Institute | Paper |
| --- | --- | --- | --- |
| 2017-06 | Transformers | Google | Attention Is All You Need |
| 2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training |
| 2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners |
| 2019-09 | Megatron-LM | NVIDIA | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
| 2019-10 | T5 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| 2019-10 | ZeRO | Microsoft | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models |
| 2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models |
| 2020-05 | GPT 3.0 | OpenAI | Language Models are Few-Shot Learners |
| 2021-01 | Switch Transformers | Google | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
| 2021-08 | Codex | OpenAI | Evaluating Large Language Models Trained on Code |
| 2021-08 | Foundation Models | Stanford | On the Opportunities and Risks of Foundation Models |
| 2021-09 | FLAN | Google | Finetuned Language Models are Zero-Shot Learners |
| 2021-10 | T0 | HuggingFace et al. | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| 2021-12 | GLaM | Google | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| 2021-12 | WebGPT | OpenAI | WebGPT: Browser-assisted question-answering with human feedback |
| 2021-12 | Retro | DeepMind | Improving language models by retrieving from trillions of tokens |
| 2021-12 | Gopher | DeepMind | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| 2022-01 | CoT | Google | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| 2022-01 | LaMDA | Google | LaMDA: Language Models for Dialog Applications |
| 2022-01 | Megatron-Turing NLG | Microsoft & NVIDIA | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| 2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback |
| 2022-04 | PaLM | Google | PaLM: Scaling Language Modeling with Pathways |
| 2022-04 | Chinchilla | DeepMind | Training Compute-Optimal Large Language Models |
| 2022-05 | OPT | Meta | OPT: Open Pre-trained Transformer Language Models |
| 2022-05 | UL2 | Google | Unifying Language Learning Paradigms |
| 2022-06 | Minerva | Google | Solving Quantitative Reasoning Problems with Language Models |
| 2022-06 | Emergent Abilities | Google | Emergent Abilities of Large Language Models |
| 2022-06 | BIG-bench | Google | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models |
| 2022-06 | METALM | Microsoft | Language Models are General-Purpose Interfaces |
| 2022-09 | Sparrow | DeepMind | Improving alignment of dialogue agents via targeted human judgements |
| 2022-10 | Flan-T5/PaLM | Google | Scaling Instruction-Finetuned Language Models |
| 2022-10 | GLM-130B | Tsinghua | GLM-130B: An Open Bilingual Pre-trained Model |
| 2022-11 | HELM | Stanford | Holistic Evaluation of Language Models |
| 2022-11 | BLOOM | BigScience | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| 2022-11 | Galactica | Meta | Galactica: A Large Language Model for Science |
| 2022-12 | OPT-IML | Meta | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| 2023-01 | Flan 2022 Collection | Google | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning |
| 2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models |
| 2023-02 | Kosmos-1 | Microsoft | Language Is Not All You Need: Aligning Perception with Language Models |
| 2023-03 | LRU | DeepMind | Resurrecting Recurrent Neural Networks for Long Sequences |
| 2023-03 | PaLM-E | Google | PaLM-E: An Embodied Multimodal Language Model |
| 2023-03 | GPT-4 | OpenAI | GPT-4 Technical Report |
| 2023-04 | LLaVA | UW–Madison & Microsoft | Visual Instruction Tuning |
| 2023-04 | Pythia | EleutherAI et al. | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
| 2023-05 | Dromedary | CMU et al. | Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision |
| 2023-05 | PaLM 2 | Google | PaLM 2 Technical Report |
| 2023-05 | RWKV | Bo Peng | RWKV: Reinventing RNNs for the Transformer Era |
| 2023-05 | DPO | Stanford | Direct Preference Optimization: Your Language Model is Secretly a Reward Model |
| 2023-05 | ToT | Google & Princeton | Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
| 2023-07 | LLaMA 2 | Meta | Llama 2: Open Foundation and Fine-Tuned Chat Models |
| 2023-08 | Qwen-VL | Alibaba | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| 2023-10 | Mistral 7B | Mistral | Mistral 7B |
| 2023-11 | Qwen-Audio | Alibaba | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models |
| 2023-12 | Mamba | CMU & Princeton | Mamba: Linear-Time Sequence Modeling with Selective State Spaces |
| 2024-02 | OLMo | Ai2 | OLMo: Accelerating the Science of Language Models |
| 2024-05 | DeepSeek-V2 | DeepSeek | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model |
| 2024-05 | Mamba-2 | CMU & Princeton | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality |
| 2024-06 | FineWeb | HuggingFace | The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale |
| 2024-07 | Llama 3 | Meta | The Llama 3 Herd of Models |
| 2024-07 | Qwen2-Audio | Alibaba | Qwen2-Audio Technical Report |
| 2024-09 | OLMoE | Ai2 | OLMoE: Open Mixture-of-Experts Language Models |
| 2024-09 | Qwen2-VL | Alibaba | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution |
| 2024-10 | Janus | DeepSeek | Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation |
| 2024-11 | JanusFlow | DeepSeek | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation |
| 2024-12 | Qwen2.5 | Alibaba | Qwen2.5 Technical Report |
| 2024-12 | DeepSeek-V3 | DeepSeek | DeepSeek-V3 Technical Report |
| 2024-12 | QVQ | Alibaba | QVQ: To See the World with Wisdom |
| 2024-12 | DeepSeek-VL2 | DeepSeek | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding |
| 2025-01 | DeepSeek-R1 | DeepSeek | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |
| 2025-01 | Janus-Pro | DeepSeek | Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling |
| 2025-02 | Qwen2.5-VL | Alibaba | Qwen2.5-VL Technical Report |
| 2025-03 | Qwen2.5-Omni | Alibaba | Qwen2.5-Omni Technical Report |
| 2025-03 | QwQ | Alibaba | QwQ-32B: Embracing the Power of Reinforcement Learning |
| 2025-05 | Qwen3 | Alibaba | Qwen3 Technical Report |

## Open LLM

Organizations with notable open LLM releases: DeepSeek, Alibaba, Meta, Mistral AI, Google, Apple, Microsoft, AllenAI, xAI, Cohere, 01-ai, Baichuan, Nvidia, BLOOM (BigScience), Zhipu AI, OpenBMB, RWKV Foundation, EleutherAI, Stability AI, BigCode, DataBricks, and Shanghai AI Laboratory.

## LLM for emotion recognition

| Model | Supported Modalities | Link |
| --- | --- | --- |
| A Multi-Modal Model with In-Context Instruction Tuning | Video, Text | GitHub |
| VideoChat: Chat-centric video understanding | Video, Text | GitHub |
| MVBench: A comprehensive multi-modal video understanding benchmark | Video, Text | GitHub |
| Video-LLaVA: Learning united visual representation by alignment before projection | Video, Text | GitHub |
| Video-LLaMA: An instruction-tuned audio-visual language model for video understanding | Video, Text | GitHub |
| Video-ChatGPT: Towards detailed video understanding via large vision and language models | Video, Text | GitHub |
| LLaMA-VID: An image is worth 2 tokens in large language models | Video, Text | GitHub |
| mPLUG-Owl: Modularization empowers large language models with multimodality | Video, Text | GitHub |
| Chat-UniVi: Unified visual representation empowers large language models with image and video understanding | Video, Text | GitHub |
| SALMONN: Towards generic hearing abilities for large language models | Audio, Text | GitHub |
| Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models | Audio, Text | GitHub |
| SECap: Speech emotion captioning with large language model | Audio, Text | GitHub |
| OneLLM: One framework to align all modalities with language | Audio, Video, Text | GitHub |
| PandaGPT: One model to instruction-follow them all | Audio, Video, Text | GitHub |
| Emotion-LLaMA: Multimodal emotion recognition and reasoning with instruction tuning | Audio, Video, Text | GitHub |
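
At inference time, most of these models share the same instruction-style usage: non-text streams are summarized (or projected) into the LLM's input space, and the model is asked to name or explain the emotion. Below is a minimal, hypothetical sketch of this prompting pattern; `query_llm`, the label set, and the prompt wording are illustrative assumptions, not taken from any of the papers above.

```python
# Minimal sketch of instruction-style emotion recognition with an LLM.
# `query_llm` is a hypothetical stand-in for whichever chat backend you use.

EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised", "fearful", "disgusted"]


def build_emotion_prompt(transcript: str, visual_caption: str = "",
                         audio_caption: str = "") -> str:
    """Fold multimodal evidence into a single text instruction."""
    evidence = [f"Transcript: {transcript}"]
    if visual_caption:
        evidence.append(f"Visual description: {visual_caption}")
    if audio_caption:
        evidence.append(f"Audio description: {audio_caption}")
    return (
        "You are an emotion recognition assistant.\n"
        + "\n".join(evidence)
        + "\nWhich single emotion best fits the speaker? "
        + f"Answer with one word from: {', '.join(EMOTIONS)}."
    )


def recognize_emotion(query_llm, transcript: str, visual_caption: str = "",
                      audio_caption: str = "") -> str:
    """Query the LLM and validate its answer against the label set."""
    answer = query_llm(build_emotion_prompt(
        transcript, visual_caption, audio_caption)).strip().lower()
    return answer if answer in EMOTIONS else "neutral"  # fallback for off-list replies
```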

## Datasets

| Dataset | Modality | # Samples | # Emotions | Annotation Manner |
| --- | --- | --- | --- | --- |
| RAF-DB | I | 29,672 | 7 | Human |
| AffectNet | I | 450,000 | 8 | Human |
| EmoDB | A | 535 | 7 | Human |
| MSP-Podcast | A | 73,042 | 8 | Human |
| DFEW | V | 11,697 | 7 | Human |
| FERV39k | V | 38,935 | 7 | Human |
| MER2023 | A,V,T | 5,030 | 6 | Human |
| MELD | A,V,T | 13,708 | 7 | Human |
| EmoViT | I | 51,200 | 988 | Model |
| MERR-Coarse | A,V,T | 28,618 | 113 | Model |
| MAFW | A,V,T | 10,045 | 399 | Human |
| OV-MERD | A,V,T | 332 | 236 | Human-led+Model-assisted |
| MERR-Fine | A,V,T | 4,487 | 484 | Human-led+Model-assisted |
| MER-Caption | A,V,T | 115,595 | 2,932 | Model-led+Human-assisted |
| MER-Caption+ | A,V,T | 31,327 | 1,972 | Model-led+Human-assisted |

*Modality codes: I = image, A = audio, V = video, T = text.*
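
For working with these corpora programmatically, it can help to normalize the table into small typed records. The sketch below is a hypothetical illustration (not from the survey); the three sample entries are transcribed from the table above and the rest are elided.

```python
# Hypothetical sketch: normalize the dataset table into typed records
# so rows can be filtered by modality and annotation manner.
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionDataset:
    name: str
    modalities: frozenset  # subset of {"I", "A", "V", "T"}, as in the table
    samples: int
    emotions: int          # number of emotion categories
    annotation: str        # annotation manner

DATASETS = [
    EmotionDataset("MER2023", frozenset("AVT"), 5_030, 6, "Human"),
    EmotionDataset("MELD", frozenset("AVT"), 13_708, 7, "Human"),
    EmotionDataset("OV-MERD", frozenset("AVT"), 332, 236, "Human-led+Model-assisted"),
    # ... remaining rows of the table
]

# Example: tri-modal datasets whose annotation involved humans.
trimodal_human = [
    d.name for d in DATASETS
    if d.modalities == frozenset("AVT") and "Human" in d.annotation
]
print(trimodal_human)  # ['MER2023', 'MELD', 'OV-MERD']
```
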
| Category | Dataset | Chosen Set | # Samples | Label Description |
| --- | --- | --- | --- | --- |
| Fine-grained Emotion | OV-MERD+ | All | 532 | unfixed categories and a varying number of labels per sample |
| Basic Emotion | MER2023 | MER-MULTI | 411 | most likely label among six candidates |
| Basic Emotion | MER2024 | MER-SEMI | 1,169 | most likely label among six candidates |
| Basic Emotion | IEMOCAP | Session 5 | 1,241 | most likely label among four candidates |
| Basic Emotion | MELD | Test | 2,610 | most likely label among seven candidates |
| Sentiment Analysis | CMU-MOSI | Test | 686 | sentiment intensity in [-3, 3] |
| Sentiment Analysis | CMU-MOSEI | Test | 4,659 | sentiment intensity in [-3, 3] |
| Sentiment Analysis | CH-SIMS | Test | 457 | sentiment intensity in [-1, 1] |
| Sentiment Analysis | CH-SIMS v2 | Test | 1,034 | sentiment intensity in [-1, 1] |
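
Note that the sentiment-analysis rows carry continuous intensities rather than categorical labels, so predictions on CMU-MOSI/CMU-MOSEI/CH-SIMS are commonly scored with sign-based binary accuracy ("Acc-2"), excluding zero-intensity gold samples. A minimal sketch, assuming prediction and gold score lists aligned by sample:

```python
# Sketch of sign-based binary accuracy ("Acc-2") over sentiment intensities,
# a common convention for CMU-MOSI / CMU-MOSEI (labels in [-3, 3]) and
# CH-SIMS (labels in [-1, 1]); zero-intensity gold samples are excluded.

def acc2(pred_scores, gold_scores):
    """Fraction of nonzero-gold samples whose predicted sign matches."""
    pairs = [(p, g) for p, g in zip(pred_scores, gold_scores) if g != 0]
    if not pairs:
        return 0.0
    return sum((p > 0) == (g > 0) for p, g in pairs) / len(pairs)

# Two of the three nonzero-gold samples get the right sign -> 0.667
print(acc2([0.4, -1.2, 2.0, 0.1], [1.5, 0.0, -2.0, 0.3]))
```
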
| Dataset | Domain | Duration (hh:mm) | # Labels | Modality | Language |
| --- | --- | --- | --- | --- | --- |
| Large Movie | movie | - | 25,000 | T | EN |
| SeMAINE | dialogue | 06:30 | 80 | V,A | EN |
| HUMAINE | diverse | 04:11 | 50 | V,A | various |
| YouTube | diverse | 00:29 | 300 | V,A,T | various |
| SST | movie | - | 11,855 | T | EN |
| ICT-MMMO | movie | 13:58 | 340 | V,A,T | EN |
| RECOLA | dialogue | 03:50 | 46 | V,A | FR |
| MOUD | review | 00:59 | 400 | V,A,T | ES |
| AFEW | movie | 02:28 | 1,645 | V,A | various |
| SEWA | adverts | 04:39 | 538 | V,A | EN,DE,EL |
| Disneyworld | disneyland | 42:00 | 15,000 | V,A,T | EN |
| EGTEA Gaze+ | diverse | 28:00 | - | V,A,T | various |
| BEOID | diverse | - | - | V,A,T | EN |
| Chorus-Ego | home | 34:00 | 30,000 | V,A,T | EN |
| EPIC | kitchen | 100:00 | 90,000 | V,A,T | EN |
| Ego-4D | diverse | 3025:00 | 74,000 | V,A,T | various |
| E³ | diverse | 71:41 | 81,248 | V,A,T | various |

## Other surveys

| Paper | URL | Source |
| --- | --- | --- |
| MM-LLMs: Recent advances in multimodal large language models | [paper] | [source] |
| Efficient multimodal large language models: A survey | [paper] | [source] |
| Hallucination of multimodal large language models: A survey | [paper] | [source] |
| A survey on benchmarks of multimodal large language models | [paper] | [source] |
| A comprehensive survey of large language models and multimodal large language models in medicine | [paper] | - |
| Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning | [paper] | - |
| How to bridge the gap between modalities: A comprehensive survey on multimodal large language model | [paper] | - |
| A Comprehensive Overview of Large Language Models | [paper] | - |
| A review of multi-modal large language and vision models | [paper] | - |
| Large language models meet NLP: A survey | [paper] | - |
| Efficient large language models: A survey | [paper] | [source] |

## 📌 Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝:

```bibtex
@article{shou2025multimodal,
  title={Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey},
  author={Shou, Yuntao and Meng, Tao and Ai, Wei and Li, Keqin},
  journal={arXiv preprint arXiv:2509.24322},
  year={2025}
}
```

## Acknowledgement ❤️

Thanks to Awesome-LLM.

## Star History

Star History Chart
