Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity

arXiv:2512.10882v4 Announce Type: replace
Abstract: Research increasingly leverages audio-visual materials to analyze emotions in political communication. Multimodal large language models (mLLMs) promise to enable such analyses through in-context learning. However, we lack systematic evidence on whether current mLLMs can reliably measure emotions in real-world political settings. This paper closes this gap by evaluating open- and closed-weights mLLMs available as of early 2026 in video-based emotional arousal measurement using two complementary human-labeled datasets: speech actor recordings created under laboratory conditions and real-world parliamentary debates. I find a critical lab-vs-field performance gap. In videos created under laboratory conditions, the examined mLLMs arousal scores approach human-level reliability. However, in parliamentary debate recordings, all examined models’ arousal scores correlate at best moderately with average human ratings. Moreover, in each dataset, all but one of the examined mLLMs exhibit systematic gender-differential bias, consistently underestimating arousal more for male than for female speakers, resulting in a net-positive intensity bias. These findings reveal important limitations of current mLLMs for real-world political video analysis and establish a rigorous evaluation framework for tracking future developments.

Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity

我们的服务

首页

工作原理

新闻

定价

支持

幫助中心

报告问题

提供反馈

隱私權政策

用户账户

关注我们