Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvement across many natural language processing tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. We show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information by applying these multimodal embeddings to the task of speaker emotion recognition.
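To make the idea of multimodal word embeddings concrete, here is a minimal, purely illustrative sketch in plain Python. It shows the general notion of pairing each word's lexical embedding with word-aligned acoustic features (here by mean-pooling frames and concatenating); all names, dimensions, and the fusion-by-concatenation scheme are assumptions for illustration only — the paper's actual model instead adds a parallel acoustic stream inside a bidirectional language model trained on spoken language data.

```python
# Hypothetical sketch: fusing per-word lexical embeddings with
# word-aligned acoustic features by mean-pooling and concatenation.
# This is NOT the paper's architecture (which uses a parallel acoustic
# stream in a bidirectional language model); it only illustrates the
# concept of enriching word embeddings with paralinguistic information.

def mean_pool(frames):
    """Average a word's acoustic frames into one fixed-size vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def multimodal_embed(lexical, acoustic_frames):
    """Concatenate each word's lexical embedding with its pooled acoustics."""
    return [lex + mean_pool(frames)
            for lex, frames in zip(lexical, acoustic_frames)]

# Toy example: 2 words, 3-dim lexical embeddings, 2-dim acoustic frames.
lexical = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
acoustic = [[[1.0, 2.0], [3.0, 4.0]],   # word 1: two frames
            [[5.0, 6.0]]]               # word 2: one frame
emb = multimodal_embed(lexical, acoustic)
# Each multimodal embedding has 3 + 2 = 5 dimensions.
```

In this toy setup the acoustic part carries prosodic cues (e.g. energy or pitch statistics) alongside word meaning, which is the intuition behind applying such embeddings to speaker emotion recognition.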
Pages: 608 - 612
Page count: 5
Related Papers
50 records
  • [11] Multiple Models Fusion for Emotion Recognition in the Wild
    Wu, Jianlong
    Lin, Zhouchen
    Zha, Hongbin
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015 : 475 - 481
  • [12] Comparing Recognition Performance and Robustness of Multimodal Deep Learning Models for Multimodal Emotion Recognition
    Liu, Wei
    Qiu, Jie-Lin
    Zheng, Wei-Long
    Lu, Bao-Liang
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 715 - 729
  • [13] Emotion recognition from unimodal to multimodal analysis: A review
    Ezzameli, K.
    Mahersia, H.
    INFORMATION FUSION, 2023, 99
  • [14] Multimodal Interfaces for Emotion Recognition: Models, Challenges and Opportunities
    Greco, Danilo
    Barra, Paola
    D'Errico, Lorenzo
    Staffa, Mariacarla
    ARTIFICIAL INTELLIGENCE IN HCI, PT II, AI-HCI 2024, 2024, 14735 : 152 - 162
  • [15] Audiovisual emotion recognition in wild
    Avots, Egils
    Sapinski, Tomasz
    Bachmann, Maie
    Kaminska, Dorota
    MACHINE VISION AND APPLICATIONS, 2019, 30 (05) : 975 - 985
  • [16] Multimodal Speech Emotion Recognition Based on Large Language Model
    Fang, Congcong
    Jin, Yun
    Chen, Guanlin
    Zhang, Yunfan
    Li, Shidang
    Ma, Yong
    Xie, Yue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (11) : 1463 - 1467
  • [17] Audio-Visual Learning for Multimodal Emotion Recognition
    Fan, Siyu
    Jing, Jianan
    Wang, Chongwen
    SYMMETRY-BASEL, 2025, 17 (03):
  • [18] Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification
    Lee, Sanghyun
    Han, David K.
    Ko, Hanseok
    IEEE ACCESS, 2021, 9 : 94557 - 94572
  • [19] Multimodal Fusion of Spatial-Temporal Features for Emotion Recognition in the Wild
    Wang, Zuchen
    Fang, Yuchun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 205 - 214
  • [20] Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion
    Siriwardhana, Shamane
    Kaluarachchi, Tharindu
    Billinghurst, Mark
    Nanayakkara, Suranga
    IEEE ACCESS, 2020, 8 (08) : 176274 - 176285