Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification (CLC)
TM [electrical engineering]; TN [electronic and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvements across many natural language processing tasks. In this work, we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. By applying the embeddings extracted from this model to the task of speaker emotion recognition, we show that they integrate paralinguistic cues into word meanings and provide vital affective information.
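The core idea in the abstract, a parallel acoustic stream whose output is fused with contextual lexical embeddings to give one multimodal embedding per word, can be sketched as follows. This is a minimal illustration, not the authors' architecture: the dimensions, the tanh projection, and the plain concatenation fusion are all hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
LEX_DIM, AC_DIM, OUT_DIM, SEQ_LEN = 8, 4, 6, 5

# Toy per-token streams: contextual lexical embeddings (as a biLM
# would produce) and acoustic features aligned to each word.
lexical = rng.standard_normal((SEQ_LEN, LEX_DIM))
acoustic = rng.standard_normal((SEQ_LEN, AC_DIM))

# Parallel-stream fusion sketch: concatenate the two modalities for
# each token, then apply a learned projection (random weights here),
# yielding one multimodal embedding per word.
W = rng.standard_normal((LEX_DIM + AC_DIM, OUT_DIM))
b = np.zeros(OUT_DIM)
fused = np.tanh(np.concatenate([lexical, acoustic], axis=1) @ W + b)

print(fused.shape)  # one OUT_DIM-dimensional embedding per token
```

In the paper's setting, such fused embeddings would then feed a downstream classifier for speaker emotion recognition; here the weights are random stand-ins for trained parameters.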
Pages: 608 - 612
Page count: 5
Related papers
50 records
  • [42] Emotion Recognition from Multimodal Physiological Signals for Emotion Aware Healthcare Systems
    Ayata, Değer
    Yaslan, Yusuf
    Kamasak, Mustafa E.
    Journal of Medical and Biological Engineering, 2020, 40 : 149 - 157
  • [43] Emotion Recognition in the Wild from Videos using Images
    Bargal, Sarah Adel
    Barsoum, Emad
    Ferrer, Cristian Canton
    Zhang, Cha
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 433 - 436
  • [44] Modeling Hierarchical Uncertainty for Multimodal Emotion Recognition in Conversation
    Chen, Feiyu
    Shao, Jie
    Zhu, Anjie
    Ouyang, Deqiang
    Liu, Xueliang
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (01) : 187 - 198
  • [45] EMERSK - Explainable Multimodal Emotion Recognition With Situational Knowledge
    Palash, Mijanur
    Bhargava, Bharat
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2785 - 2794
  • [46] End-to-End Learning for Multimodal Emotion Recognition in Video With Adaptive Loss
    Huynh, Van Thong
    Yang, Hyung-Jeong
    Lee, Guee-Sang
    Kim, Soo-Hyung
    IEEE MULTIMEDIA, 2021, 28 (02) : 59 - 66
  • [47] Emotion Recognition With Multimodal Transformer Fusion Framework Based on Acoustic and Lexical Information
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Fu, Yahui
    Liu, Jiaxing
    Ding, Shifei
    IEEE MULTIMEDIA, 2022, 29 (02) : 94 - 103
  • [48] Audiovisual emotion recognition in wild
    Avots, Egils
    Sapiński, Tomasz
    Bachmann, Maie
    Kamińska, Dorota
    Machine Vision and Applications, 2019, 30 : 975 - 985
  • [49] An Emotion-Space Model of Multimodal Emotion Recognition
    Choe, Kyung-Il
    ADVANCED SCIENCE LETTERS, 2018, 24 (01) : 699 - 702
  • [50] Deep Auto-Encoders With Sequential Learning for Multimodal Dimensional Emotion Recognition
    Nguyen, Dung
    Nguyen, Duc Thanh
    Zeng, Rui
    Nguyen, Thanh Thi
    Tran, Son N.
    Nguyen, Thin
    Sridharan, Sridha
    Fookes, Clinton
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1313 - 1324