Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification (CLC)
TM [electrical engineering]; TN [electronic and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvements across many natural language processing tasks. In this work, we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. By applying the embeddings extracted from this model to the task of speaker emotion recognition, we show that they integrate paralinguistic cues into word meanings and provide vital affective information.
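The core idea in the abstract, a parallel acoustic stream whose output is fused with contextual lexical embeddings to give one multimodal embedding per word, can be sketched as follows. This is a minimal illustration, not the authors' architecture: the dimensions, the tanh projection, and the plain concatenation fusion are all hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
LEX_DIM, AC_DIM, OUT_DIM, SEQ_LEN = 8, 4, 6, 5

# Toy per-token streams: contextual lexical embeddings (as a biLM
# would produce) and acoustic features aligned to each word.
lexical = rng.standard_normal((SEQ_LEN, LEX_DIM))
acoustic = rng.standard_normal((SEQ_LEN, AC_DIM))

# Parallel-stream fusion sketch: concatenate the two modalities for
# each token, then apply a learned projection (random weights here),
# yielding one multimodal embedding per word.
W = rng.standard_normal((LEX_DIM + AC_DIM, OUT_DIM))
b = np.zeros(OUT_DIM)
fused = np.tanh(np.concatenate([lexical, acoustic], axis=1) @ W + b)

print(fused.shape)  # one OUT_DIM-dimensional embedding per token
```

In the paper's setting, such fused embeddings would then feed a downstream classifier for speaker emotion recognition; here the weights are random stand-ins for trained parameters.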
Pages: 608 - 612
Page count: 5
Related papers
50 records
  • [42] Emotion Recognition from Multimodal Physiological Signals for Emotion Aware Healthcare Systems
    Ayata, Değer
    Yaslan, Yusuf
    Kamasak, Mustafa E.
    Journal of Medical and Biological Engineering, 2020, 40 : 149 - 157
  • [43] Emotion Recognition in the Wild from Videos using Images
    Bargal, Sarah Adel
    Barsoum, Emad
    Ferrer, Cristian Canton
    Zhang, Cha
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 433 - 436
  • [44] Modeling Hierarchical Uncertainty for Multimodal Emotion Recognition in Conversation
    Chen, Feiyu
    Shao, Jie
    Zhu, Anjie
    Ouyang, Deqiang
    Liu, Xueliang
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (01) : 187 - 198
  • [45] EMERSK - Explainable Multimodal Emotion Recognition With Situational Knowledge
    Palash, Mijanur
    Bhargava, Bharat
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2785 - 2794
  • [46] End-to-End Learning for Multimodal Emotion Recognition in Video With Adaptive Loss
    Huynh, Van Thong
    Yang, Hyung-Jeong
    Lee, Guee-Sang
    Kim, Soo-Hyung
    IEEE MULTIMEDIA, 2021, 28 (02) : 59 - 66
  • [47] Emotion Recognition With Multimodal Transformer Fusion Framework Based on Acoustic and Lexical Information
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Fu, Yahui
    Liu, Jiaxing
    Ding, Shifei
    IEEE MULTIMEDIA, 2022, 29 (02) : 94 - 103
  • [48] Audiovisual emotion recognition in wild
    Avots, Egils
    Sapiński, Tomasz
    Bachmann, Maie
    Kamińska, Dorota
    Machine Vision and Applications, 2019, 30 : 975 - 985
  • [49] An Emotion-Space Model of Multimodal Emotion Recognition
    Choe, Kyung-Il
    ADVANCED SCIENCE LETTERS, 2018, 24 (01) : 699 - 702
  • [50] Deep Auto-Encoders With Sequential Learning for Multimodal Dimensional Emotion Recognition
    Nguyen, Dung
    Nguyen, Duc Thanh
    Zeng, Rui
    Nguyen, Thanh Thi
    Tran, Son N.
    Nguyen, Thin
    Sridharan, Sridha
    Fookes, Clinton
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1313 - 1324