Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvement across many natural language processing tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. We show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information by applying these multimodal embeddings to the task of speaker emotion recognition.
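To make the idea of multimodal word embeddings concrete, here is a minimal, purely illustrative sketch in plain Python. It shows the general notion of pairing each word's lexical embedding with word-aligned acoustic features (here by mean-pooling frames and concatenating); all names, dimensions, and the fusion-by-concatenation scheme are assumptions for illustration only — the paper's actual model instead adds a parallel acoustic stream inside a bidirectional language model trained on spoken language data.

```python
# Hypothetical sketch: fusing per-word lexical embeddings with
# word-aligned acoustic features by mean-pooling and concatenation.
# This is NOT the paper's architecture (which uses a parallel acoustic
# stream in a bidirectional language model); it only illustrates the
# concept of enriching word embeddings with paralinguistic information.

def mean_pool(frames):
    """Average a word's acoustic frames into one fixed-size vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def multimodal_embed(lexical, acoustic_frames):
    """Concatenate each word's lexical embedding with its pooled acoustics."""
    return [lex + mean_pool(frames)
            for lex, frames in zip(lexical, acoustic_frames)]

# Toy example: 2 words, 3-dim lexical embeddings, 2-dim acoustic frames.
lexical = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
acoustic = [[[1.0, 2.0], [3.0, 4.0]],   # word 1: two frames
            [[5.0, 6.0]]]               # word 2: one frame
emb = multimodal_embed(lexical, acoustic)
# Each multimodal embedding has 3 + 2 = 5 dimensions.
```

In this toy setup the acoustic part carries prosodic cues (e.g. energy or pitch statistics) alongside word meaning, which is the intuition behind applying such embeddings to speaker emotion recognition.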
Pages: 608 - 612
Page count: 5
Related Papers
50 records
  • [11] Multiple Models Fusion for Emotion Recognition in the Wild
    Wu, Jianlong
    Lin, Zhouchen
    Zha, Hongbin
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015 : 475 - 481
  • [12] Comparing Recognition Performance and Robustness of Multimodal Deep Learning Models for Multimodal Emotion Recognition
    Liu, Wei
    Qiu, Jie-Lin
    Zheng, Wei-Long
    Lu, Bao-Liang
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 715 - 729
  • [13] Emotion recognition from unimodal to multimodal analysis: A review
    Ezzameli, K.
    Mahersia, H.
    INFORMATION FUSION, 2023, 99
  • [14] Multimodal Interfaces for Emotion Recognition: Models, Challenges and Opportunities
    Greco, Danilo
    Barra, Paola
    D'Errico, Lorenzo
    Staffa, Mariacarla
    ARTIFICIAL INTELLIGENCE IN HCI, PT II, AI-HCI 2024, 2024, 14735 : 152 - 162
  • [15] Audiovisual emotion recognition in wild
    Avots, Egils
    Sapinski, Tomasz
    Bachmann, Maie
    Kaminska, Dorota
    MACHINE VISION AND APPLICATIONS, 2019, 30 (05) : 975 - 985
  • [16] Multimodal Speech Emotion Recognition Based on Large Language Model
    Fang, Congcong
    Jin, Yun
    Chen, Guanlin
    Zhang, Yunfan
    Li, Shidang
    Ma, Yong
    Xie, Yue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (11) : 1463 - 1467
  • [17] Audio-Visual Learning for Multimodal Emotion Recognition
    Fan, Siyu
    Jing, Jianan
    Wang, Chongwen
    SYMMETRY-BASEL, 2025, 17 (03):
  • [18] Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification
    Lee, Sanghyun
    Han, David K.
    Ko, Hanseok
    IEEE ACCESS, 2021, 9 : 94557 - 94572
  • [19] Multimodal Fusion of Spatial-Temporal Features for Emotion Recognition in the Wild
    Wang, Zuchen
    Fang, Yuchun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 205 - 214
  • [20] Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion
    Siriwardhana, Shamane
    Kaluarachchi, Tharindu
    Billinghurst, Mark
    Nanayakkara, Suranga
    IEEE ACCESS, 2020, 8 (08) : 176274 - 176285