Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification (CLC)
TM (Electrical engineering); TN (Electronics and communication technology)
Discipline codes
0808; 0809
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvement across many natural language processing tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. We show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information by applying these multimodal embeddings to the task of speaker emotion recognition.
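The abstract describes integrating acoustic information into contextualized word embeddings via a parallel acoustic stream in a bidirectional language model. As a much-simplified illustration of the general idea (not the paper's actual architecture or training procedure), the sketch below fuses per-word contextual embeddings with word-aligned acoustic features by mean-pooling each word's acoustic frames and concatenating; the function name, the pooling choice, and the alignment format are all illustrative assumptions.

```python
import numpy as np

def multimodal_word_embeddings(lexical, acoustic_frames, word_spans):
    """Illustrative fusion of lexical and acoustic streams (not the paper's model).

    lexical:         (num_words, d_lex) contextual word embeddings, e.g. from a biLM
    acoustic_frames: (num_frames, d_ac) frame-level acoustic features, e.g. MFCCs
    word_spans:      list of (start, end) frame indices aligning each word to audio

    Returns (num_words, d_lex + d_ac): each word's lexical embedding concatenated
    with the mean-pooled acoustic features over that word's frame span.
    """
    pooled = np.stack([acoustic_frames[s:e].mean(axis=0) for s, e in word_spans])
    return np.concatenate([lexical, pooled], axis=1)

# Toy usage: 3 words, 4-dim lexical embeddings, 10 frames of 2-dim acoustics.
lex = np.ones((3, 4))
ac = np.arange(20, dtype=float).reshape(10, 2)
emb = multimodal_word_embeddings(lex, ac, [(0, 3), (3, 7), (7, 10)])
print(emb.shape)  # (3, 6)
```

In the paper itself the two streams are trained jointly on spoken-language data so that paralinguistic cues shape the word representations, rather than being concatenated after the fact as in this sketch.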
Pages: 608-612
Page count: 5
Related Papers (50 total)
  • [31] Emotion Recognition From Multimodal Physiological Signals Using a Regularized Deep Fusion of Kernel Machine
    Zhang, Xiaowei
    Liu, Jinyong
    Shen, Jian
    Li, Shaojie
    Hou, Kechen
    Hu, Bin
    Gao, Jin
    Zhang, Tong
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (09) : 4386 - 4399
  • [32] Multimodal Emotion Recognition for Human Robot Interaction
    Adiga, Sharvari
    Vaishnavi, D. V.
    Saxena, Suchitra
    Tripathi, Shikha
    2020 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2020), 2020, : 197 - 203
  • [33] Emotion Recognition on Multimodal with Deep Learning and Ensemble
    Dharma, David Adi
    Zahra, Amalia
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 656 - 663
  • [34] Self-Supervised EEG Emotion Recognition Models Based on CNN
    Wang, Xingyi
    Ma, Yuliang
    Cammon, Jared
    Fang, Feng
    Gao, Yunyuan
    Zhang, Yingchun
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 1952 - 1962
  • [35] Learning Alignment for Multimodal Emotion Recognition from Speech
    Xu, Haiyang
    Zhang, Hui
    Han, Kun
    Wang, Yun
    Peng, Yiping
    Li, Xiangang
    INTERSPEECH 2019, 2019, : 3569 - 3573
  • [36] Masked Graph Learning With Recurrent Alignment for Multimodal Emotion Recognition in Conversation
    Meng, Tao
    Zhang, Fuchen
    Shou, Yuntao
    Shao, Hongen
    Ai, Wei
    Li, Keqin
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4298 - 4312
  • [37] Emotion Recognition From Expressions in Face, Voice, and Body: The Multimodal Emotion Recognition Test (MERT)
    Baenziger, Tanja
    Grandjean, Didier
    Scherer, Klaus R.
    EMOTION, 2009, 9 (05) : 691 - 704
  • [38] FMFN: A Fuzzy Multimodal Fusion Network for Emotion Recognition in Ensemble Conducting
    Han, Xiao
    Chen, Fuyang
    Ban, Junrong
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2025, 33 (01) : 168 - 179
  • [39] Multimodal Decoupled Distillation Graph Neural Network for Emotion Recognition in Conversation
    Dai, Yijing
    Li, Yingjian
    Chen, Dongpeng
    Li, Jinxing
    Lu, Guangming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9910 - 9924
  • [40] Emotion Recognition from Multimodal Physiological Signals for Emotion Aware Healthcare Systems
    Ayata, Deger
    Yaslan, Yusuf
    Kamasak, Mustafa E.
    JOURNAL OF MEDICAL AND BIOLOGICAL ENGINEERING, 2020, 40 (02) : 149 - 157