Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
|
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvements across many natural language processing tasks. In this work, we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. By applying these multimodal embeddings to the task of speaker emotion recognition, we show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information.
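The abstract describes the approach only at a high level. The sketch below (Python/PyTorch) is a minimal, hypothetical illustration of that idea, not the authors' implementation: an ELMo-style bidirectional recurrent model over words with a parallel acoustic stream, whose per-word hidden states are fused into multimodal embeddings and pooled for utterance-level emotion classification. All module names, dimensions, and the concatenation-based fusion are illustrative assumptions; in particular, the paper pretrains the multimodal language model on spoken-language corpora and then extracts embeddings for a downstream emotion classifier, whereas this sketch collapses both stages into one supervised model for brevity.

```python
# Hypothetical sketch of a multimodal bidirectional model with a parallel
# acoustic stream, loosely following the idea in the abstract. Dimensions,
# fusion scheme, and names are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class MultimodalBiLM(nn.Module):
    def __init__(self, vocab_size, word_dim=256, acoustic_dim=40,
                 hidden_dim=256, num_emotions=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        # Lexical stream: bidirectional LSTM over word embeddings.
        self.lexical_lstm = nn.LSTM(word_dim, hidden_dim,
                                    batch_first=True, bidirectional=True)
        # Parallel acoustic stream: bidirectional LSTM over word-aligned
        # acoustic features (e.g., one filterbank summary vector per word).
        self.acoustic_lstm = nn.LSTM(acoustic_dim, hidden_dim,
                                     batch_first=True, bidirectional=True)
        # Fuse the two streams into a multimodal word embedding.
        self.fuse = nn.Linear(4 * hidden_dim, hidden_dim)
        # Downstream head: mean-pool word embeddings and classify emotion.
        self.classifier = nn.Linear(hidden_dim, num_emotions)

    def forward(self, word_ids, acoustic_feats):
        # word_ids:       (batch, seq_len)                token indices
        # acoustic_feats: (batch, seq_len, acoustic_dim)  word-aligned features
        lex, _ = self.lexical_lstm(self.embed(word_ids))
        aco, _ = self.acoustic_lstm(acoustic_feats)
        multimodal = torch.tanh(self.fuse(torch.cat([lex, aco], dim=-1)))
        utterance = multimodal.mean(dim=1)          # (batch, hidden_dim)
        return self.classifier(utterance), multimodal


# Usage example with random tensors standing in for real data.
model = MultimodalBiLM(vocab_size=10000)
words = torch.randint(0, 10000, (2, 12))   # 2 utterances, 12 words each
audio = torch.randn(2, 12, 40)             # word-aligned acoustic features
logits, embeddings = model(words, audio)
print(logits.shape, embeddings.shape)      # (2, 4) and (2, 12, 256)
```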
Pages: 608-612
Number of pages: 5
Related Papers
50 records in total
  • [21] Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild
    Sun, Bo
    Li, Liandong
    Zhou, Guoyan
    Wu, Xuewen
    He, Jun
    Yu, Lejun
    Li, Dongxue
    Wei, Qinglan
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015, : 497 - 502
  • [22] Factors in Emotion Recognition With Deep Learning Models Using Speech and Text on Multiple Corpora
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 722 - 726
  • [23] A Multimodal Emotion Recognition System from Video
    Thushara, S.
    Veni, S.
    PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2016), 2016,
  • [24] Emotion recognition models for companion robots
Nimmagadda, Ritvik
Arora, Kritika
Vargas Martin, Miguel
    The Journal of Supercomputing, 2022, 78 : 13710 - 13727
  • [25] Emotion Recognition Expressed Contextually for Romanian Language
Zbancioc, Marius-Dan
Feraru, Silvia-Monica
    2018 INTERNATIONAL CONFERENCE AND EXPOSITION ON ELECTRICAL AND POWER ENGINEERING (EPE), 2018, : 179 - 182
  • [26] Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models
    Zhang, Zixing
    Peng, Liyizhe
    Pang, Tao
    Han, Jing
    Zhao, Huan
    Schuller, Bjorn W.
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, : 6690 - 6704
  • [27] Refashioning Emotion Recognition Modeling: The Advent of Generalized Large Models
    Zhang, Zixing
    Peng, Liyizhe
    Pang, Tao
    Han, Jing
    Zhao, Huan
    Schuller, Bjorn W.
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (05): : 6690 - 6704
  • [28] Wild Wild Emotion: A Multimodal Ensemble Approach
    Gideon, John
    Zhang, Biqiao
    Aldeneh, Zakaria
    Kim, Yelin
    Khorram, Soheil
    Le, Duc
    Provost, Emily Mower
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 501 - 505
  • [29] Temporal Relation Inference Network for Multimodal Speech Emotion Recognition
    Dong, Guan-Nan
    Pun, Chi-Man
    Zhang, Zheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6472 - 6485
  • [30] Multimodal Emotion Recognition Based on the Decoupling of Emotion and Speaker Information
    Gajsek, Rok
    Struc, Vitomir
    Mihelic, France
    TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 275 - 282