Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Cited by: 10
|
Authors
Tseng, Shao-Yen [1 ]
Narayanan, Shrikanth [1 ]
Georgiou, Panayiotis [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
[2] Apple Inc, Siri Understanding, Culver City, CA 90016 USA
Keywords
Acoustics; Task analysis; Feature extraction; Convolution; Emotion recognition; Context modeling; Bit error rate; Machine learning; unsupervised learning; natural language processing; speech processing; emotion recognition; SPEECH;
DOI
10.1109/LSP.2021.3065598
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvements across many natural language processing tasks. In this work, we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. By applying these multimodal embeddings to the task of speaker emotion recognition, we show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information.
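The abstract describes the approach only at a high level. The sketch below (Python/PyTorch) is a minimal, hypothetical illustration of that idea, not the authors' implementation: an ELMo-style bidirectional recurrent model over words with a parallel acoustic stream, whose per-word hidden states are fused into multimodal embeddings and pooled for utterance-level emotion classification. All module names, dimensions, and the concatenation-based fusion are illustrative assumptions; in particular, the paper pretrains the multimodal language model on spoken-language corpora and then extracts embeddings for a downstream emotion classifier, whereas this sketch collapses both stages into one supervised model for brevity.

```python
# Hypothetical sketch of a multimodal bidirectional model with a parallel
# acoustic stream, loosely following the idea in the abstract. Dimensions,
# fusion scheme, and names are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class MultimodalBiLM(nn.Module):
    def __init__(self, vocab_size, word_dim=256, acoustic_dim=40,
                 hidden_dim=256, num_emotions=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        # Lexical stream: bidirectional LSTM over word embeddings.
        self.lexical_lstm = nn.LSTM(word_dim, hidden_dim,
                                    batch_first=True, bidirectional=True)
        # Parallel acoustic stream: bidirectional LSTM over word-aligned
        # acoustic features (e.g., one filterbank summary vector per word).
        self.acoustic_lstm = nn.LSTM(acoustic_dim, hidden_dim,
                                     batch_first=True, bidirectional=True)
        # Fuse the two streams into a multimodal word embedding.
        self.fuse = nn.Linear(4 * hidden_dim, hidden_dim)
        # Downstream head: mean-pool word embeddings and classify emotion.
        self.classifier = nn.Linear(hidden_dim, num_emotions)

    def forward(self, word_ids, acoustic_feats):
        # word_ids:       (batch, seq_len)                token indices
        # acoustic_feats: (batch, seq_len, acoustic_dim)  word-aligned features
        lex, _ = self.lexical_lstm(self.embed(word_ids))
        aco, _ = self.acoustic_lstm(acoustic_feats)
        multimodal = torch.tanh(self.fuse(torch.cat([lex, aco], dim=-1)))
        utterance = multimodal.mean(dim=1)          # (batch, hidden_dim)
        return self.classifier(utterance), multimodal


# Usage example with random tensors standing in for real data.
model = MultimodalBiLM(vocab_size=10000)
words = torch.randint(0, 10000, (2, 12))   # 2 utterances, 12 words each
audio = torch.randn(2, 12, 40)             # word-aligned acoustic features
logits, embeddings = model(words, audio)
print(logits.shape, embeddings.shape)      # (2, 4) and (2, 12, 256)
```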
Pages: 608-612
Number of pages: 5
Related Papers
50 records in total
  • [21] Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild
    Sun, Bo
    Li, Liandong
    Zhou, Guoyan
    Wu, Xuewen
    He, Jun
    Yu, Lejun
    Li, Dongxue
    Wei, Qinglan
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015, : 497 - 502
  • [22] Factors in Emotion Recognition With Deep Learning Models Using Speech and Text on Multiple Corpora
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 722 - 726
  • [23] A Multimodal Emotion Recognition System from Video
    Thushara, S.
    Veni, S.
    PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2016), 2016,
  • [24] Emotion recognition models for companion robots
Nimmagadda, Ritvik
Arora, Kritika
Vargas Martin, Miguel
    The Journal of Supercomputing, 2022, 78 : 13710 - 13727
  • [25] Emotion Recognition Expressed Contextually for Romanian Language
Zbancioc, Marius-Dan
Feraru, Silvia-Monica
    2018 INTERNATIONAL CONFERENCE AND EXPOSITION ON ELECTRICAL AND POWER ENGINEERING (EPE), 2018, : 179 - 182
  • [26] Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models
    Zhang, Zixing
    Peng, Liyizhe
    Pang, Tao
    Han, Jing
    Zhao, Huan
    Schuller, Bjorn W.
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, : 6690 - 6704
  • [27] Refashioning Emotion Recognition Modeling: The Advent of Generalized Large Models
    Zhang, Zixing
    Peng, Liyizhe
    Pang, Tao
    Han, Jing
    Zhao, Huan
    Schuller, Bjorn W.
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (05): : 6690 - 6704
  • [28] Wild Wild Emotion: A Multimodal Ensemble Approach
    Gideon, John
    Zhang, Biqiao
    Aldeneh, Zakaria
    Kim, Yelin
    Khorram, Soheil
    Le, Duc
    Provost, Emily Mower
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 501 - 505
  • [29] Temporal Relation Inference Network for Multimodal Speech Emotion Recognition
    Dong, Guan-Nan
    Pun, Chi-Man
    Zhang, Zheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6472 - 6485
  • [30] Multimodal Emotion Recognition Based on the Decoupling of Emotion and Speaker Information
    Gajsek, Rok
    Struc, Vitomir
    Mihelic, France
    TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 275 - 282