Modeling Hierarchical Uncertainty for Multimodal Emotion Recognition in Conversation

Cited by: 17
Authors
Chen, Feiyu [1 ,2 ]
Shao, Jie [1 ,2 ]
Zhu, Anjie [1 ]
Ouyang, Deqiang [3 ]
Liu, Xueliang [4 ]
Shen, Heng Tao [1 ,5 ]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Sichuan Artificial Intelligence Res Inst, Yibin 644000, Peoples R China
[3] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China
[4] Hefei Univ Technol, Sch Comp & Informat, Hefei 230009, Peoples R China
[5] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Uncertainty; Emotion recognition; Predictive models; Context modeling; Reliability; Bayes methods; Adaptation models; Bayesian deep learning; capsule network (CapsNet); conditional layer normalization (CLN); emotion recognition in conversation (ERC); uncertainty;
DOI
10.1109/TCYB.2022.3185119
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Approximating the uncertainty of an emotional AI agent is crucial for improving the reliability of such agents and facilitating human-in-the-loop solutions, especially in critical scenarios. However, no existing system for emotion recognition in conversation (ERC) has attempted to estimate the uncertainty of its predictions. In this article, we present HU-Dialogue, which models hierarchical uncertainty for the ERC task. We perturb contextual attention weight values with source-adaptive noise within each modality as a regularization scheme to model context-level uncertainty, and we adapt a Bayesian deep learning method to the capsule-based prediction layer to model modality-level uncertainty. Furthermore, a weight-sharing triplet structure with conditional layer normalization is introduced to detect both invariance and equivariance among modalities for ERC. We provide a detailed empirical analysis of extensive experiments, which shows that our model outperforms previous state-of-the-art methods on three popular multimodal ERC datasets.
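The abstract names two concrete mechanisms: attention weights perturbed with adaptive noise as a context-level uncertainty regularizer, and conditional layer normalization that lets a weight-shared encoder adapt per modality. Below is a minimal PyTorch sketch of these two ideas only; the module names, the single shared noise scale, and all tensor shapes are illustrative assumptions, not the authors' HU-Dialogue implementation (the Bayesian capsule prediction layer is not reproduced here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyContextAttention(nn.Module):
    """Dot-product attention whose weights are perturbed with Gaussian noise of a
    learned scale during training (context-level uncertainty as regularization).
    A single shared noise scale is an illustrative simplification of the paper's
    source-adaptive noise."""

    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.log_sigma = nn.Parameter(torch.zeros(1))  # learned noise scale

    def forward(self, query, key, value):
        # query: (B, Tq, D); key, value: (B, Tk, D)
        scores = torch.matmul(query, key.transpose(-2, -1)) * self.scale
        attn = F.softmax(scores, dim=-1)
        if self.training:
            # Perturb the attention weights, then renormalize them.
            noise = torch.randn_like(attn) * self.log_sigma.exp()
            attn = (attn + noise).clamp_min(0.0)
            attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        return torch.matmul(attn, value)


class ConditionalLayerNorm(nn.Module):
    """LayerNorm whose gain and bias are generated from a conditioning vector
    (e.g., features of another modality), so a weight-shared encoder can adapt
    its normalization per modality."""

    def __init__(self, dim, cond_dim):
        super().__init__()
        self.ln = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_gamma = nn.Linear(cond_dim, dim)
        self.to_beta = nn.Linear(cond_dim, dim)

    def forward(self, x, cond):
        # x: (B, T, D); cond: (B, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(1)
        beta = self.to_beta(cond).unsqueeze(1)
        return (1.0 + gamma) * self.ln(x) + beta


if __name__ == "__main__":
    utterances = torch.randn(2, 5, 16)    # e.g., textual utterance features
    acoustic_summary = torch.randn(2, 8)  # hypothetical conditioning vector
    attn = NoisyContextAttention(16)
    cln = ConditionalLayerNorm(16, 8)
    out = cln(attn(utterances, utterances, utterances), acoustic_summary)
    print(out.shape)  # torch.Size([2, 5, 16])
```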
Pages: 187-198
Page count: 12