Multimodal Decoupled Distillation Graph Neural Network for Emotion Recognition in Conversation

Cited by: 2
Authors
Dai, Yijing [1 ]
Li, Yingjian [2 ]
Chen, Dongpeng [3 ]
Li, Jinxing [1 ]
Lu, Guangming [1 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] VoiceAI Technol Co Ltd, Shenzhen 518000, Peoples R China
Keywords
Emotion recognition; Graph neural networks; Context modeling; Circuits and systems; Message passing; Visualization; Task analysis; Emotion recognition in conversation; graph neural network; decoupled representation learning; multimodal fusion;
DOI
10.1109/TCSVT.2024.3405406
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Code
0808; 0809;
Abstract
Graph Neural Networks (GNNs) have attracted increasing attention for multimodal Emotion Recognition in Conversation (ERC) due to their good performance in contextual understanding. However, most existing GNN-based methods suffer from two challenges: 1) how to explore and propagate appropriate information in a conversational graph. Typical GNNs in ERC neglect to mine the emotion commonality and discrepancy in the local neighborhood, leading them to learn similar embeddings for connected nodes. However, the embeddings of these connected nodes are supposed to be distinguishable, as they belong to different speakers with different emotions. 2) Most existing works apply simple concatenation or a co-occurrence prior for modality combination, failing to fully capture the emotional information of multiple modalities in relationship modeling. In this paper, we propose a multimodal Decoupled Distillation Graph Neural Network (D2GNN) to address the above challenges. Specifically, D2GNN decouples the input features into emotion-aware and emotion-agnostic ones at the emotion category level, aiming to capture emotion commonality and implicit emotion information, respectively. Moreover, we design a new message passing mechanism to separately propagate emotion-aware and -agnostic knowledge between nodes according to speaker dependency in two GNN-based modules, exploring the correlations of utterances and alleviating the over-similarity of embeddings. Furthermore, a multimodal distillation unit is employed to obtain distinguishable embeddings by aggregating unimodal decoupled features. Experimental results on two ERC benchmarks demonstrate the superiority of the proposed model. Code is available at https://github.com/gityider/D2GNN.
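The abstract's core idea of decoupling utterance features into two streams and propagating each separately over the conversation graph can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's D2GNN implementation: the adjacency matrix, the random untrained projections, and the concatenation-based fusion are all hypothetical stand-ins for the speaker-dependent graph, the learned decoupling, and the multimodal distillation unit, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conversation graph: 4 utterances linked by turn order.
# A hypothetical stand-in for the paper's speaker-dependency graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                      # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # inverse degree matrix
P = D_inv @ A_hat                          # row-normalized propagation matrix

X = rng.standard_normal((4, 8))            # 8-dim utterance features

# Decouple into emotion-aware and emotion-agnostic subspaces via two
# separate projections (randomly initialized here; learned in the paper).
W_aware = rng.standard_normal((8, 4))
W_agnostic = rng.standard_normal((8, 4))
H_aware = np.tanh(X @ W_aware)
H_agnostic = np.tanh(X @ W_agnostic)

# Propagate the two streams separately, then aggregate. Plain
# concatenation replaces the paper's multimodal distillation unit.
Z = np.concatenate([P @ H_aware, P @ H_agnostic], axis=1)
print(Z.shape)  # (4, 8)
```

The key point the sketch captures is that the two streams never mix during message passing, so neighborhood smoothing in one subspace cannot erase the information carried by the other.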
Pages: 9910-9924
Page count: 15