Interpretable Multimodal Capsule Fusion

Cited by: 8
Authors
Wu, Jianfeng [1]
Mai, Sijie [1]
Hu, Haifeng [1]
Affiliations
[1] School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510275, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Routing; Transformers; Adaptation models; Visualization; Brain modeling; Feature extraction; Data models; Capsule network; interpretation; LSTM; modality fusion; multimodal sentiment analysis; SENTIMENT ANALYSIS; NETWORK; ATTENTION
DOI
10.1109/TASLP.2022.3178236
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
With the development of social networking platforms, multimodal sentiment analysis has become increasingly prominent. Existing models focus on capturing intramodal and intermodal interactions to produce effective modality representations. However, they overlook interpretability, which reveals how modalities interact with each other and which modality contributes most to the final prediction. In this paper, we propose an interpretable model called Interpretable Multimodal Capsule Fusion (IMCF), which integrates the routing mechanism of Capsule Networks (CapsNet) with Long Short-Term Memory (LSTM) to produce refined modality representations and provide interpretation. By arranging the features of different modalities into an input sequence, we obtain a highly expressive representation of intermodal dynamics, thanks to the strong ability of LSTM to model sequences. Because the routing mechanism is applied during the modality fusion and prediction stages, the routing coefficients reveal the contributions of different modalities or dynamics, which provides interpretation. Meanwhile, the routing mechanism iteratively adjusts the information flow of different modalities, making the fusion process more reasonable. Experimental results show that our model achieves competitive performance on two benchmark datasets, with effective modality fusion provided by the LSTM and interpretation provided by the routing mechanism.
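The routing-based fusion described in the abstract follows the standard CapsNet dynamic-routing scheme: coupling coefficients over modality capsules are refined iteratively by agreement, and their converged values can be read as per-modality contributions. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the names (squash, route_modalities, num_iters) are illustrative assumptions.

import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # CapsNet squashing non-linearity: preserves direction, bounds norm to [0, 1).
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def route_modalities(u_hat, num_iters=3):
    # Dynamic routing over modality capsules.
    # u_hat: (batch, num_modalities, dim) prediction vectors, one per modality.
    # Returns the fused vector and the routing coefficients c, whose converged
    # values act as per-modality contribution scores (the interpretability signal).
    b = torch.zeros(u_hat.shape[:2], device=u_hat.device)  # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=1)                    # coupling coefficients, sum to 1
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # coefficient-weighted fusion
        v = squash(s)                              # fused capsule output
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)   # agreement update
    return v, c

# Toy usage with three hypothetical modality representations (text, audio, visual):
u_hat = torch.randn(2, 3, 64)           # (batch, modalities, feature dim)
fused, coeffs = route_modalities(u_hat)
print(coeffs)                           # rows sum to 1; larger entries = larger contribution

In a setting like IMCF's, u_hat would hold the LSTM-refined text, audio, and visual representations, and inspecting coeffs after routing is what provides the contribution-based interpretation the abstract describes.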
Pages: 1815-1826 (12 pages)