The effectiveness of multimodal sentiment analysis (MSA) hinges on the seamless integration of information from diverse modalities, where the quality of modality fusion directly influences sentiment analysis accuracy. Prior methods often rely on intricate fusion strategies that raise computational costs and can yield inaccurate multimodal representations owing to distribution gaps and information redundancy across heterogeneous modalities. Focusing on the backpropagation of loss, this paper introduces a Transformer-based model called Multi-Task Learning and Mutual Information Maximization with Crossmodal Transformer (MMMT). To address inaccurate multimodal representations in MSA, MMMT combines mutual information maximization with a crossmodal Transformer to convey more modality-invariant information into the multimodal representation, fully exploiting the commonalities across modalities. Notably, it uses multimodal labels to supervise unimodal training, offering a fresh perspective on multi-task learning in MSA. Comparative experiments on the CMU-MOSI and CMU-MOSEI datasets show that MMMT improves accuracy while reducing computational burden, making it suitable for resource-constrained applications and scenarios with real-time requirements. Ablation experiments further validate the efficacy of multi-task learning and examine the specific effect of combining mutual information maximization with the Transformer in MSA.
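
To make the stated objective concrete, the following is a minimal, hypothetical sketch (not the authors' code) of how multi-task supervision with shared multimodal labels can be combined with an InfoNCE-style mutual information lower bound between unimodal features and the fused representation; all function and variable names, the L1 task loss, and the weighting factor `alpha` are assumptions for illustration only.

```python
# Hypothetical sketch: multi-task losses (every head supervised by the shared
# multimodal label) plus an InfoNCE-style MI-maximization regularizer that
# encourages modality-invariant information to flow into the fused representation.
import torch
import torch.nn.functional as F


def infonce_mi_lower_bound(z_uni, z_multi, temperature=0.1):
    """InfoNCE estimate of MI between unimodal and fused representations.

    z_uni, z_multi: (batch, dim) projections of the two representations.
    A higher value indicates more shared (modality-invariant) information.
    """
    z_uni = F.normalize(z_uni, dim=-1)
    z_multi = F.normalize(z_multi, dim=-1)
    logits = z_uni @ z_multi.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(z_uni.size(0), device=logits.device)
    return -F.cross_entropy(logits, targets)              # negative InfoNCE loss


def total_loss(preds, z_feats, z_fused, y_multi, alpha=0.1):
    """Combined objective: each prediction head (e.g. text/audio/vision/fusion)
    is trained against the same multimodal label y_multi, and MI between each
    unimodal feature and the fused feature is maximized."""
    task = sum(F.l1_loss(p.squeeze(-1), y_multi) for p in preds.values())
    mi = sum(infonce_mi_lower_bound(z, z_fused) for z in z_feats.values())
    return task - alpha * mi                               # maximizing MI => subtract
```

Under these assumptions, minimizing `total_loss` backpropagates the multimodal label signal through every unimodal branch while the MI term pulls the unimodal and fused representations toward their common content.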