Dynamic Invariant-Specific Representation Fusion Network for Multimodal Sentiment Analysis

Cited: 18
Authors
He, Jing [1 ]
Yang, Haonan [1]
Zhang, Changfan [1 ]
Chen, Hongrun [1 ]
Xu, Yifu [1]
Affiliations
[1] Hunan Univ Technol, Coll Elect & Informat Engn, Zhuzhou 412007, Peoples R China
Keywords
ATTENTION;
DOI
10.1155/2022/2105593
Chinese Library Classification
Q [Biological Sciences];
Discipline Classification Codes
07; 0710; 09
Abstract
Multimodal sentiment analysis (MSA) aims to infer emotions from linguistic, auditory, and visual sequences. Multimodal representation and fusion techniques are central to MSA, yet fully capturing the interactions among heterogeneous modalities remains difficult. To address this problem, a new framework, the dynamic invariant-specific representation fusion network (DISRFN), is proposed in this study. First, to use redundant information effectively, joint domain-separation representations of all modalities are obtained through an improved joint domain-separation network. Then, a hierarchical graph fusion network (HGFN) dynamically fuses these representations to capture cross-modal interactions that guide the sentiment analysis. Comparative experiments are performed on the popular MSA datasets MOSI and MOSEI, together with studies of fusion strategy, loss-function ablation, and similarity loss analysis. The experimental results verify the effectiveness of the DISRFN framework and its loss function.
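The abstract describes a two-stage pipeline: each modality is split into a modality-invariant (shared) and a modality-specific (private) representation, and the resulting representations are then fused before sentiment prediction. The sketch below is only an illustrative toy, not the authors' implementation: the feature sizes, the linear encoders, and the mean-pooling stand-in for hierarchical graph fusion are all assumptions introduced for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature sizes and shared hidden size.
D_IN = {"text": 8, "audio": 6, "video": 6}
D_H = 4

# One shared-subspace and one private-subspace linear encoder per modality.
W_shared = {m: rng.standard_normal((d, D_H)) for m, d in D_IN.items()}
W_private = {m: rng.standard_normal((d, D_H)) for m, d in D_IN.items()}

def separate(features):
    """Split each modality into invariant and specific representations."""
    invariant = {m: x @ W_shared[m] for m, x in features.items()}
    specific = {m: x @ W_private[m] for m, x in features.items()}
    return invariant, specific

def fuse(invariant, specific):
    """Toy stand-in for hierarchical graph fusion: treat all six
    representations as graph nodes and pool them by averaging."""
    nodes = list(invariant.values()) + list(specific.values())
    return np.mean(nodes, axis=0)

features = {m: rng.standard_normal(d) for m, d in D_IN.items()}
invariant, specific = separate(features)
fused = fuse(invariant, specific)
print(fused.shape)  # (4,)
```

In the paper itself, the separation stage is trained with similarity and difference losses (hence the loss-function ablation mentioned above), and the fusion stage is a learned graph network rather than a simple average.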
Pages: 14