CCMA: CapsNet for audio-video sentiment analysis using cross-modal attention

Cited by: 2
Authors
Li, Haibin [1 ]
Guo, Aodi [1 ]
Li, Yaqian [1 ]
Affiliations
[1] Yanshan University, Key Laboratory of Industrial Computer Control Engineering of Hebei Province, Qinhuangdao 066004, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Sentiment analysis; Audio-video bimodal; Positional embedding; Capsule network; Cross-modal fusion
DOI
10.1007/s00371-024-03453-9
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Multimodal sentiment analysis is a challenging research area that investigates how complementary multimodal information can be used to analyze a speaker's sentiment tendencies. To fuse heterogeneous multimodal data from different information sources effectively, current state-of-the-art models have developed a variety of fusion strategies centered mainly on the text modality, while research on audio-visual bimodal fusion remains relatively scarce. In this paper, we therefore propose CCMA, a sentiment analysis framework based on the audio and video modalities. First, we preprocess the raw data and retain modality-specific temporal information through positional embedding. On the one hand, to address the imbalance in modal contributions, we use a capsule network on the video side and 1D convolution on the audio side to better represent the features of each modality. On the other hand, we regard explicit inter-modal interaction as the most effective way to fuse cross-modal information, and design a cross-modal attention interaction module that explicitly interacts modal information to improve fusion quality. Experiments on two popular sentiment analysis datasets, RAVDESS and CMU-MOSEI, show that our model achieves higher accuracy than competing methods, demonstrating the effectiveness of our approach.
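
The abstract describes the fusion pipeline only in prose. Below is a minimal PyTorch sketch of what such an audio-video cross-modal attention fusion could look like; every name, dimension, and module choice here (the use of nn.MultiheadAttention, mean pooling, the 8-class head matching RAVDESS, and the placeholder for the paper's capsule-network video encoder) is an assumption for illustration, not the authors' implementation.

# Hypothetical sketch of the cross-modal attention fusion described in the
# abstract. Names, dimensions, and module choices are assumptions, not the
# authors' released code.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """One modality attends to the other: query from A, key/value from B."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_seq: torch.Tensor, context_seq: torch.Tensor) -> torch.Tensor:
        # query_seq: (batch, T_q, dim); context_seq: (batch, T_c, dim)
        fused, _ = self.attn(query_seq, context_seq, context_seq)
        return self.norm(query_seq + fused)  # residual connection (assumed)

class BimodalFusion(nn.Module):
    """Audio encoded with 1D convolution as in the abstract; video features
    assumed pre-encoded (the paper uses a capsule network there). Fusion is
    bidirectional cross-modal attention, then pooling and a classifier head."""

    def __init__(self, audio_feat: int = 40, dim: int = 128, n_classes: int = 8):
        super().__init__()
        # 1D convolution over the audio feature sequence.
        self.audio_enc = nn.Conv1d(audio_feat, dim, kernel_size=3, padding=1)
        self.a2v = CrossModalAttention(dim)  # video attends to audio
        self.v2a = CrossModalAttention(dim)  # audio attends to video
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_a, audio_feat); video: (batch, T_v, dim)
        a = self.audio_enc(audio.transpose(1, 2)).transpose(1, 2)
        a_fused = self.v2a(a, video)
        v_fused = self.a2v(video, a)
        pooled = torch.cat([a_fused.mean(dim=1), v_fused.mean(dim=1)], dim=-1)
        return self.head(pooled)

if __name__ == "__main__":
    model = BimodalFusion()
    logits = model(torch.randn(2, 100, 40), torch.randn(2, 30, 128))
    print(logits.shape)  # torch.Size([2, 8])

In this sketch the two CrossModalAttention modules realize the "explicit interaction" idea from the abstract: each modality's sequence is refined by attending over the other before pooling, rather than the two streams being concatenated blindly.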
Pages: 1609-1620
Page count: 12