Multimodal Consistency-Based Teacher for Semi-Supervised Multimodal Sentiment Analysis

Cited by: 2

Authors:

Yuan, Ziqi [1]

Fang, Jingliang [1,2]

Xu, Hua [1,2]

Gao, Kai [3]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China

[2] Samton Jiangxi Technol Dev Co Ltd, Nanchang 330036, Peoples R China

[3] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Sentiment analysis; Visualization; Training; Speech processing; Semisupervised learning; Image classification; Consistency-based semi-supervised learning; multimodal sentiment analysis; pseudo-label filtering;

DOI:

10.1109/TASLP.2024.3430543

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Multimodal sentiment analysis holds significant importance within the realm of human-computer interaction. Because unlabeled online resources are easy to collect while annotation is costly, it is imperative for researchers to develop semi-supervised methods that leverage unlabeled data to enhance model performance. Existing semi-supervised approaches, particularly those applied to trivial image classification tasks, are not suitable for multimodal regression tasks because they rely on task-specific augmentation and on thresholds designed for classification tasks. To address this limitation, we propose the Multimodal Consistency-based Teacher (MC-Teacher), which incorporates a consistency-based pseudo-label technique into semi-supervised multimodal sentiment analysis. In our approach, we first propose the synergistic consistency assumption, which focuses on the consistency among bimodal representations. Building upon this assumption, we develop a learnable filter network that autonomously learns to identify misleading instances, replacing threshold-based methods. This is achieved by leveraging both the implicit discriminant consistency on unlabeled instances and the explicit guidance on constructed training data with labeled instances. Additionally, we design a self-adaptive exponential moving average strategy to decouple the student and teacher networks, utilizing a heuristic momentum coefficient. Through both quantitative and qualitative experiments on two benchmark datasets, we demonstrate the outstanding performance of the proposed MC-Teacher approach. Furthermore, detailed analysis experiments and case studies are provided for each crucial component to intuitively elucidate the inner mechanisms and further validate their effectiveness.
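For readers unfamiliar with the teacher-student setup mentioned in the abstract, the following is a minimal sketch of a generic exponential moving average (EMA) teacher update. It is not the paper's self-adaptive strategy, whose momentum rule is not given in this record. The names `ema_update` and `warmup_momentum`, the plain-list parameter format, and the warm-up schedule (a common convention in mean-teacher style training) are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumption: parameters are stored as plain lists of floats keyed by name.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float) -> Params:
    """Return new teacher parameters: m * teacher + (1 - m) * student."""
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return {
        name: [momentum * t + (1.0 - momentum) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


def warmup_momentum(step: int, base: float = 0.99) -> float:
    """Hypothetical schedule: small momentum early on, capped at `base`.

    At step 0 the teacher simply copies the student; the momentum then
    grows as 1 - 1/(step + 1) until it reaches `base`.
    """
    return min(base, 1.0 - 1.0 / (step + 1))


if __name__ == "__main__":
    teacher = {"w": [0.0, 2.0]}
    student = {"w": [1.0, 0.0]}
    for step in range(3):
        m = warmup_momentum(step)
        teacher = ema_update(teacher, student, m)
        print(step, m, teacher)
```

A fixed momentum ties the teacher to the student at a constant rate; a step-dependent coefficient such as the one above is one simple way to vary that coupling over training, which is the general idea the abstract's "heuristic momentum coefficient" alludes to.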


Pages: 3669-3683

Page count: 15

