Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning

Cited by: 45
Authors
Yang, Bo [1 ]
Wu, Lijun [2 ]
Zhu, Jinhua [4 ]
Shao, Bo [3 ]
Lin, Xiaola
Liu, Tie-Yan [2 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510275, Peoples R China
[2] Microsoft Res Asia, Beijing 100190, Peoples R China
[3] Microsoft STCA, Beijing 100190, Peoples R China
[4] Univ Sci & Technol China, EEIS Dept, CAS Key Lab GIPAS, Hefei 230052, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Bit error rate; Task analysis; Multitasking; Visualization; Sentiment analysis; Training; Transformers; BERT; Multimodal sentiment analysis; multi-task; REPRESENTATIONS; NETWORK;
DOI
10.1109/TASLP.2022.3178204
CLC Classification Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Multimodal Sentiment Analysis (MSA) is a challenging research area that studies sentiment expressed through multiple heterogeneous modalities. Given that pre-trained language models such as BERT have achieved state-of-the-art (SOTA) performance across many NLP tasks, existing models tend to integrate the non-textual modalities into BERT and treat MSA as a single prediction task. However, we find that simply fusing the multimodal features into BERT cannot fully exploit the power of a strong pre-trained model. In addition, single-task learning suppresses the classification ability of each individual modality. In this paper, we propose a multimodal framework named Two-Phase Multi-task Sentiment Analysis (TPMSA). It applies a two-phase training strategy to make the most of the pre-trained model and a novel multi-task learning strategy to exploit the classification ability of each representation. We conducted experiments on two multimodal benchmark datasets, CMU-MOSI and CMU-MOSEI. The results show that our TPMSA model outperforms the current SOTA method on both datasets across most of the metrics, clearly demonstrating the effectiveness of our proposed method.
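To make the multi-task idea in the abstract concrete, below is a minimal sketch, not the authors' released code: each modality keeps its own sentiment head (the auxiliary tasks) alongside a head on the fused representation (the main task). The model name MultiTaskMSA, the loss function multi_task_loss, the MOSI-style feature dimensions, the concatenation-based fusion, and the auxiliary loss weight aux_weight are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskMSA(nn.Module):
    """Illustrative multi-task model: one regression head per modality
    plus a head on the fused (concatenated) representation."""
    def __init__(self, dims=None, hidden=128):
        super().__init__()
        dims = dims or {"text": 768, "audio": 74, "video": 47}  # assumed dims
        # Project each modality into a shared hidden size.
        self.proj = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        # One sentiment head per modality (auxiliary tasks) ...
        self.uni_heads = nn.ModuleDict({m: nn.Linear(hidden, 1) for m in dims})
        # ... plus a head on the fused representation (main task).
        self.fused_head = nn.Linear(len(dims) * hidden, 1)

    def forward(self, feats):
        # feats: dict mapping modality name -> (batch, dim) feature tensor
        hs = {m: torch.relu(p(feats[m])) for m, p in self.proj.items()}
        uni = {m: h(hs[m]) for m, h in self.uni_heads.items()}
        fused = self.fused_head(torch.cat(list(hs.values()), dim=-1))
        return fused, uni

def multi_task_loss(fused, uni, target, aux_weight=0.3):
    # Main L1 loss on the fused prediction, plus weighted per-modality
    # losses so each unimodal representation also learns to predict sentiment.
    main = F.l1_loss(fused, target)
    aux = sum(F.l1_loss(p, target) for p in uni.values())
    return main + aux_weight * aux

A two-phase schedule in the spirit of the abstract would first fine-tune the pre-trained text backbone on the task alone, and only then train the full multimodal model with multi_task_loss; the exact phases and architecture of TPMSA are given in the paper itself.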
Pages: 2015-2024
Number of pages: 10