Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning

Cited by: 45
Authors
Yang, Bo [1 ]
Wu, Lijun [2 ]
Zhu, Jinhua [4 ]
Shao, Bo [3 ]
Lin, Xiaola
Liu, Tie-Yan [2 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510275, Peoples R China
[2] Microsoft Res Asia, Beijing 100190, Peoples R China
[3] Microsoft STCA, Beijing 100190, Peoples R China
[4] Univ Sci & Technol China, EEIS Dept, CAS Key Lab GIPAS, Hefei 230052, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Bit error rate; Task analysis; Multitasking; Visualization; Sentiment analysis; Training; Transformers; BERT; Multimodal sentiment analysis; multi-task; REPRESENTATIONS; NETWORK;
DOI
10.1109/TASLP.2022.3178204
CLC Classification Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Multimodal Sentiment Analysis (MSA) is a challenging research area that studies sentiment expressed through multiple heterogeneous modalities. Given that pre-trained language models such as BERT have achieved state-of-the-art (SOTA) performance across many NLP tasks, existing models tend to integrate the non-textual modalities into BERT and treat MSA as a single prediction task. However, we find that simply fusing the multimodal features into BERT cannot fully exploit the power of a strong pre-trained model. In addition, single-task learning suppresses the classification ability of each individual modality. In this paper, we propose a multimodal framework named Two-Phase Multi-task Sentiment Analysis (TPMSA). It applies a two-phase training strategy to make the most of the pre-trained model and a novel multi-task learning strategy to exploit the classification ability of each representation. We conducted experiments on two multimodal benchmark datasets, CMU-MOSI and CMU-MOSEI. The results show that our TPMSA model outperforms the current SOTA method on both datasets across most of the metrics, clearly demonstrating the effectiveness of our proposed method.
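To make the multi-task idea in the abstract concrete, below is a minimal sketch, not the authors' released code: each modality keeps its own sentiment head (the auxiliary tasks) alongside a head on the fused representation (the main task). The model name MultiTaskMSA, the loss function multi_task_loss, the MOSI-style feature dimensions, the concatenation-based fusion, and the auxiliary loss weight aux_weight are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskMSA(nn.Module):
    """Illustrative multi-task model: one regression head per modality
    plus a head on the fused (concatenated) representation."""
    def __init__(self, dims=None, hidden=128):
        super().__init__()
        dims = dims or {"text": 768, "audio": 74, "video": 47}  # assumed dims
        # Project each modality into a shared hidden size.
        self.proj = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        # One sentiment head per modality (auxiliary tasks) ...
        self.uni_heads = nn.ModuleDict({m: nn.Linear(hidden, 1) for m in dims})
        # ... plus a head on the fused representation (main task).
        self.fused_head = nn.Linear(len(dims) * hidden, 1)

    def forward(self, feats):
        # feats: dict mapping modality name -> (batch, dim) feature tensor
        hs = {m: torch.relu(p(feats[m])) for m, p in self.proj.items()}
        uni = {m: h(hs[m]) for m, h in self.uni_heads.items()}
        fused = self.fused_head(torch.cat(list(hs.values()), dim=-1))
        return fused, uni

def multi_task_loss(fused, uni, target, aux_weight=0.3):
    # Main L1 loss on the fused prediction, plus weighted per-modality
    # losses so each unimodal representation also learns to predict sentiment.
    main = F.l1_loss(fused, target)
    aux = sum(F.l1_loss(p, target) for p in uni.values())
    return main + aux_weight * aux

A two-phase schedule in the spirit of the abstract would first fine-tune the pre-trained text backbone on the task alone, and only then train the full multimodal model with multi_task_loss; the exact phases and architecture of TPMSA are given in the paper itself.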
Pages: 2015-2024
Number of pages: 10