Multi-Fusion Residual Memory Network for Multimodal Human Sentiment Comprehension

Cited by: 40
Authors
Mai, Sijie [1]
Hu, Haifeng [1]
Xu, Jia [1]
Xing, Songlong [1]
Affiliations
[1] Sun Yat-sen University, School of Electronics and Information Technology, Guangzhou 510275, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Sentiment analysis; emotion intensity attention; time-step level fusion; residual memory network; REPRESENTATIONS; SPEECH
DOI
10.1109/TAFFC.2020.3000510
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal human sentiment comprehension refers to recognizing human affect from multiple modalities. Two key issues arise for this problem. First, it is difficult to explore time-dependent interactions between modalities and to focus on the important time steps. Second, processing the long fused sequence of an utterance is susceptible to the forgetting problem caused by long-term temporal dependencies. In this article, we introduce a hierarchical learning architecture to classify utterance-level sentiment. To address the first issue, we perform time-step level fusion to generate fused features for each time step, which explicitly models time-restricted interactions by incorporating information across modalities at the same time step. Furthermore, based on the assumption that acoustic features directly reflect emotional intensity, we introduce emotion intensity attention to focus on the time steps where emotion changes or intense affect occurs. To handle the second issue, we propose the Residual Memory Network (RMN) to process the fused sequence. RMN passes the previous state directly into the next time step, among other techniques, which helps retain information from many time steps ago. We show that our method achieves state-of-the-art performance on multiple datasets, and the results also suggest that RMN yields competitive performance on general sequence-modeling tasks.
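The abstract describes three mechanisms: fusing the modalities at each time step, weighting time steps by an acoustic intensity score, and a recurrent step with a residual connection from the previous state. The following is a minimal PyTorch-style sketch of those ideas only; the class name, feature dimensions, and the GRU cell used as the recurrent core are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class TimeStepFusionRMN(nn.Module):
    # Illustrative sketch only: per-time-step fusion, an acoustic-driven
    # attention weight, and a recurrent step with a residual connection
    # from the previous hidden state. Names and dimensions are assumptions.
    def __init__(self, d_text, d_audio, d_visual, d_hidden):
        super().__init__()
        self.fuse = nn.Linear(d_text + d_audio + d_visual, d_hidden)  # time-step level fusion
        self.intensity = nn.Linear(d_audio, 1)                        # intensity score from acoustics
        self.cell = nn.GRUCell(d_hidden, d_hidden)                    # stand-in recurrent core
        self.classify = nn.Linear(d_hidden, 1)                        # utterance-level sentiment score

    def forward(self, text, audio, visual):
        # Each input: (batch, steps, d_modality), aligned per time step.
        fused = torch.tanh(self.fuse(torch.cat([text, audio, visual], dim=-1)))
        # Attention over time steps, driven only by the acoustic stream.
        alpha = torch.softmax(self.intensity(audio).squeeze(-1), dim=1)   # (batch, steps)
        fused = fused * alpha.unsqueeze(-1)
        h = fused.new_zeros(fused.size(0), fused.size(-1))
        for t in range(fused.size(1)):
            # Residual memory step: the previous state is added back directly,
            # so information from many steps ago is easier to retain.
            h = self.cell(fused[:, t], h) + h
        return self.classify(h)

# Hypothetical usage with made-up feature sizes (batch=4, 20 time steps).
model = TimeStepFusionRMN(d_text=300, d_audio=74, d_visual=35, d_hidden=128)
score = model(torch.randn(4, 20, 300), torch.randn(4, 20, 74), torch.randn(4, 20, 35))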
Pages: 320-334
Page count: 15