Deep Hierarchical Fusion with Application in Sentiment Analysis

Cited by: 19
Authors
Georgiou, Efthymios [1, 2]
Papaioannou, Charilaos [1]
Potamianos, Alexandros [1, 2]
Affiliations
[1] Natl Tech Univ Athens, Sch ECE, Athens, Greece
[2] Behav Signal Technol, Los Angeles, CA 90027 USA
Source
INTERSPEECH 2019
Keywords
deep hierarchical fusion; fused representations; multimodal fusion; sentiment analysis
DOI
10.21437/Interspeech.2019-3243
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Recognizing the emotional tone in spoken language is a challenging research problem that requires modeling not only the acoustic and textual modalities separately but also their cross-interactions. In this work, we introduce a hierarchical fusion scheme for sentiment analysis of spoken sentences. Two bidirectional Long Short-Term Memory networks (BiLSTMs), one per modality and followed by multiple fully connected layers, are trained to extract feature representations for the textual and audio modalities. The representations of the unimodal encoders are fused at each layer and propagated forward, thus achieving fusion at the word, sentence, and high/sentiment levels. The proposed deep hierarchical fusion approach achieves state-of-the-art results on sentiment analysis tasks. Through an ablation study, we show that the proposed fusion method yields greater performance gains over the unimodal baseline than other fusion approaches in the literature.
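The layer-wise fusion described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the BiLSTM encoders are omitted and their per-level outputs are passed in directly, and the fusion operator (concatenate the text vector, the audio vector, and the previous fused vector, then apply a fully connected layer with ReLU) is an assumption chosen for illustration. The names `dense`, `init_layer`, `hierarchical_fusion`, and the `fused_dim` parameter are hypothetical, and the weights are random and untrained, so the sketch only shows how fused information flows from the word level up to the sentence and high levels.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer with ReLU: relu(W x + b)."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Random (untrained) weights scaled by 1/sqrt(n_in); zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def hierarchical_fusion(text_levels: List[Vector], audio_levels: List[Vector],
                        fused_dim: int, seed: int = 0) -> List[Vector]:
    """Fuse two modalities level by level (hypothetical sketch).

    text_levels[k] and audio_levels[k] stand in for the outputs of level k
    of each unimodal encoder (e.g. k=0 word, k=1 sentence, k=2 high level).
    At each level the two unimodal vectors are concatenated with the fused
    vector from the previous level and passed through a dense layer, so the
    fused representation is propagated forward through the hierarchy.
    """
    if len(text_levels) != len(audio_levels):
        raise ValueError("both modalities need the same number of levels")
    rng = random.Random(seed)
    fused: Vector = []  # nothing has been fused before the first level
    outputs: List[Vector] = []
    for t, a in zip(text_levels, audio_levels):
        x = t + a + fused  # list concatenation = vector concatenation
        w, b = init_layer(len(x), fused_dim, rng)
        fused = dense(x, w, b)
        outputs.append(fused)
    return outputs


if __name__ == "__main__":
    # Toy inputs for three levels; dimensions may differ between levels.
    text = [[0.1] * 8, [0.2] * 6, [0.3] * 4]
    audio = [[0.5] * 8, [0.4] * 6, [0.1] * 4]
    levels = hierarchical_fusion(text, audio, fused_dim=5)
    print([len(v) for v in levels])  # [5, 5, 5]
```

Feeding the previous fused vector into the next level is what makes this sketch hierarchical; a simple late-fusion baseline would instead concatenate only the top-level unimodal vectors once, just before the output layer.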
Pages: 1646-1650
Number of pages: 5