Ensemble learning on visual and textual data for social image emotion classification

Cited by: 37
Authors
Corchs, Silvia [1]
Fersini, Elisabetta [1]
Gasparini, Francesca [1]
Affiliations
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, Viale Sarca 336, I-20126 Milan, Italy
Keywords
Image emotion; Multimodal ensemble learning; Bayesian model averaging; Visual and textual social data; NETWORKS;
DOI
10.1007/s13042-017-0734-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Texts, images, and other information are posted every day on social networks, providing a large amount of multimodal data. The aim of this work is to investigate whether combining and integrating visual and textual data makes it possible to identify the emotions elicited by an image. We focus on image emotion classification over eight emotion categories: amusement, awe, contentment, excitement, anger, disgust, fear, and sadness. For this classification task we propose ensemble learning approaches based on the Bayesian model averaging method that combine five state-of-the-art classifiers. The proposed ensemble approaches consider predictions given by several classification models, based on visual and textual data, through a late and an early fusion scheme, respectively. Our investigations show that an ensemble method based on late fusion of unimodal classifiers achieves high classification performance in all eight emotion classes. The improvement is higher when deep image representations are adopted as visual features than when hand-crafted ones are used.
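The late fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each unimodal classifier outputs a probability per emotion class, and that a non-negative reliability score per model (for example a validation score, a hypothetical choice here) stands in for the model posterior used by Bayesian model averaging. The function names and sample numbers are illustrative only.

```python
# Minimal sketch of Bayesian-model-averaging-style late fusion.
# Assumption: model weights approximate P(model | data); the source does
# not state how the weights are estimated.

EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]


def bma_late_fusion(class_probs, model_weights):
    """Return the weighted average of per-model class probabilities.

    class_probs: one list of probabilities per model, one value per emotion.
    model_weights: one non-negative reliability score per model.
    """
    if len(class_probs) != len(model_weights):
        raise ValueError("need exactly one weight per model")
    total = sum(model_weights)
    if total <= 0:
        raise ValueError("model weights must sum to a positive value")
    # Normalize the weights so they behave like a distribution over models.
    posteriors = [w / total for w in model_weights]

    fused = [0.0] * len(EMOTIONS)
    for probs, posterior in zip(class_probs, posteriors):
        if len(probs) != len(EMOTIONS):
            raise ValueError("each model must score every emotion class")
        for k, p in enumerate(probs):
            fused[k] += posterior * p
    return fused


def predict_emotion(class_probs, model_weights):
    """Return the emotion label with the highest fused probability."""
    fused = bma_late_fusion(class_probs, model_weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Illustrative inputs: a visual model leaning towards "anger" and a
# textual model leaning towards "fear".
visual_probs = [0.1, 0.1, 0.1, 0.1, 0.3, 0.1, 0.1, 0.1]
textual_probs = [0.05, 0.05, 0.05, 0.05, 0.1, 0.1, 0.5, 0.1]

label = predict_emotion([visual_probs, textual_probs], [0.6, 0.4])
```

An early fusion scheme would instead combine the visual and textual inputs before classification; the sketch covers only the late fusion case, where each unimodal model is trained separately and only the predictions are merged.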
Pages: 2057-2070
Number of pages: 14