Weighted Linear Fusion of Multimodal Data - A Reasonable Baseline?

Cited by: 7
Author
Arandjelovic, Ognjen [1]
Affiliation
[1] Univ St Andrews, Sch Comp Sci, St Andrews KY16 9SX, Fife, Scotland
Source
MM'16: Proceedings of the 2016 ACM Multimedia Conference | 2016
Keywords
Prediction; arrhythmia; object recognition; computer vision; car accident; illumination; images
DOI
10.1145/2964284.2964304
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The ever-increasing demand for reliable inference that can handle the unpredictable challenges of real-world application has made research on information fusion of major importance. There are few fields of application and research where this is more evident than multimedia, which by its very nature involves multiple modalities, be it for learning, prediction, or human-computer interaction. In the development of score-level fusion algorithms, the most common type, it is almost always desirable to have as a reference point a simple and universally sound baseline against which newly developed approaches can be compared. One of the most pervasively used methods is weighted linear fusion. It has cemented itself as the default off-the-shelf baseline owing to its simplicity of implementation, interpretability, and surprisingly competitive performance across a wide range of application domains and information source types. In this paper I argue that despite this track record, weighted linear fusion is not a good baseline, on the grounds that there is an equally simple and interpretable alternative, namely quadratic mean-based fusion, which is theoretically more principled and more successful in practice. I argue the former from first principles and demonstrate the latter using a series of experiments on a diverse set of fusion problems: computer vision-based object recognition, arrhythmia detection, and fatality prediction in motor vehicle accidents.
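To make the two baselines named in the abstract concrete, the following is a minimal sketch of score-level fusion by a weighted arithmetic mean and by a weighted quadratic mean. It is an illustration under stated assumptions, not the paper's formulation: it assumes non-negative matching scores, non-negative weights that sum to 1, and a quadratic mean of the form sqrt(sum_i w_i * s_i^2). The function names are hypothetical.

```python
import math


def _validate(scores, weights):
    # Assumption: one weight per score, scores and weights non-negative,
    # weights summing to 1 (a convex combination).
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and equal in length")
    if any(s < 0 for s in scores) or any(w < 0 for w in weights):
        raise ValueError("scores and weights must be non-negative")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")


def weighted_linear_fusion(scores, weights):
    """Fuse scores as a weighted arithmetic mean: sum_i w_i * s_i."""
    _validate(scores, weights)
    return sum(w * s for w, s in zip(weights, scores))


def weighted_quadratic_mean_fusion(scores, weights):
    """Fuse scores as a weighted quadratic mean: sqrt(sum_i w_i * s_i**2)."""
    _validate(scores, weights)
    return math.sqrt(sum(w * s * s for w, s in zip(weights, scores)))


if __name__ == "__main__":
    # Two hypothetical per-modality matching scores with equal weights.
    scores, weights = [0.6, 0.8], [0.5, 0.5]
    print("linear:   ", weighted_linear_fusion(scores, weights))
    print("quadratic:", weighted_quadratic_mean_fusion(scores, weights))
```

Under these assumptions the quadratic mean is never smaller than the linear one for the same weights, and the two coincide when all scores are equal, so the quadratic form gives more influence to the larger score.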
Pages: 851-857
Page count: 7