MFAS: Multimodal Fusion Architecture Search

Cited by: 145
Authors
Perez-Rua, Juan-Manuel [1, 3]
Vielzeuf, Valentin [1, 2]
Pateux, Stephane [1]
Baccouche, Moez [1]
Jurie, Frederic [2]
Affiliations
[1] Orange Labs, Cesson Sevigne, France
[2] Univ Caen Normandie, Caen, France
[3] Samsung AI Ctr, Cambridge, England
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00713
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. In order to find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored for the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem by extensive experimentation on a toy dataset and two other real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance for problems with different domains and dataset sizes, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.
Pages: 6959-6968
Page count: 10