MFAS: Multimodal Fusion Architecture Search

Cited by: 145
Authors
Perez-Rua, Juan-Manuel [1,3]
Vielzeuf, Valentin [1,2]
Pateux, Stephane [1]
Baccouche, Moez [1]
Jurie, Frederic [2]
Affiliations
[1] Orange Labs, Cesson Sevigne, France
[2] Univ Caen Normandie, Caen, France
[3] Samsung AI Ctr, Cambridge, England
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI: 10.1109/CVPR.2019.00713
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. In order to find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored for the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem through extensive experimentation on a toy dataset and two real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance for problems with different domains and dataset sizes, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.
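The abstract describes searching a discrete space of fusion architectures with a sequential model-based exploration strategy. As a rough illustration only, not the authors' implementation, the sketch below shows the general pattern of such a search: a discrete space of fusion choices, a cheap surrogate fitted to already-evaluated candidates, and rounds that evaluate the surrogate's top-ranked predictions. All names in it (LAYERS_MOD_A, evaluate_fusion, and so on) are hypothetical placeholders.

```python
# A minimal, illustrative sketch (not the authors' code) of sequential
# model-based search over a discrete multimodal-fusion search space.
import itertools
import random

import numpy as np

# Hypothetical search space: each candidate picks one feature layer per
# modality and an activation, loosely mirroring a layer-pairing fusion space.
LAYERS_MOD_A = [0, 1, 2, 3]   # e.g. indices of visual-backbone layers
LAYERS_MOD_B = [0, 1, 2, 3]   # e.g. indices of a second modality's layers
ACTIVATIONS = [0, 1, 2]       # e.g. relu / sigmoid / identity
SEARCH_SPACE = list(itertools.product(LAYERS_MOD_A, LAYERS_MOD_B, ACTIVATIONS))


def evaluate_fusion(config):
    """Stand-in for training a fusion network and returning validation accuracy.

    In a real setting this is the expensive step; a synthetic score is used
    here so the sketch runs end to end."""
    a, b, act = config
    rng = np.random.default_rng(a * 100 + b * 10 + act)
    return 0.5 + 0.1 * (a == b) + 0.05 * act + rng.normal(0.0, 0.02)


def encode(config):
    """One-hot encode the discrete choices so a linear surrogate can score them."""
    a, b, act = config
    vec = np.zeros(len(LAYERS_MOD_A) + len(LAYERS_MOD_B) + len(ACTIVATIONS) + 1)
    vec[a] = 1.0
    vec[len(LAYERS_MOD_A) + b] = 1.0
    vec[len(LAYERS_MOD_A) + len(LAYERS_MOD_B) + act] = 1.0
    vec[-1] = 1.0  # bias term
    return vec


def sequential_model_based_search(n_rounds=4, evals_per_round=6, seed=0):
    random.seed(seed)
    history = random.sample(SEARCH_SPACE, evals_per_round)  # seed evaluations
    scores = [evaluate_fusion(cfg) for cfg in history]

    for _ in range(n_rounds):
        # Fit a cheap linear surrogate on everything evaluated so far.
        X = np.stack([encode(c) for c in history])
        y = np.array(scores)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)

        # Rank unseen configurations by predicted score; evaluate the top ones.
        unseen = [c for c in SEARCH_SPACE if c not in history]
        ranked = sorted(unseen, key=lambda c: float(encode(c) @ w), reverse=True)
        for cfg in ranked[:evals_per_round]:
            history.append(cfg)
            scores.append(evaluate_fusion(cfg))

    best = int(np.argmax(scores))
    return history[best], scores[best]


if __name__ == "__main__":
    best_cfg, best_score = sequential_model_based_search()
    print("best fusion config:", best_cfg, "score:", round(best_score, 3))
```

The surrogate here is a simple linear model over one-hot encodings; the key idea it illustrates is that only a small fraction of the search space is ever trained directly, with the surrogate deciding which candidates are worth the cost.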
Pages: 6959-6968
Page count: 10