MFAS: Multimodal Fusion Architecture Search

Cited by: 145

Authors:

Perez-Rua, Juan-Manuel [1,3]

Vielzeuf, Valentin [1,2]

Pateux, Stephane [1]

Baccouche, Moez [1]

Jurie, Frederic [2]

Affiliations:

[1] Orange Labs, Cesson Sevigne, France

[2] Univ Caen Normandie, Caen, France

[3] Samsung AI Ctr, Cambridge, England

Source:

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) | 2019

DOI:

10.1109/CVPR.2019.00713

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. In order to find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored for the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem by extensive experimentation on a toy dataset and two other real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance for problems with different domain and dataset size, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.

Citation

Pages: 6959-6968

Number of pages: 10

