A multiscale neural architecture search framework for multimodal fusion

Cited: 0
Authors
Lv, Jindi [1 ]
Sun, Yanan [1 ]
Ye, Qing [1 ]
Feng, Wentao [1 ]
Lv, Jiancheng [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, 24 South Sect 1,Yihuan Rd, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; Multimodal fusion; Neural architecture search; DARTS;
DOI
10.1016/j.ins.2024.121005
CLC Number
TP [Automation Technology; Computer Technology];
Discipline Code
0812 ;
Abstract
Multimodal fusion, a machine learning technique, significantly enhances decision-making by leveraging complementary information extracted from different data modalities. The success of multimodal fusion relies heavily on the design of the fusion scheme. However, this process traditionally depends on manual expertise and exhaustive trials. To tackle this challenge, researchers have undertaken studies on DARTS-based Neural Architecture Search (NAS) variants to automate the search for fusion schemes. In this paper, we present theoretical and empirical evidence that highlights the presence of catastrophic search bias in DARTS-based multimodal fusion methods. This bias traps the search in a deceptive optimal child network, rendering the entire search process ineffective. To circumvent this phenomenon, we introduce a novel NAS framework for multimodal fusion, featuring a robust search strategy and a meticulously designed multi-scale fusion search space. Significantly, the proposed framework is capable of capturing modality-specific information across multiple scales while achieving an automatic balance between intra-modal and inter-modal information. We conduct extensive experiments on three commonly used multimodal classification tasks from different domains and compare the proposed framework against state-of-the-art approaches. The experimental results demonstrate the superior robustness and high efficiency of the proposed framework.
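To make the DARTS mechanism the abstract refers to concrete: DARTS-based methods relax the discrete choice among candidate operations into a softmax-weighted mixture governed by learnable architecture parameters, and after search they keep only the highest-weighted operation (the "child network" derivation step where search bias can pick a deceptive operation). The sketch below is a generic, self-contained illustration of that relaxation, not the paper's actual framework; the candidate fusion operations and parameter values are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over architecture parameters.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical candidate fusion operations: each combines two
# modality features (scalars here, for simplicity) into one.
CANDIDATE_OPS = {
    "sum":     lambda a, b: a + b,
    "product": lambda a, b: a * b,
    "skip_a":  lambda a, b: a,
    "skip_b":  lambda a, b: b,
}

def mixed_fusion(a, b, alpha):
    """DARTS-style continuous relaxation: the fused output is the
    softmax(alpha)-weighted mixture of ALL candidate operations."""
    weights = softmax(alpha)
    outputs = [op(a, b) for op in CANDIDATE_OPS.values()]
    return sum(w * o for w, o in zip(weights, outputs))

def derive_childnet(alpha):
    """After search, keep only the operation with the largest
    architecture weight -- the discretization step where a biased
    search can commit to a deceptive operation."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alpha)), key=lambda i: alpha[i])]

if __name__ == "__main__":
    alpha = [0.1, 2.0, -1.0, 0.3]      # learnable architecture params
    fused = mixed_fusion(1.0, 2.0, alpha)
    chosen = derive_childnet(alpha)    # -> "product"
    print(fused, chosen)
```

In real DARTS variants, `alpha` is optimized by gradient descent jointly with the network weights on validation data; the paper's point is that this bilevel optimization can inflate the weight of an operation that is not actually the best standalone choice.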
Pages: 16