A multiscale neural architecture search framework for multimodal fusion

Cited: 0
Authors
Lv, Jindi [1 ]
Sun, Yanan [1 ]
Ye, Qing [1 ]
Feng, Wentao [1 ]
Lv, Jiancheng [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, 24 South Sect 1,Yihuan Rd, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; Multimodal fusion; Neural architecture search; DARTS;
DOI
10.1016/j.ins.2024.121005
CLC Number
TP [Automation Technology; Computer Technology];
Discipline Code
0812 ;
Abstract
Multimodal fusion, a machine learning technique, significantly enhances decision-making by leveraging complementary information extracted from different data modalities. The success of multimodal fusion relies heavily on the design of the fusion scheme. However, this process traditionally depends on manual expertise and exhaustive trials. To tackle this challenge, researchers have undertaken studies on DARTS-based Neural Architecture Search (NAS) variants to automate the search for fusion schemes. In this paper, we present theoretical and empirical evidence that highlights the presence of catastrophic search bias in DARTS-based multimodal fusion methods. This bias traps the search in a deceptive optimal child network, rendering the entire search process ineffective. To circumvent this phenomenon, we introduce a novel NAS framework for multimodal fusion, featuring a robust search strategy and a meticulously designed multi-scale fusion search space. Significantly, the proposed framework is capable of capturing modality-specific information across multiple scales while achieving an automatic balance between intra-modal and inter-modal information. We conduct extensive experiments on three commonly used multimodal classification tasks from different domains and compare the proposed framework against state-of-the-art approaches. The experimental results demonstrate the superior robustness and high efficiency of the proposed framework.
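To make the DARTS mechanism the abstract refers to concrete: DARTS-based methods relax the discrete choice among candidate operations into a softmax-weighted mixture governed by learnable architecture parameters, and after search they keep only the highest-weighted operation (the "child network" derivation step where search bias can pick a deceptive operation). The sketch below is a generic, self-contained illustration of that relaxation, not the paper's actual framework; the candidate fusion operations and parameter values are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over architecture parameters.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical candidate fusion operations: each combines two
# modality features (scalars here, for simplicity) into one.
CANDIDATE_OPS = {
    "sum":     lambda a, b: a + b,
    "product": lambda a, b: a * b,
    "skip_a":  lambda a, b: a,
    "skip_b":  lambda a, b: b,
}

def mixed_fusion(a, b, alpha):
    """DARTS-style continuous relaxation: the fused output is the
    softmax(alpha)-weighted mixture of ALL candidate operations."""
    weights = softmax(alpha)
    outputs = [op(a, b) for op in CANDIDATE_OPS.values()]
    return sum(w * o for w, o in zip(weights, outputs))

def derive_childnet(alpha):
    """After search, keep only the operation with the largest
    architecture weight -- the discretization step where a biased
    search can commit to a deceptive operation."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alpha)), key=lambda i: alpha[i])]

if __name__ == "__main__":
    alpha = [0.1, 2.0, -1.0, 0.3]      # learnable architecture params
    fused = mixed_fusion(1.0, 2.0, alpha)
    chosen = derive_childnet(alpha)    # -> "product"
    print(fused, chosen)
```

In real DARTS variants, `alpha` is optimized by gradient descent jointly with the network weights on validation data; the paper's point is that this bilevel optimization can inflate the weight of an operation that is not actually the best standalone choice.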
Pages: 16