Relational structure predictive neural architecture search for multimodal fusion

Cited by: 0

Authors:

Xiao Yao

Fang Li

Yifeng Zeng

Affiliation:

[1] College of IoT Engineering, Hohai University

Source:

Soft Computing | 2022 / Vol. 26

Keywords:

Neural network; Multimodal fusion; Neural architecture search; Semi-supervised strategy; Graph convolution network;

DOI:

Not available

Chinese Library Classification (CLC) number:

Subject classification number:

Abstract:

The design of a model's architecture strongly affects performance on multimodal classification tasks. Neural network architectures in traditional models are designed manually and depend on human understanding of the specific task, which limits their generalization capability. This paper explores finding the optimal architecture for multimodal fusion using neural architecture search (NAS). NAS relies on a controller to generate better architectures and to predict the accuracy of given architectures. However, the controller's evaluation of architectures is very time-consuming. We discuss a semi-supervised strategy for architecture evaluation that reduces the search time complexity, but it degrades the performance of the predictor. We therefore present relational-graphic-predictive NAS (RGNAS), which leverages the intrinsic relationship between labeled architectures and abundant unlabeled architectures to compensate for the shortage of labeled architectures and improve the accuracy of the predictor. This achieves a reasonable trade-off between accuracy and search time complexity. We validate the effectiveness of the proposed method on different multimodal datasets (eNTERFACE05, AFEW9.0 and MM-IMDb). Extensive experiments demonstrate that our method outperforms the state of the art and achieves better robustness and generalization performance.

Pages: 2807-2818

Page count: 11
