Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Cited by: 133
Authors
Wang, Yang [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK;
DOI
10.1145/3408317
Chinese Library Classification (CLC) number
TP [Automation Technology; Computer Technology];
Discipline classification code
0812;
Abstract
With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Different modalities are often complementary to each other, a fact that has motivated substantial research on fusing multi-modal feature spaces to comprehensively characterize data objects. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver performance superior to their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and naturally of multi-modal data as well. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which can essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a substantial overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout the survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions in this field.
Pages: 25