Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Cited by: 143
Authors
Wang, Yang [1, 2]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK;
DOI
10.1145/3408317
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Often, different modalities are complementary to each other. This fact has motivated considerable research on fusing multi-modal feature spaces to comprehensively characterize the data objects. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver superior performance over their single-modality counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and naturally of multi-modal data as well. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which can essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a substantial overview of the existing state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints regarding some future directions in this field.
Pages: 25