Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Cited by: 133
Authors
Wang, Yang [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK;
DOI
10.1145/3408317
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
With the development of web technology, multi-modal (or multi-view) data has surged as a major stream of big data, where each modality/view encodes an individual property of the data object. Different modalities are often complementary to each other, a fact that has motivated considerable research on fusing multi-modal feature spaces to characterize data objects comprehensively. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver performance superior to that of their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and they extend naturally to multi-modal data. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a substantial overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions in this field.
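The fusion idea the abstract refers to, combining complementary modality-specific representations into a single one, can be illustrated with a minimal toy sketch. This is plain Python with hypothetical feature values for illustration only; the survey itself covers learned deep fusions, not this hand-written early/late fusion:

```python
def concat_fusion(feat_a, feat_b):
    """Early fusion: concatenate modality-specific feature vectors
    into one joint representation (list concatenation here)."""
    return feat_a + feat_b

def late_fusion(score_a, score_b, w=0.5):
    """Late fusion: weighted average of per-modality prediction scores,
    with w controlling the contribution of the first modality."""
    return w * score_a + (1 - w) * score_b

# Hypothetical per-modality features (e.g., image vs. text descriptors).
image_feat = [0.2, 0.9]
text_feat = [0.7, 0.1, 0.4]
joint = concat_fusion(image_feat, text_feat)  # -> [0.2, 0.9, 0.7, 0.1, 0.4]
```

In deep multi-modal methods, the concatenation or weighting above is replaced by learned, nonlinear transformations of each modality's deep feature space before (or while) fusing.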
Pages: 25
References
175 items in total
  • [21] Chen Z. D. AAAI Conference on Artificial Intelligence, 2018: 274.
  • [22] Chi J. J. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019: 2165.
  • [23] Choi Y., Choi M., Kim M., Ha J.-W., Kim S., Choo J. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 8789-8797.
  • [24] Chua T. S. Proceedings of the ACM International Conference on Image and Video Retrieval, 2009: 1. DOI: 10.1145/1646396.1646452.
  • [25] Costa Pereira J., Coviello E., Doyle G., Rasiwasia N., Lanckriet G. R. G., Levy R., Vasconcelos N. On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(3): 521-535.
  • [26] Deng C., Chen Z., Liu X., Gao X., Tao D. Triplet-Based Deep Hashing Network for Cross-Modal Retrieval. IEEE Transactions on Image Processing, 2018, 27(8): 3893-3903.
  • [27] Deng Z. Proceedings of the Annual Conference on Neural Information Processing Systems, 2017: 3899. DOI: 10.48550/ARXIV.1711.00889.
  • [28] Ding C., Tao D. Robust Face Recognition via Multimodal Deep Face Representation. IEEE Transactions on Multimedia, 2015, 17(11): 2049-2058.
  • [29] Dizaji K. G., Wang X., Deng C., Huang H. Balanced Self-Paced Learning for Generative Adversarial Clustering Network. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 4386-4395.
  • [30] Dizaji K. G., Wang X., Huang H. Semi-Supervised Generative Adversarial Network for Gene Expression Inference. KDD'18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018: 1435-1444.