Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Citations: 133
Authors
Wang, Yang [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK;
DOI
10.1145/3408317
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Different modalities are often complementary to each other, which has motivated considerable research on fusing multi-modal feature spaces to characterize data objects comprehensively. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver performance superior to their single-modal counterparts. Recently, deep neural networks have been shown to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and naturally of multi-modal data as well. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which can essentially deepen the fusion of multi-modal deep feature spaces. In this article, we provide a substantial overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions in this field.
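As a rough illustration of the fusion idea the abstract describes (not code from the survey itself), late fusion can be sketched as mapping each modality into an embedding of a shared size and concatenating the results into one joint representation. The single-layer encoders, dimensions, and random inputs below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality encoder: a single nonlinear projection standing in
# for a deep network, mapping raw features to a shared embedding size.
def encode(x, W):
    return np.tanh(x @ W)

x_img = rng.standard_normal((4, 128))  # e.g. image features, batch of 4
x_txt = rng.standard_normal((4, 64))   # e.g. text features for the same objects

W_img = 0.1 * rng.standard_normal((128, 32))
W_txt = 0.1 * rng.standard_normal((64, 32))

z_img = encode(x_img, W_img)
z_txt = encode(x_txt, W_txt)

# Late fusion: concatenate the modality embeddings into one joint representation
# that downstream models (classifiers, hashing layers, etc.) can consume.
z_fused = np.concatenate([z_img, z_txt], axis=1)
print(z_fused.shape)  # (4, 64)
```

This is the simplest fusion strategy; the survey's themes of collaboration and adversarial competition correspond to richer schemes that align or contrast the per-modality embeddings before or instead of concatenation.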
Pages: 25