Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Cited by: 143
Authors
Wang, Yang [1,2]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK
DOI
10.1145/3408317
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Different modalities are often complementary to each other, which has motivated considerable research on fusing multi-modal feature spaces to comprehensively characterize the data objects. Most existing state-of-the-art methods focus on how to fuse the information from multi-modal spaces to deliver performance superior to their single-modal counterparts. Recently, deep neural networks have proved to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and this naturally extends to multi-modal data. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which essentially deepen the fusion of multi-modal deep feature spaces. In this article, we provide a substantial overview of the existing state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions in this field.
Pages: 25
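To make the idea of fusing multi-modal deep feature spaces concrete, the sketch below shows a minimal concatenation-based late-fusion network in PyTorch. This is an illustrative assumption, not a method from the surveyed paper: the two modalities, the dimensions (img_dim, txt_dim, fusion_dim), and the classification head are hypothetical placeholders.

# Minimal sketch (illustrative only): encode each modality separately,
# then fuse the deep features by concatenation before a shared head.
import torch
import torch.nn as nn


class LateFusionNet(nn.Module):
    """Two modality-specific encoders followed by a fusion classifier."""

    def __init__(self, img_dim=2048, txt_dim=300, fusion_dim=256, num_classes=10):
        super().__init__()
        # Modality-specific encoders project each view into a common-size space.
        self.img_encoder = nn.Sequential(nn.Linear(img_dim, fusion_dim), nn.ReLU())
        self.txt_encoder = nn.Sequential(nn.Linear(txt_dim, fusion_dim), nn.ReLU())
        # Fusion head operates on the concatenated deep features.
        self.classifier = nn.Sequential(
            nn.Linear(2 * fusion_dim, fusion_dim),
            nn.ReLU(),
            nn.Linear(fusion_dim, num_classes),
        )

    def forward(self, img_feat, txt_feat):
        z_img = self.img_encoder(img_feat)          # deep feature of modality 1
        z_txt = self.txt_encoder(txt_feat)          # deep feature of modality 2
        fused = torch.cat([z_img, z_txt], dim=-1)   # concatenation-based fusion
        return self.classifier(fused)


if __name__ == "__main__":
    model = LateFusionNet()
    img = torch.randn(4, 2048)   # batch of 4 image feature vectors
    txt = torch.randn(4, 300)    # batch of 4 text feature vectors
    logits = model(img, txt)
    print(logits.shape)          # torch.Size([4, 10])

More sophisticated schemes covered by the survey go beyond this simple concatenation, e.g., by adding collaborative or adversarial objectives over the modality-specific feature spaces.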