Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Cited by: 133
Authors
Wang, Yang [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Univ Technol, Intelligent Interconnected Syst Lab Anhui Prov, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal data; deep neural networks; MULTIVIEW; REPRESENTATIONS; RECOGNITION; NETWORK;
DOI
10.1145/3408317
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
With the development of web technology, multi-modal (or multi-view) data has surged as a major stream of big data, where each modality/view encodes an individual property of the data object. Different modalities are often complementary to each other, a fact that has motivated considerable research on fusing multi-modal feature spaces to characterize data objects comprehensively. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver performance superior to that of their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and they extend naturally to multi-modal data. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a substantial overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions in this field.
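The fusion idea the abstract refers to, combining complementary modality-specific representations into a single one, can be illustrated with a minimal toy sketch. This is plain Python with hypothetical feature values for illustration only; the survey itself covers learned deep fusions, not this hand-written early/late fusion:

```python
def concat_fusion(feat_a, feat_b):
    """Early fusion: concatenate modality-specific feature vectors
    into one joint representation (list concatenation here)."""
    return feat_a + feat_b

def late_fusion(score_a, score_b, w=0.5):
    """Late fusion: weighted average of per-modality prediction scores,
    with w controlling the contribution of the first modality."""
    return w * score_a + (1 - w) * score_b

# Hypothetical per-modality features (e.g., image vs. text descriptors).
image_feat = [0.2, 0.9]
text_feat = [0.7, 0.1, 0.4]
joint = concat_fusion(image_feat, text_feat)  # -> [0.2, 0.9, 0.7, 0.1, 0.4]
```

In deep multi-modal methods, the concatenation or weighting above is replaced by learned, nonlinear transformations of each modality's deep feature space before (or while) fusing.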
Pages: 25
References
175 items in total
  • [21] Chen Z. D. AAAI Conference on Artificial Intelligence, 2018: 274.
  • [22] Chi J. J. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019: 2165.
  • [23] Choi Y., Choi M., Kim M., Ha J.-W., Kim S., Choo J. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 8789-8797.
  • [24] Chua T. S. Proceedings of the ACM International Conference on Image and Video Retrieval, 2009: 1. DOI: 10.1145/1646396.1646452.
  • [25] Costa Pereira J., Coviello E., Doyle G., Rasiwasia N., Lanckriet G. R. G., Levy R., Vasconcelos N. On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(3): 521-535.
  • [26] Deng C., Chen Z., Liu X., Gao X., Tao D. Triplet-Based Deep Hashing Network for Cross-Modal Retrieval. IEEE Transactions on Image Processing, 2018, 27(8): 3893-3903.
  • [27] Deng Z. Proceedings of the Annual Conference on Neural Information Processing Systems, 2017: 3899. DOI: 10.48550/ARXIV.1711.00889.
  • [28] Ding C., Tao D. Robust Face Recognition via Multimodal Deep Face Representation. IEEE Transactions on Multimedia, 2015, 17(11): 2049-2058.
  • [29] Dizaji K. G., Wang X., Deng C., Huang H. Balanced Self-Paced Learning for Generative Adversarial Clustering Network. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 4386-4395.
  • [30] Dizaji K. G., Wang X., Huang H. Semi-Supervised Generative Adversarial Network for Gene Expression Inference. KDD'18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018: 1435-1444.