Multi-modal data clustering using deep learning: A systematic review

Cited by: 5
Authors
Raya, Sura [1 ]
Orabi, Mariam [1 ]
Afyouni, Imad [1 ]
Al Aghbari, Zaher [1 ]
Affiliations
[1] University of Sharjah, College of Computing and Informatics, Sharjah, United Arab Emirates
Keywords
Multi-modal data; Clustering algorithms; Deep learning; Review article; FRAMEWORK; INFORMATION; TRENDS
DOI
10.1016/j.neucom.2024.128348
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Multi-modal clustering is a formidable challenge in unsupervised learning. Its objective is to group data collected from diverse modalities, such as audio, visual, and textual sources, into distinct clusters. These techniques operate by extracting shared features across modalities in an unsupervised manner, where the identified common features are highly correlated within real-world objects. Recognizing this correlated nature of cross-modal features is vital for improving clustering accuracy in multi-modal settings. This survey explores Deep Learning (DL) techniques applied to multi-modal clustering, encompassing Convolutional Neural Networks (CNN), Autoencoders (AE), Recurrent Neural Networks (RNN), and Graph Convolutional Networks (GCN). Notably, it is the first survey to investigate DL techniques specifically for multi-modal clustering. It presents a novel taxonomy for DL-based multi-modal clustering, conducts a comparative analysis of the various approaches, and discusses the datasets used in their evaluation. Finally, it identifies research gaps in multi-modal clustering and outlines potential avenues for future research.
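The pipeline the abstract describes — extract shared features per modality, fuse them, then cluster — can be sketched minimally. This is an illustrative toy, not a method from the survey: two synthetic "views" of the same latent objects stand in for, say, audio and text features, a per-view PCA projection stands in for a learned (e.g. autoencoder) encoder, and a small k-means implementation clusters the fused embedding. All names and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 objects from 2 latent clusters, observed through two
# different linear "views" (stand-ins for, e.g., audio and text modalities).
latent = rng.normal(size=(100, 2)) + np.repeat([[0, 0], [5, 5]], 50, axis=0)
view_a = latent @ rng.normal(size=(2, 6))   # hypothetical "audio" features
view_b = latent @ rng.normal(size=(2, 8))   # hypothetical "text" features

def encode(x, dim=2):
    """Stand-in for a learned encoder: project a view onto its top
    principal components (a linear autoencoder spans the same subspace)."""
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:dim].T

# Fuse modalities by concatenating the per-view embeddings.
fused = np.hstack([encode(view_a), encode(view_b)])

def kmeans(x, k=2, iters=50):
    """Minimal Lloyd's k-means on the fused representation."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([x[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

labels = kmeans(fused)
```

In the DL methods the survey covers, `encode` would be replaced by a trained network (CNN, AE, RNN, or GCN depending on the modality), and fusion and clustering are often learned jointly rather than applied as separate post-hoc steps.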
Pages: 17