A Survey of Multimodal Learning: Methods, Applications, and Future

Times cited: 0

Authors:

Yuan, Yuan [1]

Li, Zhaojian [1]

Zhao, Bin [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian, Peoples R China

Source:

ACM COMPUTING SURVEYS | 2025 / Vol. 57 / No. 07

Funding:

National Natural Science Foundation of China

Keywords:

Multimodal; cross-modal; audio-visual learning; text-visual; touch-visual; VISUAL-TACTILE FUSION; VIDEO HIGHLIGHT DETECTION; RECOGNITION; REPRESENTATION; NETWORK; GRASP; MODEL; CNN;

DOI:

10.1145/3713070

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

The multimodal interplay of the five fundamental senses (sight, hearing, smell, taste, and touch) provides humans with superior environmental perception and learning skills. Inspired by the human perceptual system, multimodal machine learning tries to incorporate different forms of input, such as image, audio, and text, and determine their fundamental connections through joint modeling. Because multimodal machine learning is one of the future development forms of artificial intelligence, it is necessary to summarize its progress. In this article, we start from the forms of multimodal combination and provide a comprehensive survey of the emerging subject of multimodal machine learning, covering representative research approaches, the most recent advancements, and their applications. Specifically, this article analyzes the relationship between different modalities in detail and sorts out the key issues in multimodal research from the perspective of application scenarios. In addition, we thoroughly review the state-of-the-art methods and datasets used in multimodal learning research. We then identify the substantial challenges and potential development directions in this field. Finally, given the comprehensive nature of this survey, both modality-specific and task-specific researchers can benefit from it and advance the field.


Pages: 34

50 items in total

[1] A Short Survey on Deep Learning for Multimodal Integration: Applications, Future Perspectives and Challenges
Dimitri, Giovanna Maria
COMPUTERS, 2022, 11 (11)
[2] A Review on Methods and Applications in Multimodal Deep Learning
Jabeen, Summaira
Li, Xi
Amin, Muhammad Shoib
Bourahla, Omar
Li, Songyuan
Jabbar, Abdul
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
[3] Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions
Gandhi, Ankita
Adhvaryu, Kinjal
Poria, Soujanya
Cambria, Erik
Hussain, Amir
INFORMATION FUSION, 2023, 91 : 424 - 444
[4] A Survey on Multimodal Deep Learning for Image Synthesis Applications, methods, datasets, evaluation metrics, and results comparison
Luo, Sanbi
2021 5TH INTERNATIONAL CONFERENCE ON INNOVATION IN ARTIFICIAL INTELLIGENCE (ICIAI 2021), 2021, : 108 - 120
[5] A Systematic Literature Review on Multimodal Machine Learning: Applications, Challenges, Gaps and Future Directions
Barua, Arnab
Ahmed, Mobyen Uddin
Begum, Shahina
IEEE ACCESS, 2023, 11 : 14804 - 14831
[6] A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges
Bayoudh, Khaled
INFORMATION FUSION, 2024, 105
[7] A Survey on Deep Learning for Multimodal Data Fusion
Gao, Jing
Li, Peng
Chen, Zhikui
Zhang, Jianing
NEURAL COMPUTATION, 2020, 32 (05) : 829 - 864
[8] Machine Learning Methods in Weather and Climate Applications: A Survey
Chen, Liuyi
Han, Bocheng
Wang, Xuesong
Zhao, Jiazhen
Yang, Wenke
Yang, Zhengyi
APPLIED SCIENCES-BASEL, 2023, 13 (21)
[9] A Survey on Deep Learning Methods for Cancer Diagnosis Using Multimodal Data Fusion
M'Sabah, Chems Eddine Louahem
Bouziane, Ahmed
Ferdi, Youcef
2021 INTERNATIONAL CONFERENCE ON E-HEALTH AND BIOENGINEERING (EHB 2021), 9TH EDITION, 2021
[10] A Comprehensive Survey on Deep Learning Multi-Modal Fusion: Methods, Technologies and Applications
Jiao, Tianzhe
Guo, Chaopeng
Feng, Xiaoyue
Chen, Yuming
Song, Jie
CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 80 (01) : 1 - 35
