A Survey of Multimodal Learning: Methods, Applications, and Future

Cited by: 0
Authors
Yuan, Yuan [1 ]
Li, Zhaojian [1 ]
Zhao, Bin [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal; cross-modal; audio-visual learning; text-visual; touch-visual; VISUAL-TACTILE FUSION; VIDEO HIGHLIGHT DETECTION; RECOGNITION; REPRESENTATION; NETWORK; GRASP; MODEL; CNN;
DOI
10.1145/3713070
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
The interplay of the five fundamental senses (sight, hearing, smell, taste, and touch) gives humans superior environmental perception and learning skills. Modeled on the human perceptual system, multimodal machine learning aims to incorporate different forms of input, such as image, audio, and text, and to determine their fundamental connections through joint modeling. Because multimodal machine learning is regarded as one of the future development forms of artificial intelligence, it is necessary to summarize its progress. In this article, we start from the forms of multimodal combination and provide a comprehensive survey of the emerging subject of multimodal machine learning, covering representative research approaches, the most recent advancements, and their applications. Specifically, this article analyzes the relationships between different modalities in detail and sorts out the key issues in multimodal research according to application scenarios. We also thoroughly review the state-of-the-art methods and datasets used in multimodal learning research. We then identify the substantial challenges and potential directions for development in this field. Finally, given its comprehensive nature, this survey can benefit both modality-specific and task-specific researchers and help advance the field.
Pages: 34
Related papers
50 in total
  • [31] A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends
    Gui, Jie
    Chen, Tuo
    Zhang, Jing
    Cao, Qiong
    Sun, Zhenan
    Luo, Hao
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 9052 - 9071
  • [32] Applications of Machine Learning in Networking: A Survey of Current Issues and Future Challenges
    Ridwan, M. A.
    Radzi, N. A. M.
    Abdullah, F.
    Jalil, Y. E.
    IEEE ACCESS, 2021, 9 : 52523 - 52556
  • [33] Multimodal Learning Analytics - Enabling the Future of Learning through Multimodal Data Analysis and Interfaces
    Worsley, Marcelo
    ICMI '12: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2012, : 353 - 356
  • [34] Network Models in Neuroimaging: A Survey of Multimodal Applications
    Mancini, Matteo
    Cercignani, Mara
    FUNDAMENTA INFORMATICAE, 2018, 163 (01) : 63 - 91
  • [35] Challenges in Deep Learning for Multimodal Applications
    Ghosh, Sayan
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015, : 611 - 615
  • [36] A Comprehensive Survey on Split-Fed Learning: Methods, Innovations, and Future Directions
    Hukkeri, Geetabai S.
    Goudar, R. H.
    Dhananjaya, G. M.
    Rathod, Vijayalaxmi N.
    Ankalaki, Shilpa
    IEEE ACCESS, 2025, 13 : 46312 - 46333
  • [37] Quantum Machine Learning in Disease Detection and Prediction: a survey of applications and future possibilities
    Upama, Paramita Basak
    Kolli, Anushka
    Kolli, Hansika
    Alam, Subarna
    Syam, Mohammad
    Shahriar, Hossain
    Ahamed, Sheikh Iqbal
    2023 IEEE 47TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC, 2023, : 1545 - 1551
  • [38] Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey
    Li, Tianxu
    Zhu, Kun
    Nguyen Cong Luong
    Niyato, Dusit
    Wu, Qihui
    Zhang, Yang
    Chen, Bing
    IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2022, 24 (02) : 1240 - 1279
  • [39] A Survey on Edge Intelligence and Lightweight Machine Learning Support for Future Applications and Services
    Hoffpauir, Kyle
    Simmons, Jacob
    Schmidt, Nikolas
    Pittala, Rachitha
    Briggs, Isaac
    Makani, Shanmukha
    Jararweh, Yaser
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2023, 15 (02):
  • [40] A Survey on Deep Learning for Multimodal Data Fusion
    Gao, Jing
    Li, Peng
    Chen, Zhikui
    Zhang, Jianing
    NEURAL COMPUTATION, 2020, 32 (05) : 829 - 864