A Survey of Multimodal Learning: Methods, Applications, and Future

Authors
Yuan, Yuan [1]
Li, Zhaojian [1]
Zhao, Bin [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal; cross-modal; audio-visual learning; text-visual; touch-visual; VISUAL-TACTILE FUSION; VIDEO HIGHLIGHT DETECTION; RECOGNITION; REPRESENTATION; NETWORK; GRASP; MODEL; CNN;
DOI
10.1145/3713070
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The multimodal interplay of the five fundamental senses (sight, hearing, smell, taste, and touch) provides humans with superior environmental perception and learning skills. Inspired by the human perceptual system, multimodal machine learning seeks to incorporate different forms of input, such as image, audio, and text, and to determine their fundamental connections through joint modeling. Because multimodal machine learning is one likely direction for the future development of artificial intelligence, a summary of its progress is needed. In this article, we start from the forms of multimodal combination and provide a comprehensive survey of the emerging subject of multimodal machine learning, covering representative research approaches, the most recent advancements, and their applications. Specifically, this article analyzes the relationships between different modalities in detail and sorts out the key issues in multimodal research by application scenario. We also thoroughly review the state-of-the-art methods and datasets used in multimodal learning research. We then identify the substantial challenges and potential directions for development in this field. Finally, given the comprehensive nature of this survey, both modality-specific and task-specific researchers can benefit from it and advance the field.
Pages: 34