A Survey of Multimodal Learning: Methods, Applications, and Future

Citations: 0
Authors
Yuan, Yuan [1]
Li, Zhaojian [1]
Zhao, Bin [1]
Affiliations
[1] Northwestern Polytechnical University, School of Artificial Intelligence, Optics and Electronics (iOPEN), Xi'an, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Multimodal; cross-modal; audio-visual learning; text-visual; touch-visual; VISUAL-TACTILE FUSION; VIDEO HIGHLIGHT DETECTION; RECOGNITION; REPRESENTATION; NETWORK; GRASP; MODEL; CNN;
DOI
10.1145/3713070
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The interplay of the five fundamental senses (sight, hearing, smell, taste, and touch) gives humans superior environmental perception and learning abilities. Inspired by the human perceptual system, multimodal machine learning aims to incorporate different forms of input, such as image, audio, and text, and to determine their fundamental connections through joint modeling. Because multimodal machine learning is one of the likely future forms of artificial intelligence, a summary of its progress is needed. In this article, we start from the forms of multimodal combination and provide a comprehensive survey of the emerging subject of multimodal machine learning, covering representative research approaches, the most recent advancements, and their applications. Specifically, this article analyzes the relationships between different modalities in detail and identifies the key issues in multimodal research from the perspective of application scenarios. We also thoroughly review state-of-the-art methods and datasets in multimodal learning research. We then identify the substantial challenges and potential directions for development in this field. Finally, given the comprehensive nature of this survey, both modality-specific and task-specific researchers can benefit from it and advance the field.
Pages: 34