A Survey of Multimodal Learning: Methods, Applications, and Future

Times Cited: 0
Authors
Yuan, Yuan [1 ]
Li, Zhaojian [1 ]
Zhao, Bin [1 ]
Affiliations
[1] Northwestern Polytechnical University, School of Artificial Intelligence, Optics and Electronics (iOPEN), Xi'an, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal; cross-modal; audio-visual learning; text-visual; touch-visual; VISUAL-TACTILE FUSION; VIDEO HIGHLIGHT DETECTION; RECOGNITION; REPRESENTATION; NETWORK; GRASP; MODEL; CNN;
DOI
10.1145/3713070
CLC Number
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
The interplay of the five fundamental senses (sight, hearing, smell, taste, and touch) gives humans superior environmental perception and learning abilities. Inspired by the human perceptual system, multimodal machine learning seeks to integrate different forms of input, such as images, audio, and text, and to uncover their underlying connections through joint modeling. As one of the promising directions for the future of artificial intelligence, the field calls for a systematic summary of its progress. In this article, we organize the discussion around forms of modality combination and provide a comprehensive survey of the emerging subject of multimodal machine learning, covering representative research approaches, the most recent advances, and their applications. Specifically, we analyze the relationships between different modalities in detail and distill the key issues in multimodal research from the perspective of application scenarios. In addition, we thoroughly review state-of-the-art methods and datasets used in multimodal learning research, and we identify the major challenges and promising directions in the field. Given its comprehensive scope, this survey can benefit both modality-specific and task-specific researchers and help advance the field.
Pages: 34
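The abstract's central idea, learning a joint model over heterogeneous inputs such as image, audio, and text, can be illustrated with a minimal sketch. The code below is not taken from the survey; it assumes pre-extracted per-modality feature vectors with arbitrary, hypothetical dimensions and uses simple additive fusion in PyTorch, just one of the many fusion strategies a survey of this kind covers.

```python
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    """Toy joint model: project each modality into a shared space, then fuse."""

    def __init__(self, dim_img=512, dim_aud=128, dim_txt=300,
                 dim_joint=256, n_classes=10):
        super().__init__()
        self.img_proj = nn.Linear(dim_img, dim_joint)  # image-feature branch
        self.aud_proj = nn.Linear(dim_aud, dim_joint)  # audio-feature branch
        self.txt_proj = nn.Linear(dim_txt, dim_joint)  # text-feature branch
        self.head = nn.Linear(dim_joint, n_classes)    # shared prediction head

    def forward(self, img, aud, txt):
        # Fuse by summing the projected embeddings; concatenation or
        # cross-modal attention are other common choices in the literature.
        joint = (torch.relu(self.img_proj(img))
                 + torch.relu(self.aud_proj(aud))
                 + torch.relu(self.txt_proj(txt)))
        return self.head(joint)


# Random tensors stand in for pre-extracted per-modality features (batch of 4).
model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 300))
print(logits.shape)  # torch.Size([4, 10])
```

The design choice sketched here, mapping every modality into a common embedding space before fusing, is the shared starting point of most joint-representation approaches; richer models replace the sum with concatenation, gating, or attention across modalities.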