Late multimodal fusion for image and audio music transcription

Cited by: 8
Authors
Alfaro-Contreras, Maria [1]
Valero-Mas, Jose J. [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliations
[1] Univ Alicante, Univ Inst Comp Res, Carretera San Vicente Raspeig S-N, Alicante 03690, Spain
Keywords
Optical Music Recognition; Automatic Music Transcription; Multimodality; Deep learning; Connectionist Temporal Classification; Sequence labeling; Word graphs
DOI
10.1016/j.eswa.2022.119491
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem for Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research depending on the input: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of these input data has conditioned the two fields to develop modality-specific frameworks. However, their recent definition in terms of sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by the image and audio modalities. In this work, we explore this question at a late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses of end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios, in which the corresponding single-modality models yield different error rates, show interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve the corresponding unimodal standard recognition frameworks.
Pages: 10
Related papers
50 in total
  • [41] A multimodal image fusion framework applied in radiotherapy
    Riefenstahl, N
    Krell, G
    Calow, R
    Michaelis, B
    Walke, M
    FIFTH INTERNATIONAL CONFERENCE ON INFORMATION VISUALISATION, PROCEEDINGS, 2001, : 173 - 178
  • [42] Probabilistic Fusion and Analysis of Multimodal Image Features
    Kleinschmidt, Sebastian P.
    Wagner, Bernardo
    2017 18TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS (ICAR), 2017, : 498 - 504
  • [43] Regular Constrained Multimodal Fusion for Image Captioning
    Wang, Liya
    Chen, Haipeng
    Liu, Yu
    Lyu, Yingda
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (11) : 11900 - 11913
  • [44] Multimodal image fusion with joint sparsity model
    Yin, Haitao
    Li, Shutao
    OPTICAL ENGINEERING, 2011, 50 (06)
  • [45] Convolution analysis operator for multimodal image fusion
    Zhang, Chengfang
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY, 2021, 183 : 603 - 608
  • [46] Histology image search using multimodal fusion
    Caicedo, Juan C.
    Vanegas, Jorge A.
    Paez, Fabian
    Gonzalez, Fabio A.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 51 : 114 - 128
  • [47] Multimodal Image Fusion in Visual Sensor Networks
    Nirmala, D. Egfin
    Vignesh, R. K.
    Vaidehi, V.
    2013 IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTING AND COMMUNICATION TECHNOLOGIES, 2013,
  • [48] A Review of Multimodal Medical Image Fusion Techniques
    Huang, Bing
    Yang, Feng
    Yin, Mengxiao
    Mo, Xiaoying
    Zhong, Cheng
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2020, 2020
  • [49] Image-Text Multimodal Sentiment Analysis Framework of Assamese News Articles Using Late Fusion
    Das, Ringki
    Singh, Thoudam Doren
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [50] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    INFORMATION FUSION, 2022, 85 : 52 - 59