Late multimodal fusion for image and audio music transcription

Cited by: 8
Authors
Alfaro-Contreras, Maria [1]
Valero-Mas, Jose J. [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliations
[1] Univ Alicante, Univ Inst Comp Res, Carretera San Vicente Raspeig S-N, Alicante 03690, Spain
Keywords
Optical Music Recognition; Automatic Music Transcription; Multimodality; Deep learning; Connectionist Temporal Classification; Sequence labeling; Word graphs
DOI
10.1016/j.eswa.2022.119491
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem for Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research depending on the input: music documents, addressed by Optical Music Recognition (OMR), and audio recordings, addressed by Automatic Music Transcription (AMT). The different nature of these input data has led each field to develop modality-specific frameworks. However, their recent definition in terms of sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by the image and audio modalities. In this work, we explore this question at a late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses produced by end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios (in which the corresponding single-modality models yield different error rates) show that these approaches can be beneficial. In addition, two of the four strategies considered significantly improve on the corresponding unimodal standard recognition frameworks.
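As a rough illustration of what late fusion over a shared output representation can look like, the sketch below pools the hypotheses of two single-modality recognizers and picks the sequence with the lowest expected edit distance (a minimum Bayes risk style selection). It is not one of the four strategies studied in the paper, and it simplifies the lattice-based search space to plain N-best lists. The function names, the `image_weight` parameter, and the toy symbol sequences are all assumptions made for this example.

```python
from typing import Dict, Tuple

Hyp = Tuple[str, ...]  # one hypothesis: a sequence of music symbols


def edit_distance(a: Hyp, b: Hyp) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuse_nbest(image_nbest: Dict[Hyp, float],
               audio_nbest: Dict[Hyp, float],
               image_weight: float = 0.5) -> Dict[Hyp, float]:
    """Pool two N-best lists by linear interpolation of their probabilities.

    A hypothesis missing from one modality gets probability 0 there.
    The pooled scores are renormalized to sum to 1.
    """
    pooled = {}
    for hyp in set(image_nbest) | set(audio_nbest):
        pooled[hyp] = (image_weight * image_nbest.get(hyp, 0.0)
                       + (1.0 - image_weight) * audio_nbest.get(hyp, 0.0))
    total = sum(pooled.values())
    if total <= 0.0:
        raise ValueError("no probability mass to fuse")
    return {hyp: p / total for hyp, p in pooled.items()}


def mbr_decode(pooled: Dict[Hyp, float]) -> Hyp:
    """Return the pooled hypothesis with the lowest expected edit distance."""
    def risk(candidate: Hyp) -> float:
        return sum(p * edit_distance(candidate, other)
                   for other, p in pooled.items())
    # The hypothesis itself is the tie-breaker, so the result is deterministic.
    return min(pooled, key=lambda cand: (risk(cand), cand))


# Toy example (hypothetical recognizer outputs, not data from the paper).
image_nbest = {("C4", "D4", "E4"): 0.6, ("C4", "D4", "F4"): 0.4}
audio_nbest = {("C4", "D4", "E4"): 0.5, ("C4", "E4"): 0.5}
best = mbr_decode(fuse_nbest(image_nbest, audio_nbest))
print(best)
```

Adjusting `image_weight` shifts trust toward one modality, which is one simple way to handle scenarios where the two single-modality models have different error rates.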
Pages: 10
Related papers
50 in total
  • [31] Multimodal Structure Segmentation and Analysis of Music Using Audio and Textual Information
    Cheng, Heng-Tze
    Yang, Yi-Hsuan
    Lin, Yu-Ching
    Chen, Homer H.
    ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5, 2009, : 1677 - 1680
  • [32] Multimodal Music and Lyrics Fusion Classifier for Artist Identification
    Aryafar, Kamelia
    Shokoufandeh, Ali
    2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2014, : 506 - 509
  • [33] Lyrics-based audio retrieval and multimodal navigation in music collections
    Mueller, Meinard
    Kurth, Frank
    Damm, David
    Fremerey, Christian
    Clausen, Michael
    RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, PROCEEDINGS, 2007, 4675 : 112 - +
  • [34] A Multimodal Fusion Online Music Education System for Universities
    Liu, Peng
    Cao, Yixiao
    Wang, Lei
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [35] MAiVAR: Multimodal Audio-Image and Video Action Recognizer
    Shaikh, Muhammad Bilal
    Chai, Douglas
    Islam, Syed Mohammed Shamsul
    Akhtar, Naveed
    2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022
  • [36] Multimodal deep fusion for image question answering
    Zhang, Weifeng
    Yu, Jing
    Wang, Yuxia
    Wang, Wei
    KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [37] Ornament Image Retrieval Using Multimodal Fusion
    Islam, S. M.
    Joardar, S.
    Dogra, D. P.
    Sekh, A. A.
    SN Computer Science, 2021, 2 (4)
  • [38] A novel approach for multimodal medical image fusion
    Liu, Zhaodong
    Yin, Hongpeng
    Chai, Yi
    Yang, Simon X.
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (16) : 7425 - 7435
  • [39] Multimodal Image Fusion Method Based on Multiscale Image Matting
    Maqsood, Sarmad
    Damasevicius, Robertas
    Silka, Jakub
    Wozniak, Marcin
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT II, 2021, 12855 : 57 - 68
  • [40] Laplacian Redecomposition for Multimodal Medical Image Fusion
    Li, Xiaoxiao
    Guo, Xiaopeng
    Han, Pengfei
    Wang, Xiang
    Li, Huaguang
    Luo, Tao
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2020, 69 (09) : 6880 - 6890