Decoupling music notation to improve end-to-end Optical Music Recognition

Cited by: 9
Authors
Alfaro-Contreras, Maria [1]
Rios-Vila, Antonio [1]
Valero-Mas, Jose J. [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliation
[1] Univ Alicante, Inst Univ Invest Informat, Ap 99, E-03080 Alicante, Spain
Keywords
Optical music recognition; Deep learning; Connectionist temporal classification; Sequence labeling
DOI
10.1016/j.patrec.2022.04.032
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Inspired by the Text Recognition field, end-to-end schemes based on Convolutional Recurrent Neural Networks (CRNN) trained with the Connectionist Temporal Classification (CTC) loss function are considered one of the current state-of-the-art techniques for staff-level Optical Music Recognition (OMR). Unlike text symbols, music-notation elements may be defined as a combination of (i) a shape primitive located at (ii) a certain position on a staff. However, this dual nature is generally neglected in the learning process, as each combination is treated as a single token. In this work, we study whether exploiting this particularity of music notation actually benefits recognition performance and, if so, which approach is the most appropriate. To that end, we thoroughly review the existing approaches that explore this premise and propose different combinations of them. Furthermore, considering the limitations observed in such approaches, we propose a novel decoding strategy specifically designed for OMR. The results obtained with four different corpora of historical manuscripts show the relevance of leveraging this dual nature of music notation, since doing so outperforms the standard approaches that ignore it. In addition, the proposed decoding leads to significant reductions in the error rates with respect to the other cases.
© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
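As a rough illustration of the shape/position duality described in the abstract, the sketch below splits combined tokens into two parallel label sequences and compares vocabulary sizes. The `shape:position` token format, the example labels, and the function names are assumptions made for illustration only; they are not taken from the paper or its corpora.

```python
from typing import List, Tuple


def decouple(tokens: List[str], sep: str = ":") -> Tuple[List[str], List[str]]:
    """Split combined 'shape:position' tokens into two parallel sequences.

    The 'shape:position' format is a hypothetical encoding for this sketch.
    """
    shapes, positions = [], []
    for tok in tokens:
        shape, _, position = tok.partition(sep)
        shapes.append(shape)
        positions.append(position)
    return shapes, positions


def vocabulary_sizes(tokens: List[str]) -> Tuple[int, int, int]:
    """Return the number of distinct (combined, shape, position) labels."""
    shapes, positions = decouple(tokens)
    return len(set(tokens)), len(set(shapes)), len(set(positions))


# Hypothetical staff transcription: L = line, S = space, numbered from the bottom.
staff = [
    "clef.C:L3",
    "note.half:L2",
    "note.half:S2",
    "note.quarter:L2",
    "rest.quarter:L3",
]

shapes, positions = decouple(staff)
combined_vocab, shape_vocab, position_vocab = vocabulary_sizes(staff)
```

In this toy example the combined vocabulary has one label per distinct shape and position pair, whereas the decoupled vocabularies are smaller and shared across symbols, which is the property the decoupled approaches aim to exploit.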
Pages: 157-163
Number of pages: 7