On the Use of Transformers for End-to-End Optical Music Recognition

Cited by: 9
Authors
Rios-Vila, Antonio [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliation
[1] Univ Alicante, UI Comp Res, Alicante, Spain
Source
PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022) | 2022 / Vol. 13256
Keywords
Optical Music Recognition; Transformers; Connectionist Temporal Classification; Image-to-sequence
DOI
10.1007/978-3-031-04881-4_37
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art end-to-end Optical Music Recognition (OMR) systems use Recurrent Neural Networks to produce music transcriptions, since these models retrieve a sequence of symbols from an input staff image. However, recent advances in Deep Learning have led other research fields that process sequential data to adopt a new neural architecture, the Transformer, which has steadily gained popularity. In this paper, we study the application of the Transformer model to end-to-end OMR systems. We produced several models based on all the existing approaches in this field and tested them on various corpora with different output encodings. The results allow an in-depth analysis of the advantages and disadvantages of applying this architecture to these systems. This discussion leads us to conclude that Transformers, as originally conceived, do not seem appropriate for end-to-end OMR; the paper therefore outlines lines of future research to exploit the full potential of this architecture in this field.
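For readers unfamiliar with the Connectionist Temporal Classification (CTC) approach named in the keywords: RNN-based end-to-end OMR models typically emit one label distribution per horizontal frame of the staff image, and a CTC decoder collapses those frames into a symbol sequence. The following is a minimal, illustrative sketch of greedy (best-path) CTC decoding, not the authors' implementation; the blank index 0, the toy vocabulary, and the frame probabilities are hypothetical values chosen only for this example.

```python
from typing import List, Optional, Sequence

# Assumption for this sketch: label 0 is the CTC "blank" symbol.
BLANK = 0

# Hypothetical toy vocabulary (not the output encodings used in the paper).
VOCAB = ["<blank>", "clef-G2", "note-C4_quarter", "barline"]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: pick the most likely label in each frame,
    merge consecutive repeats, then drop blank labels."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Hypothetical per-frame posteriors for a short staff excerpt (rows = frames).
frames = [
    [0.10, 0.80, 0.05, 0.05],  # clef-G2
    [0.20, 0.70, 0.05, 0.05],  # clef-G2 again (repeat, merged)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # note
    [0.60, 0.10, 0.20, 0.10],  # blank separates two identical notes
    [0.10, 0.10, 0.70, 0.10],  # note (emitted again)
    [0.05, 0.05, 0.10, 0.80],  # barline
]

if __name__ == "__main__":
    print([VOCAB[i] for i in ctc_greedy_decode(frames)])
    # → ['clef-G2', 'note-C4_quarter', 'note-C4_quarter', 'barline']
```

The blank frame between the two note frames is what lets CTC output two identical consecutive symbols; without it they would be merged into one. Greedy decoding keeps only the single best label per frame, whereas beam search over the CTC outputs can recover more probable sequences at extra cost.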
Pages: 470-481
Page count: 12