On the Use of Transformers for End-to-End Optical Music Recognition

Cited by: 9

Authors:

Rios-Vila, Antonio ^{[1]}

Inesta, Jose M. ^{[1]}

Calvo-Zaragoza, Jorge ^{[1]}

Affiliations:

[1] Univ Alicante, UI Comp Res, Alicante, Spain

Source:

PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022) | 2022 / Vol. 13256

Keywords:

Optical Music Recognition; Transformers; Connectionist Temporal Classification; Image-to-sequence

DOI:

10.1007/978-3-031-04881-4_37

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

State-of-the-art end-to-end Optical Music Recognition (OMR) systems use Recurrent Neural Networks to produce music transcriptions, as these models retrieve a sequence of symbols from an input staff image. However, recent advances in Deep Learning have led other research fields that process sequential data to use a new neural architecture: the Transformer, whose popularity has increased over time. In this paper, we study the application of the Transformer model to the end-to-end OMR systems. We produced several models based on all the existing approaches in this field and tested them on various corpora with different types of encodings for the output. The obtained results allow us to make an in-depth analysis of the advantages and disadvantages of applying this architecture to these systems. This discussion leads us to conclude that Transformers, as they were conceived, do not seem to be appropriate to perform end-to-end OMR, so this paper raises interesting lines of future research to get the full potential of this architecture in this field.

Citation

Pages: 470-481

Page count: 12

References (20 in total)

[1] Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[2] Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism
Baro, Arnau
Badal, Carles
Fornes, Alicia
[J]. 2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 205 - 210
[3] Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images
Byrd, Donald
Simonsen, Jakob Grue
[J]. JOURNAL OF NEW MUSIC RESEARCH, 2015, 44 (03) : 169 - 195
[4] Calvo-Zaragoza J., 2018, P 19 INT SOC MUS INF, P248, DOI 10.5281/zenodo.1492395
[5] Understanding Optical Music Recognition
Calvo-Zaragoza, Jorge
Hajic, Jan, Jr.
Pacha, Alexander
[J]. ACM COMPUTING SURVEYS, 2020, 53 (04)
[6] Handwritten Music Recognition for Mensural notation with convolutional recurrent neural networks
Calvo-Zaragoza, Jorge
Toselli, Alejandro H.
Vidal, Enrique
[J]. PATTERN RECOGNITION LETTERS, 2019, 128 : 115 - 121
[7] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[8] Dosovitskiy A., 2020, P INT C LEARN REPR, P1
[9] GPT-3: Its Nature, Scope, Limits, and Consequences
Floridi, Luciano
Chiriatti, Massimo
[J]. MINDS AND MACHINES, 2020, 30 (04) : 681 - 694
[10] Graves A., 2006, MACHINE LEARNING P 2, P369, DOI 10.1145/1143844.1143891
