Transformers analyzing poetry: multilingual metrical pattern prediction with transformer-based language models

Cited by: 11
Authors
de la Rosa, Javier [1]
Perez, Alvaro [1]
de Sisto, Mirella [1]
Hernandez, Laura [1]
Diaz, Aitor [2]
Ros, Salvador [2]
Gonzalez-Blanco, Elena [3]
Affiliations
[1] UNED, LINHD, Juan del Rosal 16, Madrid 28040, Spain
[2] UNED, Control & Commun Syst, Madrid, Spain
[3] IE Univ, Sch Human Sci & Technol, Madrid, Spain
Funding
European Research Council
Keywords
Natural language processing; Language models; Digital humanities; Poetry; Scansion; Meter
DOI
10.1007/s00521-021-06692-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The splitting of words into stressed and unstressed syllables is the foundation for the scansion of poetry, a process that aims to determine the metrical pattern of a line of verse within a poem. Intricate language rules and their exceptions, as well as the poetic license exercised by authors, make calculating these patterns a nontrivial task. Some rhetorical devices shrink the metrical length, while others may extend it. This opens the door to interpretation and further complicates the creation of automated scansion algorithms that can analyze corpora in a distant reading fashion. In this paper, we compare the automated metrical pattern identification systems available for Spanish, English, and German against monolingual and multilingual language models fine-tuned on the same task. Although these models were initially conceived for semantic tasks, our results suggest that transformer-based models retain enough structural information to perform reasonably well for Spanish in a monolingual setting, and to outperform the existing systems for both English and German when using a model trained on the three languages, showing evidence of the benefits of cross-lingual transfer between the languages.
Pages: 18171-18176
Page count: 6