Musical heritage historical entity linking

Cited by: 0
Authors
Graciotti, Arianna [1]
Lazzari, Nicolas [1, 2]
Presutti, Valentina [1]
Tripodi, Rocco [3]
Affiliations
[1] Univ Bologna, LILEC, Bologna, Italy
[2] Univ Pisa, Comp Sci Dept, Pisa, Italy
[3] Ca' Foscari Univ Venice, DAIS, I-30170 Venice, Italy
Keywords
Historical documents; Named entity recognition; Named entity classification; Entity linking
DOI
10.1007/s10462-024-11102-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Linking named entities occurring in text to their corresponding entity in a Knowledge Base (KB) is challenging, especially when dealing with historical texts. In this work, we introduce Musical Heritage named Entities Recognition, Classification and Linking (mhercl), a novel benchmark consisting of manually annotated sentences extrapolated from historical periodicals of the music domain. mhercl contains named entities under-represented or absent in the most famous KBs. We experiment with several State-of-the-Art models on the Entity Linking (EL) task and show that mhercl is a challenging dataset for all of them. We propose a novel unsupervised EL model and a method to extend supervised entity linkers by using Knowledge Graphs (KGs) to tackle the main difficulties posed by historical documents. Our experiments reveal that relying on unsupervised techniques and improving models with logical constraints based on KGs and heuristics to predict NIL entities (entities not represented in the KB of reference) results in better EL performance on historical documents.
Pages: 41
相关论文
共 72 条
  • [41] Lacerra C, 2021, 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), P10810
  • [42] Li CM, 2024, AAAI CONF ARTIF INTE, P18471
  • [43] Mallen A, 2023, PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, P9802
  • [44] Menzel S, 2021, Named Entity Linking mit Wikidata und GND-Das Potenzial handkuratierter und strukturierter Datenquellen fur die semantische Anreicherung von Volltexten
  • [45] COPOSITIVE-PLUS LEMKE ALGORITHM SOLVES POLYMATRIX GAMES
    MILLER, DA
    ZUCKER, SW
    [J]. OPERATIONS RESEARCH LETTERS, 1991, 10 (05) : 285 - 290
  • [46] WORDNET - A LEXICAL DATABASE FOR ENGLISH
    MILLER, GA
    [J]. COMMUNICATIONS OF THE ACM, 1995, 38 (11) : 39 - 41
  • [47] Mitchell Alexis, 2005, ACE 2004 MULTILINGUA
  • [48] Survey on English Entity Linking on Wikidata: Datasets and approaches
    Moeller, Cedric
    Lehmann, Jens
    Usbeck, Ricardo
    [J]. SEMANTIC WEB, 2022, 13 (06) : 925 - 966
  • [49] NASH J, 1951, ANN MATH, V54, P286, DOI 10.2307/1969529
  • [50] Orlandi R, 2024, FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, P14114