Neural Coreference Resolution for Slovene Language

Cited by: 2
Authors
Klemen, Matej [1]
Zitnik, Slavko [1]
Affiliations
[1] Univ Ljubljana, Fac Comp & Informat Sci, Vecna Pot 113, Ljubljana 1000, Slovenia
Keywords
coreference resolution; Slovene language; neural networks; word embeddings;
DOI
10.2298/CSIS201120060K
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Coreference resolution systems aim to recognize and cluster together mentions of the same underlying entity. While a large body of research exists for widely spoken languages such as English and Chinese, research on coreference in other languages is comparatively scarce. In this work we first present SentiCoref 1.0, a coreference resolution dataset for the Slovene language that is comparable to English-based corpora. Further, we conduct a series of analyses using models of varying complexity, ranging from simple linear models to current state-of-the-art deep neural coreference approaches that leverage pre-trained contextual embeddings. In addition to SentiCoref, we evaluate the models on the smaller coref149 Slovene dataset to justify the creation of a new corpus. We investigate the robustness of the models using cross-domain data and data augmentations. Models using contextual embeddings achieve the best results, with an average F1 score of up to 0.92 on the SentiCoref dataset. Cross-domain experiments indicate that SentiCoref allows the models to learn more general patterns, which enables them to outperform models trained on coref149 only.
Pages: 495-521
Page count: 27