EUSKOR: End-to-end coreference resolution system for Basque

Cited by: 1
Authors
Soraluze, Ander [1]
Arregi, Olatz [2]
Arregi, Xabier [1]
Diaz de Ilarraza, Arantza [1]
Affiliations
[1] Univ Basque Country, Comp Languages & Syst Dept, Donostia San Sebastian, Spain
[2] Univ Basque Country, Comp Architecture & Technol Dept, Donostia San Sebastian, Spain
Source
PLOS ONE | 2019 / Vol. 14 / Issue 09
Keywords
CONSTRUCTION; METHODOLOGY;
DOI
10.1371/journal.pone.0221801
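Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code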
07 ; 0710 ; 09 ;
Abstract
This paper describes the process of adapting the Stanford Coreference resolution module to the Basque language, taking into account the characteristics of the language. The module has been integrated into a linguistic analysis pipeline, yielding an end-to-end coreference resolution system for Basque. The adaptation process described here can help other languages with similar characteristics implement their own coreference resolution systems. During the experimentation phase, we demonstrated that language-specific features have a noteworthy effect on coreference resolution, obtaining a gain of 7.07 points in CoNLL score with respect to the baseline system. We also analysed the effect that preprocessing has on coreference resolution by comparing the results obtained with automatic mentions against those obtained with gold mentions. When gold mentions are provided, the results increase by 11.5 points in CoNLL score compared with the results obtained when automatic mentions are used. The contribution of each sieve is analysed, leading to the conclusion that morphology is essential for agglutinative languages to obtain good performance in coreference resolution. Finally, an error analysis of the coreference resolution system is presented, which has revealed our system's weak points and helped to determine how the system can be improved. As a result of the error analysis, we have enriched the Basque coreference resolution system by adding two new sieves, obtaining an improvement of 0.24 points in CoNLL F1 when automatic mentions are used and of 0.39 points when gold mentions are provided.
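For readers unfamiliar with the metric, the CoNLL score reported in the abstract is conventionally the unweighted mean of the MUC, B-cubed, and CEAF-e F1 scores. The sketch below only illustrates that averaging and how a gain in points is computed. The function name `conll_score` and all input values are hypothetical placeholders, not figures from the paper.

```python
def conll_score(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Return the CoNLL score: the unweighted mean of three F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Hypothetical F1 values, chosen only to illustrate the arithmetic.
baseline = conll_score(50.0, 50.0, 50.0)
improved = conll_score(60.0, 57.0, 54.0)

# A "gain in CoNLL score" is the difference between two such averages.
gain = improved - baseline
```

Because the score is an average of three metrics, a gain stated in CoNLL points does not show which of the three metrics improved.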
Pages: 25