A categorical analysis of coreference resolution errors in biomedical texts

Cited by: 6
Authors
Choi, Miji [1, 2]
Zobel, Justin [1]
Verspoor, Karin [1]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic, Australia
[2] Natl ICT Australia NICTA, Victoria Res Lab, Sydney, NSW, Australia
Funding
Australian Research Council
Keywords
Coreference resolution; Natural language processing; Text mining; Error analysis; Event extraction; Network
DOI
10.1016/j.jbi.2016.02.015
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Background: Coreference resolution is an essential task in information extraction from the published biomedical literature. It supports the discovery of complex information by linking referring expressions such as pronouns and appositives to their referents, which are typically entities that play a central role in biomedical events. Correctly establishing these links allows a detailed understanding of all the participants in events, and allows events to be connected through their shared participants.

Results: As an initial step towards the development of a novel coreference resolution system for the biomedical domain, we have categorised the characteristics of coreference relations by type of anaphor as well as by broader syntactic and semantic characteristics, and have compared the performance of a domain adaptation of a state-of-the-art general system to published results from domain-specific systems in terms of this categorisation. We also develop a rule-based system for anaphoric coreference resolution in the biomedical domain with simple modules derived from available systems. Our results show that the domain-specific systems outperform the general system overall. Whilst this result is unsurprising, our proposed categorisation enables a detailed quantitative analysis of system performance. We identify limitations of each system and find that there remain important gaps in the state-of-the-art systems, which are clearly identifiable with respect to the categorisation.

Conclusion: We have analysed in detail the performance of existing coreference resolution systems for the biomedical literature and have demonstrated that there are clear gaps in their coverage. The approach developed in the general domain needs to be tailored for portability to the biomedical domain. The specific framework for class-based error analysis of existing systems that we propose has benefits for identifying specific limitations of those systems. This in turn provides insights for further system development. (C) 2016 Elsevier Inc. All rights reserved.
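The class-based error analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `scores_by_category`, the link representation (anaphor id mapped to antecedent id), and the category labels are all assumptions made for illustration. The idea is simply to score predicted coreference links separately for each anaphor category, so that weak categories stand out.

```python
from collections import Counter

def scores_by_category(gold, predicted, category_of):
    """Per-category precision, recall and F1 for coreference links.

    gold, predicted: dict mapping anaphor id -> antecedent id (hypothetical format).
    category_of: dict mapping anaphor id -> category label (illustrative labels).
    """
    gold_n, pred_n, correct_n = Counter(), Counter(), Counter()
    for anaphor in gold:
        gold_n[category_of[anaphor]] += 1
    for anaphor, antecedent in predicted.items():
        # Predictions for anaphors with no known category are grouped together.
        cat = category_of.get(anaphor, "uncategorised")
        pred_n[cat] += 1
        if gold.get(anaphor) == antecedent:
            correct_n[cat] += 1
    report = {}
    for cat in sorted(set(gold_n) | set(pred_n)):
        p = correct_n[cat] / pred_n[cat] if pred_n[cat] else 0.0
        r = correct_n[cat] / gold_n[cat] if gold_n[cat] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        report[cat] = {"precision": p, "recall": r, "f1": f}
    return report

# Toy data, invented for illustration only.
gold = {"a1": "e1", "a2": "e2", "a3": "e3", "a4": "e1"}
category_of = {"a1": "pronoun", "a2": "pronoun",
               "a3": "appositive", "a4": "appositive"}
predicted = {"a1": "e1", "a2": "e9", "a3": "e3"}

report = scores_by_category(gold, predicted, category_of)
for cat, scores in report.items():
    print(cat, scores)
```

Comparing such per-category reports across systems is one way to see where a general-domain system falls behind a domain-specific one, rather than relying on a single overall score.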
Pages: 309-318
Number of pages: 10