Coreference Resolution in Research Papers from Multiple Domains

Cited by: 10
Authors
Brack, Arthur [1]
Mueller, Daniel Uwe [2]
Hoppe, Anett [1]
Ewerth, Ralph [1, 2]
Affiliations
[1] TIB Leibniz Information Centre for Science and Technology, Research Group Visual Analytics, Hannover, Germany
[2] Leibniz University Hannover, L3S Research Center, Hannover, Germany
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2021, PT I | 2021 / Vol. 12656
Keywords
Coreference resolution; Information extraction; Knowledge graph population; Scholarly communication
DOI
10.1007/978-3-030-72113-8_6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g., those based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research papers and subsequent knowledge graph population. We present the following contributions: (1) We annotate a corpus for coreference resolution that comprises 10 different scientific disciplines from Science, Technology, and Medicine (STM); (2) We propose transfer learning for automatic coreference resolution in research papers; (3) We analyse the impact of coreference resolution on knowledge graph (KG) population; (4) We release a research KG that is automatically populated from 55,485 papers in 10 STM domains. Comprehensive experiments show the usefulness of the proposed approach. Our transfer learning approach considerably outperforms state-of-the-art baselines on our corpus with an F1 score of 61.4 (+11.0), while the evaluation against a gold standard KG shows that coreference resolution improves the quality of the populated KG significantly with an F1 score of 63.5 (+21.8).
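The abstract's claim that coreference resolution improves KG population rests on a simple intuition: mentions that refer to the same entity should become one node rather than several. The sketch below illustrates only that intuition and is not the authors' pipeline. The function names `canonical_map` and `populate_kg`, the choice of the longest mention as the representative name, and the relation label `evaluated_on` are all hypothetical assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A triple is (subject mention, relation, object mention); all strings.
Triple = Tuple[str, str, str]


def canonical_map(clusters: List[List[str]]) -> Dict[str, str]:
    """Map every mention in a coreference cluster to one representative.

    Hypothetical heuristic: the longest mention in a cluster is assumed
    to be the most informative name for the KG node.
    """
    mapping: Dict[str, str] = {}
    for cluster in clusters:
        representative = max(cluster, key=len)
        for mention in cluster:
            mapping[mention] = representative
    return mapping


def populate_kg(triples: Iterable[Triple], clusters: List[List[str]]) -> Set[Triple]:
    """Rewrite triples onto canonical entities and deduplicate them.

    Mentions that belong to no cluster are kept unchanged.
    """
    mapping = canonical_map(clusters)
    return {(mapping.get(s, s), rel, mapping.get(o, o)) for s, rel, o in triples}


if __name__ == "__main__":
    # Hypothetical mentions and relation extracted from one paper.
    triples = [
        ("a BiLSTM-CRF tagger", "evaluated_on", "the CoNLL corpus"),
        ("the tagger", "evaluated_on", "this corpus"),
    ]
    clusters = [
        ["a BiLSTM-CRF tagger", "the tagger"],
        ["the CoNLL corpus", "this corpus"],
    ]
    print(len(set(triples)))                    # 2 distinct triples without coreference
    print(len(populate_kg(triples, clusters)))  # 1 after merging coreferent mentions
```

Without the coreference clusters, the two triples describe the same fact but would appear as separate, partly unlinked nodes in the graph; merging the clusters collapses them into a single fact.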
Pages: 79-97
Page count: 19