Text Summarization towards Scientific Information Extraction

Cited by: 0
Authors
Keller, Abigail [1]
Furst, Jacob [1]
Raicu, Daniela [1]
Hastings, Peter [1]
Tchoua, Roselyne [1]
Affiliations
[1] DePaul Univ, Sch Comp, Chicago, IL 60604 USA
Source
2022 IEEE 18TH INTERNATIONAL CONFERENCE ON E-SCIENCE (ESCIENCE 2022) | 2022
Keywords
text summarization; information extraction; data labeling; relations extraction; scientific facts; polymers
DOI
10.1109/eScience55777.2022.00036
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Despite the exponential growth in scientific textual content, publications remain the primary means of disseminating vital research to experts within their respective fields. These texts are predominantly written for human consumption, resulting in two fundamental challenges: experts cannot efficiently remain well-informed to leverage the latest discoveries, and applications that rely on valuable insights buried in these texts cannot effectively build upon published results. Consequently, scientific progress stalls. Automatic Text Summarization (ATS) and Information Extraction (IE) are two essential fields that address this problem. While the two research topics are often studied independently, this work proposes to look at ATS in the context of IE, specifically as it relates to Scientific IE. However, Scientific Information Extraction faces several challenges, chiefly the scarcity of relevant entities and insufficient training data. In this paper, we focus on extractive ATS, which identifies the most valuable sentences from textual content for the purpose of ultimately extracting scientific relations. We account for the associated challenges by means of an ensemble method that integrates three weakly supervised learning models, one for each entity of the target relation. Notably, while the relation is well defined, we do not require previously annotated data for the entities composing the relation. The central objective is to generate balanced training data, which many advanced natural language processing models require. We apply this idea in the domain of materials science, extracting the polymer-glass transition temperature relation, and achieve 94.7% recall (i.e., of sentences that contain relations annotated by humans) while reducing the original document text by 99.3%.
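The sentence-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three regex detectors below are hypothetical stand-ins for the three weakly supervised models, the rule that all three must fire is an assumed way of combining them, and text reduction is measured by sentence count, which is also an assumption.

```python
import re

# Hypothetical stand-ins for the three weakly supervised models,
# one per entity of the polymer / glass transition / temperature relation.
def mentions_polymer(sentence: str) -> bool:
    # Assumption: a token starting with "poly" signals a polymer mention.
    return bool(re.search(r"\bpoly[\w()\-]*", sentence, re.IGNORECASE))

def mentions_tg(sentence: str) -> bool:
    # Assumption: "glass transition" or the symbol "Tg" signals the property.
    return bool(
        re.search(r"glass[- ]transition", sentence, re.IGNORECASE)
        or re.search(r"\bTg\b", sentence)
    )

def mentions_temperature(sentence: str) -> bool:
    # Assumption: a number followed by degrees Celsius or kelvin is a value.
    return bool(re.search(r"-?\d+(?:\.\d+)?\s*(?:°\s*C|K\b)", sentence))

def select_sentences(sentences: list[str]) -> list[int]:
    """Return indices of sentences where all three detectors fire (assumed rule)."""
    return [
        i for i, s in enumerate(sentences)
        if mentions_polymer(s) and mentions_tg(s) and mentions_temperature(s)
    ]

def recall(selected: list[int], gold: set[int]) -> float:
    """Fraction of human-annotated relation sentences that were kept."""
    if not gold:
        return 0.0
    return len(gold.intersection(selected)) / len(gold)

def reduction(selected: list[int], total: int) -> float:
    """Fraction of sentences discarded (sentence-count proxy for text reduction)."""
    if total == 0:
        return 0.0
    return 1.0 - len(selected) / total

# Toy document; sentence 1 is the only one annotated as containing the relation.
doc = [
    "Polymers are widely used in packaging.",
    "The glass transition temperature of polystyrene is 100 °C.",
    "Samples were heated to 150 °C for 2 h.",
    "We thank the reviewers for their comments.",
]
kept = select_sentences(doc)
print(kept, recall(kept, {1}), reduction(kept, len(doc)))
```

Requiring all three detectors favors a short summary; a looser combination (for example, any two of three) would trade reduction for recall.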
Pages: 225-235
Page count: 11