Scientific document summarization via citation contextualization and scientific discourse

Cited by: 47
Authors
Cohan, Arman [1]
Goharian, Nazli [1]
Affiliations
[1] Georgetown Univ, Informat Retrieval Lab, Dept Comp Sci, Washington, DC 20057 USA
Keywords
Scientific document summarization; Text summarization; Citation analysis; Natural language processing
DOI
10.1007/s00799-017-0216-8
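Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract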
The rapid growth of scientific literature has made it difficult for researchers to quickly learn about the developments in their respective fields. Scientific summarization addresses this challenge by providing summaries of the important contributions of scientific papers. We present a framework for scientific summarization which takes advantage of citations and the scientific discourse structure. Citation texts often lack the evidence and context to support the content of the cited paper and are sometimes even inaccurate. We first address the problem of inaccuracy of the citation texts by finding the relevant context from the cited paper. We propose three approaches for contextualizing citations, based on query reformulation, word embeddings, and supervised learning. We then train a model to identify the discourse facets for each citation. Finally, we propose a method for summarizing scientific papers by leveraging the faceted citations and their corresponding contexts. We evaluate our proposed method on two scientific summarization datasets in the biomedical and computational linguistics domains. Extensive evaluation results show that our methods can improve over the state of the art by large margins.
Pages: 287-303
Number of pages: 17