Intelligent multi-document summarization for biomedical literature by word embeddings and graph-based ranking

Times cited: 0
Authors
Shen, Chen [1]
Lin, Hongfei [1]
Hao, Huihui [1]
Yang, Zhihao [1,2]
Wang, Jian [1]
Zhang, Shaowu [1]
Affiliations
[1] Dalian University of Technology, School of Computer Science and Technology, Dalian, People's Republic of China
[2] University of New South Wales Canberra, School of Engineering and Information Technology, Canberra, ACT, Australia
Funding
National Natural Science Foundation of China
Keywords
Intelligent; text summarization; graph-based ranking; similarity calculation
DOI
10.3233/JIFS-179315
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of clinical and laboratory medicine, the field of bioinformatics has accumulated a vast body of clinical records and research literature, and retrieving useful information from this volume of data has become a challenging task. Intelligent text summarization, which helps users find and understand relevant source texts more quickly and with less effort, has therefore become a significant and valuable research area. In this study, we address this problem with an improved TextRank algorithm whose weights are calculated on a sentence graph. For an experimental dataset obtained from PubMed, we represent terms as vectors using the Skip-gram model and design three methods that use these word embeddings to calculate weights between sentences. We then build an undirected graph with sentences as nodes, use the improved TextRank algorithm to calculate the importance of each sentence, and generate summaries based on the resulting ranking. Experimental results and analysis on the datasets demonstrate the effectiveness of the proposed model.
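The pipeline in the abstract (word embeddings, sentence-to-sentence weights, an undirected sentence graph, TextRank ranking, top-ranked sentences as the summary) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the three weighting methods or how TextRank was improved, so the sketch uses one simple choice (cosine similarity of averaged word vectors) and the standard weighted TextRank update. The embedding table `EMBEDDINGS` is a hypothetical toy stand-in for Skip-gram vectors trained on PubMed text, and all function names are illustrative.

```python
import math

# Hypothetical toy embeddings standing in for Skip-gram vectors trained on PubMed.
# The values are made up for illustration only.
EMBEDDINGS = {
    "gene": [0.9, 0.1, 0.0],
    "protein": [0.8, 0.2, 0.1],
    "expression": [0.7, 0.3, 0.0],
    "tumor": [0.1, 0.9, 0.2],
    "cancer": [0.2, 0.8, 0.3],
    "patients": [0.1, 0.7, 0.6],
    "trial": [0.0, 0.3, 0.9],
}

def sentence_vector(sentence, embeddings):
    """Average the vectors of in-vocabulary tokens (one assumed weighting choice)."""
    vecs = [embeddings[t] for t in sentence.lower().split() if t in embeddings]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(sentences, embeddings):
    """Symmetric weight matrix: an undirected graph with sentences as nodes."""
    vecs = [sentence_vector(s, embeddings) for s in sentences]
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if vecs[i] is not None and vecs[j] is not None:
                # Negative similarities are clipped so every edge weight is >= 0.
                w[i][j] = w[j][i] = max(cosine(vecs[i], vecs[j]), 0.0)
    return w

def weighted_textrank(w, d=0.85, tol=1e-6, max_iter=100):
    """Standard weighted TextRank iteration (not the paper's improved variant)."""
    n = len(w)
    if n == 0:
        return []
    scores = [1.0] * n
    out_sum = [sum(row) for row in w]  # total edge weight leaving each node
    for _ in range(max_iter):
        new = []
        for i in range(n):
            acc = sum(w[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if w[j][i] > 0 and out_sum[j] > 0)
            new.append((1 - d) + d * acc)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores

def summarize(sentences, embeddings, k=2):
    """Pick the k highest-scoring sentences, returned in original document order."""
    scores = weighted_textrank(build_graph(sentences, embeddings))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

if __name__ == "__main__":
    docs = [
        "gene expression regulates protein levels",
        "tumor growth in cancer patients",
        "protein and gene expression in tumor cells",
        "patients enrolled in the trial",
    ]
    print(summarize(docs, EMBEDDINGS, k=2))
```

Returning the selected sentences in their original order, rather than in score order, is a common design choice that keeps an extractive summary readable. A real system would replace the toy table with trained Skip-gram vectors and a proper biomedical tokenizer.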
Pages: 4797–4802
Number of pages: 6