Candidate sentence selection for extractive text summarization

Cited by: 35
Authors
Mutlu, Begum [1]
Sezer, Ebru A. [2]
Akcayol, M. Ali [1]
Affiliations
[1] Gazi Univ, Dept Comp Engn, TR-06570 Ankara, Turkey
[2] Hacettepe Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Keywords
Extractive text summarization; Text summarization features; Summarization dataset; Long short-term memory; SCORING TECHNIQUES; MAXIMUM COVERAGE;
DOI
10.1016/j.ipm.2020.102359
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text summarization is the process of generating a brief version of a document while preserving as much of its fundamental information as possible. Although most text summarization research has focused on supervised learning solutions, few datasets have actually been created for summarization tasks, and most existing summarization datasets lack human-generated goal summaries, which are vital for both summary generation and evaluation. Therefore, this study presents a new dataset for abstractive and extractive summarization tasks. The dataset contains academic publications, the abstracts written by their authors, and extracts in two sizes that were generated by human readers as part of this research. The resulting extracts were then evaluated to ensure the validity of the human extract production process. Moreover, the extractive summarization problem was reinvestigated on the proposed dataset. The main goal was to analyze the feature vector in order to generate more informative summaries. To that end, a comprehensive syntactic feature space was generated for the proposed dataset, and the impact of these features on the informativeness of the resulting summary was investigated. In addition, the summarization capability of semantic features was examined using GloVe and word2vec embeddings. Finally, an ensembled feature space, which corresponds to the joint use of syntactic and semantic features, was proposed for use with a long short-term memory-based neural network model. The model summaries were evaluated with ROUGE metrics, and the results showed that the proposed ensemble feature space improved markedly on the use of syntactic or semantic features alone. Additionally, the summaries produced by the proposed approach with ensembled features either clearly outperformed or performed comparably to summaries obtained by state-of-the-art extractive summarization models.
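The abstract reports that model summaries were scored with ROUGE metrics. As a rough illustration of what such a score measures, the following is a minimal sketch of ROUGE-N as clipped n-gram overlap between a candidate summary and a reference summary. It is not the evaluation code used in the paper: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the example sentences are all assumptions made for illustration, and standard ROUGE toolkits apply additional processing such as stemming.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N between two strings (hypothetical helper).

    Assumes lowercase whitespace tokenization; no stemming or stopword handling.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Made-up sentences, used only to exercise the function.
reference = "the model selects informative sentences"
candidate = "the model selects short sentences"
scores = rouge_n(candidate, reference, n=1)
print(scores)
```

In this toy case four of the five reference unigrams also appear in the candidate, so unigram recall and precision are both about 0.8. Passing `n=2` gives the bigram variant of the same overlap measure.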
Pages: 18