Abstractive text summarization using deep learning with a new Turkish summarization benchmark dataset

Cited by: 4
Authors
Ertam, Fatih [1]
Aydin, Galip [2]
Affiliations
[1] Firat Univ, Technol Fac, Dept Digital Forens Engn, Elazig, Turkey
[2] Firat Univ, Engn Fac, Dept Comp Engn, Elazig, Turkey
Keywords
abstract summarization; deep learning; information retrieval; text summarization; web scraping; FRAMEWORK; MODELS;
DOI
10.1002/cpe.6482
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
The exponential growth of textual data available on the Internet creates new challenges for accessing information accurately and quickly. Text summarization can be defined as reducing the length of the text to be summarized without distorting its meaning. Summarization can be extractive, abstractive, or a combination of both. In this study, we focus on abstractive summarization, which can produce more human-like summaries. For the study, we created a Turkish news summarization benchmark dataset by crawling the news title, short news, news content, and keywords from various news agency web portals for the last 5 years. The dataset is made publicly available for researchers. The deep learning network was trained on the news headlines and short news contents from the prepared dataset, and was then expected to generate the news headline as the summary of the short news. To evaluate performance, ROUGE-1, ROUGE-2, and ROUGE-L were compared using precision, recall (sensitivity), and F1 scores. Performance values are presented for each sentence as well as averaged over 50 randomly selected sentences. The F1 values are 0.4317, 0.2194, and 0.4334 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. The results show that the approach is promising for Turkish text summarization and that the prepared dataset will add value to the literature.
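As a rough illustration of the evaluation measures named in the abstract, the sketch below computes precision, recall, and F1 for ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence) on whitespace-tokenized text, and averages scores over several sentence pairs. It is not the authors' evaluation code: the function names (`rouge_n`, `rouge_l`, `average`) are hypothetical, and the tokenization is an assumption (no stemming, no case folding, which for Turkish would need locale-aware handling of dotted and dotless i).

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _prf(overlap, cand_total, ref_total):
    """Turn an overlap count into (precision, recall, F1)."""
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rouge_n(candidate, reference, n):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    cand = _ngrams(candidate.split(), n)
    ref = _ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L based on the longest common subsequence."""
    c, r = candidate.split(), reference.split()
    return _prf(_lcs_length(c, r), len(c), len(r))


def average(scores):
    """Average a list of (precision, recall, F1) tuples column-wise."""
    return tuple(sum(col) / len(col) for col in zip(*scores))
```

For example, `rouge_n(generated, headline, 1)` would score one generated headline against its reference, and `average([...])` over the per-sentence tuples gives the kind of averaged figure the abstract reports; an established ROUGE package may differ in tokenization and in how the F-measure is weighted.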
Pages: 10