Exploring Abstractive Text Summarization: Methods, Dataset, Evaluation, and Emerging Challenges

Cited by: 0
Authors
Sunusi, Yusuf [1 ]
Omar, Nazlia [1 ]
Zakaria, Lailatul Qadri [1 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Ctr Artificial Intelligence Technol, Bangi 43600, Malaysia
Keywords
Abstractive text summarization; systematic literature review; natural language processing; evaluation metrics; dataset; computational linguistics; MODEL; RNN
DOI
10.14569/IJACSA.2024.01507130
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
The latest advanced models for abstractive summarization, which utilize encoder-decoder frameworks, produce exactly one summary for each source text. This systematic literature review (SLR) comprehensively examines the recent advancements in abstractive text summarization (ATS), a pivotal area in natural language processing (NLP) that aims to generate concise and coherent summaries from extensive text sources. We delve into the evolution of ATS, focusing on key aspects such as encoder-decoder architectures, innovative mechanisms like attention and pointer-generator models, training and optimization methods, datasets, and evaluation metrics. Our review analyzes a wide range of studies, highlighting the transition from traditional sequence-to-sequence models to more advanced approaches like Transformer-based architectures. We explore the integration of mechanisms such as attention, which enhances model interpretability and effectiveness, and pointer-generator networks, which adeptly balance between copying and generating text. The review also addresses the challenges in training these models, including issues related to dataset quality and diversity, particularly in low-resource languages. A critical analysis of evaluation metrics reveals a heavy reliance on ROUGE scores, prompting a discussion on the need for more nuanced evaluation methods that align closely with human judgment. Additionally, we identify and discuss emerging research gaps, such as the need for effective summary length control and the handling of model hallucination, which are crucial for the practical application of ATS. This SLR not only synthesizes current research trends and methodologies in ATS, but also provides insights into future directions, underscoring the importance of continuous innovation in model development, dataset enhancement, and evaluation strategies. Our findings aim to guide researchers and practitioners in navigating the evolving landscape of abstractive text summarization and in identifying areas ripe for future exploration and development.
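The abstract's remark that pointer-generator networks "balance between copying and generating text" follows a concrete recipe: the decoder computes a generation probability p_gen and mixes the vocabulary softmax with an attention-weighted copy distribution, P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention weights on source positions holding w. Below is a minimal NumPy sketch of that mixing step only, not code from any surveyed system; the function name, shapes, and toy inputs are illustrative assumptions.

import numpy as np

def pointer_generator_mix(p_vocab, attention, source_ids, p_gen):
    """Mix a generation distribution with an attention-based copy distribution.

    p_vocab    : (vocab_size,) softmax over the fixed output vocabulary
    attention  : (src_len,)    attention weights over source positions (sums to 1)
    source_ids : (src_len,)    vocabulary id of each source token
    p_gen      : float in [0, 1], learned probability of generating vs. copying
    """
    copy_dist = np.zeros_like(p_vocab)
    # Scatter-add attention mass onto each source token's vocabulary id,
    # so a word appearing twice in the source accumulates both weights.
    np.add.at(copy_dist, source_ids, attention)
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist

# Toy example (all values invented): 6-word vocabulary, 4-token source.
rng = np.random.default_rng(0)
p_vocab = rng.dirichlet(np.ones(6))       # decoder's generation softmax
attention = rng.dirichlet(np.ones(4))     # attention over the source tokens
source_ids = np.array([2, 5, 2, 1])       # ids of the source tokens (id 2 repeats)
final = pointer_generator_mix(p_vocab, attention, source_ids, p_gen=0.7)
print(final, final.sum())                 # a valid distribution: sums to 1.0

Because attention mass is scattered onto source-token ids, a rare source word can still receive probability even when the generation softmax assigns it almost none, which is the property that lets these models copy faithfully while still generating fluently.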
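On the evaluation side, the heavy reliance on ROUGE that the review critiques typically amounts to a few lines of scoring code in practice. The sketch below uses Google's open-source rouge-score package; the reference and candidate strings are invented for illustration.

# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the cat sat quietly on the warm mat"   # human-written summary (invented)
candidate = "a cat was sitting on the mat"          # system output (invented)

# score(target, prediction) returns a dict of Score(precision, recall, fmeasure).
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")

Because ROUGE rewards n-gram overlap with the reference, a fluent and faithful summary phrased differently can still score poorly; this is the gap with human judgment that the review highlights when calling for more nuanced evaluation methods.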
Pages: 1340-1357
Page count: 18