Using Data Augmentation for Improving Text Summarization

Cited by: 0
Authors
Constantin, Daniel [1]
Mihaescu, Marian Cristian [1]
Heras, Stella [2]
Jordan, Jaume [2]
Palanca, Javier [2]
Julian, Vicente [2]
Affiliations
[1] Univ Craiova, Craiova, Romania
[2] VRAIN Univ Politecn Valencia, Valencian Res Inst Artificial Intelligence, Valencia, Spain
Source
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2024, PT II | 2025 / Vol. 15347
Keywords
Text Summarization; Data Augmentation; Transformer Models; ROUGE;
DOI
10.1007/978-3-031-77738-7_12
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In today's society, the amount of information we need to process daily from sources such as news, videos, and literature is relatively high. The primary strategy to decrease the workload is to use effective summarization techniques, either through extractive (where the summary is made up of extracts from the source itself) or abstractive methods. Traditional summarization models often rely on extensive human-annotated data, which is usually quite costly. This research proposes an approach leveraging transformer models to optimize and affordably augment small datasets, enhancing the performance of summarization models. Using sentence clustering and pre-trained models on tasks such as summarization or paraphrasing, we explore whether such an approach can yield better results across various summarization datasets that target different formats, such as video conference transcripts and news articles.
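The abstract names sentence clustering as one step of the augmentation approach, but this record does not say which clustering algorithm or sentence representation the authors use. The following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper: sentences are compared by Jaccard overlap of lowercase word sets, and a greedy pass assigns each sentence to the most similar existing cluster when the similarity reaches a hypothetical `threshold`. The function names `tokenize`, `jaccard`, and `cluster_sentences` are illustrative.

```python
import re


def tokenize(sentence):
    """Return the set of lowercase alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sentences(sentences, threshold=0.3):
    """Greedily group sentence indices by token overlap.

    Assumption (not from the paper): each sentence is compared with the
    first sentence of every existing cluster and joins the most similar
    cluster whose similarity is at least `threshold`; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster is a list of sentence indices
    for i, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(tokens, tokenize(sentences[cluster[0]]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


if __name__ == "__main__":
    transcript = [
        "The meeting starts at noon.",
        "The meeting starts at noon today.",
        "Stocks fell sharply on Monday.",
    ]
    print(cluster_sentences(transcript))
```

In a pipeline of the kind the abstract describes, each cluster could then be passed to a pre-trained paraphrasing or summarization model to produce additional training pairs; that step is omitted here because the record gives no details about it.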
Pages: 132-144
Page count: 13