Leveraging Sequence-to-Sequence Models for Kannada Abstractive Summarization

Authors
Dakshayani Ijeri [1]
Pushpa B. Patil [2]
Affiliations
[1] Department of Computer Science and Engineering, BLDEA’s V. P. Dr. P. G. Halakatti College of Engineering and Technology (Affiliated to Visvesvaraya Technological University, Belagavi-590018)
[2] Department of Computer Science and Engineering (Data Science), BLDEA’s V. P. Dr. P. G. Halakatti College of Engineering and Technology (Affiliated to Visvesvaraya Technological University, Belagavi-590018)
Keywords
Abstractive text summarization; Sequence-to-sequence model; NLP
DOI
10.1007/s42979-025-04045-7
Abstract
The digital world is flooded with data every day from sectors such as business, healthcare, education and entertainment. This data is also available in Indian regional languages such as Kannada, a Dravidian language spoken by over 50 million people. Text summarization plays a vital role in efficient information retrieval: by distilling complex content into concise, meaningful insights, it saves time and improves accessibility. Text summarization has advanced significantly for widely used global languages, whereas progress for Indian regional languages such as Kannada remains limited. The little work done for Kannada has relied on extractive methods, which fail to generate coherent, human-like summaries. The proposed work focuses on Kannada, one of the significant languages of India: 20% of the Indian population speak Kannada as their mother tongue, and it ranks 27th among the top 30 languages in the world. Text summarization can follow one of two approaches, extractive or abstractive. Extractive summarization builds the summary by selecting the main sentences from the document, whereas abstractive summarization produces brief, concise information that reflects the context of the passage, using new words, phrases or sentences that may not appear in the source text. The proposed work follows the abstractive approach, using a sequence-to-sequence model based on Long Short-Term Memory (LSTM). This work presents a novel approach to Kannada text summarization and is the first known study to address this problem in Kannada using an abstractive method. On a dataset of 10,000 documents, the model achieved F1-scores of 0.7046 for ROUGE-1, 0.5499 for ROUGE-2 and 0.7046 for ROUGE-L.
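The results above are reported as ROUGE-1, ROUGE-2 and ROUGE-L F1-scores. The minimal Python sketch below illustrates how such scores are commonly computed for a single reference/candidate pair. It assumes whitespace tokenization with no stemming or normalization, and it is not the evaluation script used in this work, whose tokenization and averaging over the 10,000 documents are not described here. The function names (`rouge_n`, `rouge_l`, `lcs_length`) are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of n-grams (as tuples) found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, cand_total):
    """Harmonic mean of precision and recall; 0.0 when either is undefined or zero."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n):
    """ROUGE-N F1: clipped n-gram overlap (Counter intersection keeps min counts)."""
    ref = ngrams(reference.split(), n)  # assumption: whitespace tokenization
    cand = ngrams(candidate.split(), n)
    overlap = sum((ref & cand).values())
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Sentence-level ROUGE-L F1 based on the LCS of the two token sequences."""
    ref, cand = reference.split(), candidate.split()
    return f1(lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    # Hypothetical Kannada example: "Bengaluru is the capital of Karnataka".
    reference = "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ"
    candidate = "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ"
    print(round(rouge_n(reference, candidate, 1), 4))  # 0.5714
    print(round(rouge_n(reference, candidate, 2), 4))  # 0.0
    print(round(rouge_l(reference, candidate), 4))     # 0.5714
```

Whitespace splitting is used here because it keeps Kannada script intact; some ROUGE packages normalize text to ASCII alphanumerics by default, which would discard Kannada tokens unless a custom tokenizer is supplied. Note also that surface-level matching penalizes inflected forms (ಕರ್ನಾಟಕದ vs ಕರ್ನಾಟಕ above), which matters for an agglutinative language like Kannada.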