Automatic Text Summarization of Konkani Texts Using Latent Semantic Analysis

Citations: 0
Authors
D'Silva, Jovi [1]
Sharma, Uzzal [1]
More, Chaitali [2]
Affiliations
[1] Assam Don Bosco Univ, Gauhati 782402, Assam, India
[2] Fr Agnel Coll Arts & Commerce, Pilar 403203, Goa, India
Source
INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, ICICC 2022, VOL 1 | 2023, Vol. 473
Keywords
Automatic text summarization; Latent semantic analysis; Konkani; Low-resource; Singular value decomposition; Extractive text summarization
DOI
10.1007/978-981-19-2821-5_37
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic text summarization extracts the relevant details from input text documents to generate summaries. This area of Natural Language Processing is widely researched, especially for popular languages such as English, and there is a need to extend this work to the less commonly spoken languages of the world. This paper presents a language-independent text summarization approach using Latent Semantic Analysis (LSA), applied to the Konkani language. Konkani is a low-resource language with few language processing tools and resources such as stop-word lists. LSA is an unsupervised algebraic method that finds latent semantic structures in text, which can then be used to perform extractive text summarization. We examined well-known LSA-based sentence selection approaches on our dataset, constructed from books of Konkani folk tales written in Devanagari script. The experimental results indicate that LSA-based approaches can produce promising summaries, with the Cross method performing best on most metrics.
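As a rough illustration of the idea described in the abstract, the following is a minimal sketch of LSA-based extractive sentence selection. It is not the authors' implementation and it does not reproduce the Cross method: for brevity it scores each sentence by the length of its singular-value-weighted vector in the latent space, which is one of the well-known LSA selection strategies. The function name `lsa_summarize`, its parameters, the whitespace tokenizer, and the raw term-frequency weighting are all assumptions made for this example.

```python
import numpy as np


def lsa_summarize(sentences, num_sentences=2, num_topics=2):
    """Pick `num_sentences` sentences using a simple LSA length score.

    Hypothetical helper for illustration only; tokenization is plain
    whitespace splitting and no stop-word list is used (both assumptions).
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix A with raw term frequencies.
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0

    # Singular value decomposition: A = U * diag(S) * Vt.
    # Rows of Vt describe how strongly each sentence expresses each latent topic.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(num_topics, len(S))

    # Score each sentence by the norm of its topic vector, weighted by
    # the singular values of the k strongest topics.
    scores = np.sqrt(((S[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))

    # Keep the highest-scoring sentences, restored to document order.
    top = sorted(np.argsort(-scores)[:num_sentences])
    return [sentences[i] for i in top]
```

Because the sketch relies only on whitespace tokens and linear algebra, it makes no language-specific assumptions beyond word segmentation, which is in the spirit of the language-independent approach the abstract describes.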
Pages: 425-437
Page count: 13