An Extractive Text Summarization Technique for Bengali Document(s) using K-means Clustering Algorithm

Cited by: 0
Authors
Akter, Sumya [1]
Asa, Aysa Siddika [1]
Uddin, Md. Palash [1]
Hossain, Md. Delowar [1]
Roy, Shikhor Kumer [1]
Ibn Afjal, Masud [1]
Affiliations
[1] Hajee Mohammad Danesh Sci & Technol Univ HSTU, Fac Comp Sci & Engn, Dinajpur 5200, Bangladesh
Source
2017 IEEE INTERNATIONAL CONFERENCE ON IMAGING, VISION & PATTERN RECOGNITION (ICIVPR) | 2017
Keywords
data mining; text summarization; extractive summarization; Bengali document(s) summarization; TF*IDF; K-means clustering algorithm
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text summarization, a field of data mining, is very important for developing various real-life applications. Many techniques have been developed for summarizing English text(s). However, only a few attempts have been made for Bengali text because of its multifaceted structure. This paper presents a method for text summarization which extracts important sentences from a single or multiple Bengali documents. The input document(s) are first pre-processed by tokenization, stemming, and similar operations. Then, the word score is calculated by term frequency-inverse document frequency (TF-IDF), and the sentence score is determined by summing its constituent words' scores together with a score for its position. Cue and skeleton words are also considered when calculating the sentence score. For single or multiple documents, the K-means clustering algorithm is applied to produce the final summary. The experimental results show satisfactory output in comparison with existing approaches possessing linear run-time complexity.
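The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each sentence is treated as one "document" for the IDF term, the position score is taken as 1/(index + 1), K-means is run on the scalar sentence scores, and the summary is the cluster with the highest centroid. Stemming and the cue and skeleton word weights are omitted, and all function names (`split_sentences`, `sentence_scores`, `kmeans_1d`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Bengali sentences usually end with the danda "।"; "?" and "!" also occur.
    return [s.strip() for s in re.split(r"[।?!]", text) if s.strip()]


def sentence_scores(sentences):
    # Assumption: each sentence counts as one "document" for the IDF term.
    tokenized = [s.split() for s in sentences]
    n = len(tokenized)
    df = Counter(w for words in tokenized for w in set(words))
    scores = []
    for i, words in enumerate(tokenized):
        tf = Counter(words)
        tfidf_sum = sum((tf[w] / len(words)) * math.log(n / df[w]) for w in tf)
        position_score = 1.0 / (i + 1)  # assumption: earlier sentences weigh more
        scores.append(tfidf_sum + position_score)
    return scores


def kmeans_1d(values, k, iterations=20):
    # Plain Lloyd iterations on scalar scores. Initial centroids are spaced
    # evenly between the minimum and maximum so that runs are repeatable.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    return labels, centroids


def summarize(text, k=2):
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    scores = sentence_scores(sentences)
    labels, centroids = kmeans_1d(scores, k)
    best = max(range(k), key=lambda j: centroids[j])
    # Keep the sentences of the highest-scoring cluster, in document order.
    return [s for s, lab in zip(sentences, labels) if lab == best]
```

Calling `summarize(text)` on a Bengali passage returns a list of the selected sentences in their original order. Clustering scalar scores is only one possible reading of the abstract; clustering sentence feature vectors and picking a representative from each cluster would be an equally plausible design.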
Pages: 6