Automatic Text Categorization of Marathi Documents Using Clustering Technique

Cited by: 0
Authors
Vispute, Sushma R. [1 ]
Potey, M. A. [1 ]
Affiliation
[1] DYPCOE, Pune, Maharashtra, India
Source
2013 15TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING TECHNOLOGIES (ICACT) | 2013
Keywords
Text categorization; Clustering; Information filtering; Internet search; Information retrieval;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This work aims to build an intelligent system for retrieving desired documents in the Marathi language. The system also aims to provide personalized Marathi documents to end users based on interests identified from their browsing history. This paper presents the automatic categorization of Marathi documents together with a literature survey of related work on automatic text categorization. Several supervised learning techniques exist for classifying text documents, including decision trees, Support Vector Machines (SVM), neural networks, AdaBoost, and Naive Bayes. Several clustering techniques are also available for text categorization, including K-means, Suffix Tree Clustering (STC), Semantic Hierarchical Online Clustering (SHOC), and the Label Induction Grouping Algorithm (LINGO). The literature survey finds that the vector space model (VSM) gives better results than the probabilistic model. This paper categorizes Marathi text documents using the VSM-based LINGO clustering algorithm. The data set consists of 107 Marathi documents in 3 categories: Tourism, Health Programmes, and Maharashtra festivals. The results show that the LINGO clustering algorithm performs well in categorizing Marathi text documents, with an overall system accuracy of 91.10%.
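To make the VSM-based assignment step mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation and it is not LINGO itself: LINGO induces its cluster labels from the documents, whereas this sketch assumes the labels and their terms are already given. The function names (`tf_idf_vectors`, `cosine`, `categorize`), the threshold value, and the toy English tokens standing in for tokenized Marathi text are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (log(n/df) + 1) so terms found in every document keep some weight.
        vectors.append(
            {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(docs, labels, threshold=0.1):
    """Assign each document to the most similar label, or "Other" below the threshold.

    docs:   dict of document id -> list of tokens
    labels: dict of label name -> list of label terms (assumed given, not induced)
    """
    ids = list(docs)
    vectors = tf_idf_vectors([docs[i] for i in ids])
    # Each label becomes a binary vector over its own terms (an assumption of this sketch).
    label_vectors = {name: {t: 1.0 for t in terms} for name, terms in labels.items()}
    result = {}
    for doc_id, vec in zip(ids, vectors):
        best_name, best_score = "Other", 0.0
        for name, label_vec in label_vectors.items():
            score = cosine(vec, label_vec)
            if score > best_score:
                best_name, best_score = name, score
        result[doc_id] = best_name if best_score >= threshold else "Other"
    return result


# Hypothetical toy data; real input would be tokenized Marathi text.
docs = {
    "d1": "fort beach travel hotel fort".split(),
    "d2": "vaccine clinic health camp vaccine".split(),
    "d3": "festival ganesh procession festival".split(),
    "d4": "travel beach hotel".split(),
}
labels = {
    "Tourism": ["travel", "beach", "fort"],
    "Health Programmes": ["vaccine", "health", "clinic"],
    "Maharashtra festivals": ["festival", "ganesh"],
}
assignments = categorize(docs, labels)
print(assignments)
```

The sketch covers only the final "compare documents to labels in a vector space" idea. A fuller treatment would need a Marathi-aware tokenizer and stop-word list, plus the label induction step that distinguishes LINGO from plain nearest-label assignment.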
Pages: 5