KNN based machine learning approach for text and document mining

Authors
Bijalwan, Vishwanath [1]
Kumar, Vinay [2]
Kumari, Pinki [3]
Pascual, Jordan [4]
Affiliations
[1] Institute of Technology Gopeshwar, Chamoli, Uttarakhand
[2] GLA University, Mathura
[3] Bansathali University, Rajasthan
[4] Department of Computer Science, University of Oviedo
Source
International Journal of Database Theory and Application | 2014 / Vol. 7 / No. 1
Keywords
Document mining; Event models; KNN; Machine learning; Naïve Bayes; Term-graph; Text mining
DOI
10.14257/ijdta.2014.7.1.06
Abstract
Text Categorization (TC), also known as Text Classification, is the task of automatically assigning each document in a set of text documents to one or more categories from a predefined set. If every document belongs to exactly one category, it is a single-label classification task; otherwise, it is a multi-label classification task. TC draws on several tools from Information Retrieval (IR) and Machine Learning (ML) and has received much attention in recent years from both academic researchers and industry developers. In this paper, we first categorize the documents using a KNN-based machine learning approach and then return the most relevant documents. © 2014 SERSC.
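The two steps named in the abstract (categorize with KNN, then return the most relevant documents) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words representation, cosine similarity, majority vote, toy corpus, and the names `vectorize`, `cosine`, and `knn_classify` are all assumptions made for illustration, and it covers only the single-label case.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed representation, not from the paper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, labeled_docs, k=3):
    """Return (predicted label, neighbors that carry that label).

    The label is chosen by majority vote among the k most similar
    training documents; the returned neighbors stay in similarity order.
    """
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(text)), label, text) for text, label in labeled_docs),
        key=lambda item: item[0],
        reverse=True,
    )
    neighbors = scored[:k]
    votes = Counter(label for _, label, _ in neighbors)
    predicted = votes.most_common(1)[0][0]
    relevant = [text for _, label, text in neighbors if label == predicted]
    return predicted, relevant


# Hypothetical toy corpus, for illustration only.
docs = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the election results were announced", "politics"),
    ("parliament passed the new budget", "politics"),
]

label, relevant = knn_classify("who scored the winning goal in the match", docs, k=3)
print(label, relevant)
```

A multi-label variant could instead assign every category whose vote share among the k neighbors exceeds a threshold; that choice is likewise an assumption and is not taken from the abstract.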
Pages: 61-70
Page count: 9