KNN with TF-IDF Based Framework for Text Categorization

Cited by: 159
Authors
Trstenjak, Bruno [1]
Mikac, Sasa [2]
Donko, Dzenana [3]
Affiliations
[1] Medimurje Univ Appl Sci Cakovec, Dept Comp Engn, Cakovec, Croatia
[2] Fac Elect Engn & Comp Sci, Dept Comp Sci, Maribor, Slovenia
[3] Fac Elect Engn, Dept Comp Sci, Sarajevo, Bosnia & Herceg
Source
24TH DAAAM INTERNATIONAL SYMPOSIUM ON INTELLIGENT MANUFACTURING AND AUTOMATION, 2013 | 2014 / Vol. 69
Keywords
text documents classification; K-Nearest Neighbor; TF-IDF; framework; machine learning;
DOI
10.1016/j.proeng.2014.03.129
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
KNN is a very popular algorithm for text classification. This paper presents the possibility of using the KNN algorithm with the TF-IDF method, together with a framework for text classification. The framework enables classification according to various parameters, as well as measurement and analysis of the results. Evaluation of the framework focused on the speed and quality of classification. The test results showed the good and bad features of the algorithm, providing guidance for the further development of similar frameworks. (C) 2014 The Authors. Published by Elsevier Ltd.
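As a rough illustration of the combination the abstract names, the sketch below weights terms with TF-IDF and classifies a document by a majority vote among its k nearest training documents. It is not the authors' framework: the whitespace tokenizer, the smoothed IDF formula, cosine similarity as the nearness measure, the function names, and the toy training data are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for each training term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumed smoothing: idf = ln((1 + N) / (1 + df)) + 1
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Build a sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, train_texts, train_labels, k=3):
    """Return the majority label among the k most similar training documents."""
    tokenized = [tokenize(t) for t in train_texts]
    idf = fit_idf(tokenized)
    train_vecs = [tfidf_vector(tokens, idf) for tokens in tokenized]
    q_vec = tfidf_vector(tokenize(query), idf)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(q_vec, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, used only to exercise the functions above.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new phone has a fast processor",
    "the laptop ships with more memory",
]
train_labels = ["sport", "sport", "tech", "tech"]

print(knn_classify("goal in the football match", train_texts, train_labels, k=3))
```

Because every query is compared against every training document, classification time in this sketch grows linearly with the size of the training set, which is one reason speed is a natural evaluation criterion for KNN-based classifiers.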
Pages: 1356-1364
Page count: 9