KNN with TF-IDF Based Framework for Text Categorization

Cited by: 159
Authors
Trstenjak, Bruno [1]
Mikac, Sasa [2]
Donko, Dzenana [3]
Affiliations
[1] Medimurje Univ Appl Sci Cakovec, Dept Comp Engn, Cakovec, Croatia
[2] Fac Elect Engn & Comp Sci, Dept Comp Sci, Maribor, Slovenia
[3] Fac Elect Engn, Dept Comp Sci, Sarajevo, Bosnia & Herceg
Source
24TH DAAAM INTERNATIONAL SYMPOSIUM ON INTELLIGENT MANUFACTURING AND AUTOMATION, 2013 | 2014 / Vol. 69
Keywords
text documents classification; K-Nearest Neighbor; TF-IDF; framework; machine learning;
DOI
10.1016/j.proeng.2014.03.129
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
KNN is a very popular algorithm for text classification. This paper presents the possibility of using the KNN algorithm with the TF-IDF method, together with a framework for text classification. The framework enables classification according to various parameters, as well as measurement and analysis of results. Evaluation of the framework focused on the speed and quality of classification. The test results showed the good and bad features of the algorithm, providing guidance for the further development of similar frameworks. (C) 2014 The Authors. Published by Elsevier Ltd.
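The combination named in the abstract, TF-IDF term weighting followed by a K-Nearest Neighbor vote, can be illustrated with a minimal sketch. This is not the authors' framework: the weighting variant (term frequency normalized by document length, IDF as log(N/df)), the use of cosine similarity, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF dict per tokenized document, plus the IDF table.
    Assumed weighting: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()}
            for doc in docs]
    return vecs, idf

def vectorize(doc, idf):
    """Weight an unseen document with the training IDF; unknown terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, train_vecs, labels, idf, k=3):
    """Return the majority label among the k most similar training documents."""
    q = vectorize(query, idf)
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up corpus purely for demonstration.
train = [
    ["goal", "match", "team", "score"],
    ["team", "coach", "league", "match"],
    ["stock", "market", "price", "trade"],
    ["bank", "market", "interest", "rate"],
]
labels = ["sport", "sport", "finance", "finance"]
vecs, idf = tfidf_vectors(train)
print(knn_classify(["team", "match", "goal"], vecs, labels, idf, k=3))
```

In this sketch every query is compared against every training document, so classification time grows linearly with the size of the training set. That is a general property of brute-force KNN and is one reason speed is a natural evaluation criterion for such a classifier.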
Pages: 1356-1364
Page count: 9