An improved kNN text classification method

Cited by: 20
Authors
Wang, Fengfei [1]
Liu, Zhen [1,2]
Wang, Chundong [1]
Affiliations
[1] Tianjin Univ Technol, Grad Sch Comp & Commun Engn, Tianjin, Peoples R China
[2] Nagasaki Inst Appl Sci, Grad Sch Engn, 536 Aba Machi, Nagasaki 8510193, Japan
Keywords
text classification; k-nearest neighbours; kNN; self-organising map; SOM; neural network; computer science; engineering
DOI
10.1504/IJCSE.2019.103944
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper proposes an improved kNN text classification method. The kNN algorithm in vector space models (VSM) has several limitations: it occupies excessive storage space, and all dimensions share the same weight, which makes classification inaccurate. To solve these problems, this paper proposes a SOM neural network with principal component weighting. In this model, the principal component analysis process is embedded into the SOM neural network. Specifically, principal component analysis is used to extract the main feature components of the assessed target, which are then input into the network for computation. Meanwhile, the variance contribution rates of the principal components are introduced into the Euclidean distance function in the form of weights. Using the principal component weighting SOM algorithm to compute the weights of VSM dimensions together with the kNN algorithm could effectively reduce the dimensions of the vector space and increase the precision and speed of the kNN text classification method.
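The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SOM stage is omitted entirely, and the function names (`pca_fit`, `weighted_knn_predict`), the toy term-frequency vectors, and the class labels are all assumptions made for illustration. The sketch only shows how variance contribution rates from principal component analysis might serve as weights in a Euclidean distance used by kNN.

```python
import numpy as np
from collections import Counter


def pca_fit(X, n_components):
    """Return (mean, components, variance contribution rates) for the top components."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # The covariance matrix is symmetric, so eigh applies; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Variance contribution rate: share of total variance explained by each component.
    rates = eigvals / np.trace(cov)
    return mean, eigvecs, rates


def weighted_knn_predict(X_train, y_train, x, k, n_components):
    """Classify x by majority vote among its k nearest neighbours in PCA space."""
    mean, comps, rates = pca_fit(X_train, n_components)
    Z_train = (X_train - mean) @ comps   # reduced-dimension training vectors
    z = (x - mean) @ comps               # reduced-dimension query vector
    # Weighted Euclidean distance: sqrt(sum_i w_i * (z_i - z'_i)^2), w_i = contribution rate.
    dists = np.sqrt((((Z_train - z) ** 2) * rates).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy data: rows are documents, columns are term frequencies.
X_train = np.array([[5, 1, 0], [4, 0, 1], [6, 1, 1],
                    [0, 5, 4], [1, 4, 5], [0, 6, 5]], dtype=float)
y_train = ["sports", "sports", "sports", "finance", "finance", "finance"]

label = weighted_knn_predict(X_train, y_train, np.array([5.0, 0.0, 0.0]),
                             k=3, n_components=2)
print(label)
```

Projecting onto the leading components reduces the dimensionality of the vector space, and weighting each component by its contribution rate lets high-variance directions dominate the distance, which is the effect the abstract attributes to principal component weighting.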
Pages: 397-403
Page count: 7