KNN Text Categorization Algorithm Based on Semantic Centre

Cited by: 3
Authors
Zhang Xiao-fei [1]
Huang He-yan [1]
Zhang Ke-liang [2]
Affiliations
[1] Chinese Acad Sci, Res Ctr C&L Informat Engn, Beijing, Peoples R China
[2] Luoyang Univ Foreign Languages, Ctr Computat Linguist, Luoyang, Peoples R China
Source
2009 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND COMPUTER SCIENCE, VOL 1, PROCEEDINGS | 2009
Funding
US National Science Foundation;
Keywords
KNN; text categorization; semantic center;
DOI
10.1109/ITCS.2009.57
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As a classical statistical pattern recognition algorithm characterized by high accuracy and stability, KNN has been widely used in text categorization. However, because KNN's time complexity is directly proportional to the sample size, its classification speed is very slow. In this paper, we propose a new KNN text categorization algorithm based on semantic centres, which we call SKNN, to speed up classification. The basic idea is to replace the large number of original sample documents with a small number of sample semantic centres. Experiments show that SKNN classifies more than 10 times as fast as traditional KNN, and that its F1 value is approximately equal to that of SVM and traditional KNN.
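The abstract does not say how the semantic centres are computed, so the following is only a minimal sketch of the general idea (classify against a few per-class centres instead of every training document), not the authors' SKNN. It assumes documents are already term-weight vectors, that centres come from a plain k-means run inside each class, and that similarity is cosine with a similarity-weighted vote; the names `build_centers`, `knn_predict`, and `centers_per_class` are hypothetical.

```python
import numpy as np

def l2_normalize(X):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def build_centers(X, y, centers_per_class=2, n_iter=10, seed=0):
    """Assumption: summarise each class by a few centroids found with k-means."""
    rng = np.random.default_rng(seed)
    centers, labels = [], []
    for cls in np.unique(y):
        docs = l2_normalize(X[y == cls])
        k = min(centers_per_class, len(docs))
        # Start from k distinct documents of this class.
        cent = docs[rng.choice(len(docs), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign every document to its most similar centre.
            assign = np.argmax(docs @ cent.T, axis=1)
            for j in range(k):
                members = docs[assign == j]
                if len(members):  # an empty cluster keeps its old centre
                    cent[j] = members.mean(axis=0)
            cent = l2_normalize(cent)
        centers.append(cent)
        labels.extend([cls] * k)
    return np.vstack(centers), np.array(labels)

def knn_predict(query, centers, center_labels, k=3):
    """KNN over the centres only, with a similarity-weighted vote."""
    q = l2_normalize(query.reshape(1, -1))[0]
    sims = centers @ q
    top = np.argsort(sims)[::-1][:k]
    votes = {}
    for i in top:
        votes[center_labels[i]] = votes.get(center_labels[i], 0.0) + sims[i]
    return max(votes, key=votes.get)

# Toy term-weight vectors: class 0 uses the first two terms, class 1 the last two.
X = np.array([[3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0],
              [0, 0, 3, 1], [0, 1, 2, 2], [0, 0, 1, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

centers, center_labels = build_centers(X, y)
pred_a = knn_predict(np.array([5.0, 1.0, 0.0, 0.0]), centers, center_labels)
pred_b = knn_predict(np.array([0.0, 0.0, 1.0, 3.0]), centers, center_labels)
```

In this sketch each query is compared with 4 centres rather than 6 documents; on a realistic corpus the number of centres would be far smaller than the number of training documents, which is where a speed-up of the kind the abstract reports would come from.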
Pages: 249 / +
Page count: 2