Using kNN model for automatic text categorization

Cited by: 0
Authors
Gongde Guo
Hui Wang
David Bell
Yaxin Bi
Kieran Greer
Affiliations
[1] University of Ulster, School of Computing and Mathematics
[2] Queen's University Belfast, School of Computer Science
Source
Soft Computing | 2006 / Vol. 10
Keywords
kNN Model; kNN; Rocchio; Text categorization; Performance
DOI
Not available
Abstract
An investigation is conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbors (kNN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, a new classifier called the kNN model-based classifier (kNN Model) is proposed. It combines the strengths of both kNN and Rocchio. A text categorization prototype, which implements kNN Model along with kNN and Rocchio, is described. An experimental evaluation of the different methods is carried out on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the proposed kNN model-based method outperforms the kNN and Rocchio classifiers, and is therefore a good alternative to kNN and Rocchio in some application areas.
Pages: 423-430
Number of pages: 7