Selection of Relevant Features for Text Classification with K-NN

Cited by: 0
Authors
Balicki, Jerzy [1]
Krawczyk, Henryk [1]
Rymko, Lukasz [1]
Szymanski, Julian [1]
Affiliations
[1] Gdansk Univ Technol, Dept Comp Syst Architecture, Gdansk, Poland
Source
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT II | 2013 / Vol. 7895
Keywords
text representation; documents categorization; information retrieval; feature selection
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe five feature selection techniques used for text classification. Information gain, the independent significance feature test, the chi-squared test, the odds ratio test, and frequency filtering are compared on text benchmarks based on Wikipedia. For each method we present the classification quality obtained on the test datasets using a K-NN based approach. A main advantage of the evaluated approach is that it reduces the dimensionality of the vector space, which improves the effectiveness of the classification task. The information gain method, which obtained the best results, has been used to evaluate the scalability of feature selection and classification. We also provide results indicating that feature selection is useful for obtaining common-sense features that describe natural-made categories.
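To make the idea in the abstract concrete, the following is a minimal sketch of information-gain feature selection followed by K-NN classification. It is not the authors' implementation: the binary "term present" features, the Hamming distance, the toy documents, and all function names (`entropy`, `information_gain`, `select_features`, `knn_predict`) are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of a binary 'term present' feature (assumed encoding)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    # Weighted entropy of the labels after splitting on the term.
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def knn_predict(train_docs, train_labels, query, features, k=3):
    """Majority vote among the k nearest documents.

    Distance is the Hamming distance over the selected features only,
    which is an assumption made for this sketch.
    """
    def vec(doc):
        return [t in doc for t in features]

    q = vec(query)
    dists = sorted((sum(a != b for a, b in zip(vec(d), q)), i)
                   for i, d in enumerate(train_docs))
    votes = Counter(train_labels[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"goal", "team", "match", "news"},
    {"team", "coach", "season", "news"},
    {"election", "vote", "party", "news"},
    {"vote", "party", "law", "news"},
]
labels = ["sport", "sport", "politics", "politics"]

selected = select_features(docs, labels, k=3)
prediction = knn_predict(docs, labels, {"team", "goal", "news"}, selected)
```

In this toy corpus the term "news" occurs in every document, so its information gain is zero and it is dropped, while terms that separate the two classes are kept. The classifier then compares documents in a 3-dimensional space instead of the full 10-term vocabulary, which is the dimensionality reduction the abstract refers to.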
Pages: 477-488
Page count: 12