A Comparison between Two Feature Selection Algorithms

Cited by: 0
Authors
Bancioiu, Camil [1 ]
Vintan, Lucian [1 ]
Affiliations
[1] Lucian Blaga University of Sibiu, Faculty of Engineering, Sibiu, Romania
Source
2017 21st International Conference on System Theory, Control and Computing (ICSTCC) | 2017
Keywords
Feature Selection Methods; Information Gain; Markov Blankets; Documents Classification; Naive Bayes Classifiers;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This article compares two feature selection algorithms, Information Gain Thresholding and Koller and Sahami's algorithm, in the context of text document classification on the Reuters Corpus Volume 1 dataset. The algorithms were evaluated by measuring the performance of classifiers trained on the features each one selects from a given dataset. Results show that Koller and Sahami's algorithm consistently outperforms Information Gain Thresholding by capturing interactions between features and avoiding redundancy among them, although these gains come at the cost of increased complexity and longer running time.
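The evaluation protocol described in the abstract can be illustrated with a short sketch. The block below is a minimal, hypothetical example rather than the authors' pipeline: it approximates Information Gain Thresholding with scikit-learn's mutual_info_classif, uses the freely available 20 Newsgroups corpus as a stand-in for Reuters Corpus Volume 1, and evaluates a Naive Bayes classifier on the retained terms. The 5000-term vocabulary and the 0.01 threshold are illustrative choices, not values taken from the paper.

import numpy as np
from sklearn.datasets import fetch_20newsgroups          # stand-in corpus; the paper uses RCV1
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a small two-class text corpus (assumption: 20 Newsgroups as a
# stand-in for Reuters Corpus Volume 1, which requires a license).
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X_counts = CountVectorizer(max_features=5000).fit_transform(data.data)
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X_counts, y, test_size=0.25, random_state=0)

# Information gain of each term with respect to the class label;
# mutual_info_classif estimates I(term; class) per feature.
ig = mutual_info_classif(X_train, y_train, discrete_features=True, random_state=0)

# Keep only terms whose estimated information gain exceeds the threshold.
# The threshold controls how aggressively the vocabulary is pruned.
threshold = 0.01  # illustrative value, not from the paper
selected = np.where(ig > threshold)[0]

# Train and evaluate a Naive Bayes classifier on the selected features,
# mirroring the evaluation protocol described in the abstract.
clf = MultinomialNB().fit(X_train[:, selected], y_train)
pred = clf.predict(X_test[:, selected])
print(f"kept {selected.size} of {X_train.shape[1]} terms, "
      f"accuracy = {accuracy_score(y_test, pred):.3f}")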
Pages: 242-247
Number of pages: 6
References (10 in total)
[1] Cover, T. M., 1999, Elements of Information Theory, DOI: 10.1002/047174882X.
[2] Guyon, I., 2006, Studies in Fuzziness and Soft Computing, Vol. 207, p. 1.
[3] Koller, D., Sahami, M., 1996, Proceedings of the 13th International Conference on Machine Learning (ICML), p. 284.
[4] Lee, C. K., Lee, G. G., 2006, Information gain and divergence-based feature selection for machine learning-based text categorization, Information Processing & Management, 42(1), pp. 155-165.
[5] Lewis, D. D., 2015, RCV1-v2/LYRL2004: The LYRL2004 Distribution of the RCV1-v2 Text Categorization Test Collection.
[6] Lewis, D. D., 2004, Journal of Machine Learning Research, Vol. 5, p. 361.
[7] Morariu, I. D., 2007, Thesis.
[8] Pedregosa, F., et al., 2011, Journal of Machine Learning Research, Vol. 12, p. 2825.
[9] Russell, S. J., 2009, Artificial Intelligence: A Modern Approach.
[10] Yang, Y., 1997, Proceedings of ICML '97, p. 412.