Feature Subset Selection by Bayesian network-based optimization

Cited by: 164
Authors
Inza, I [1]
Larrañaga, P [1]
Etxeberria, R [1]
Sierra, B [1]
Affiliations
[1] Univ Basque Country, Dept Comp Sci & Artificial Intelligence, E-20080 San Sebastian, Basque Country, Spain
Keywords
machine learning; supervised learning; Feature Subset Selection; wrapper; predictive accuracy; Estimation of Distribution Algorithm; Estimation of Bayesian Network Algorithm; Bayesian network; overfitting
DOI
10.1016/S0004-3702(00)00052-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A new method for Feature Subset Selection in machine learning, FSS-EBNA (Feature Subset Selection by Estimation of Bayesian Network Algorithm), is presented. FSS-EBNA is an evolutionary, population-based, randomized search algorithm, and it can be executed when domain knowledge is not available. A wrapper approach, over Naive-Bayes and ID3 learning algorithms, is used to evaluate the goodness of each visited solution. FSS-EBNA, based on the EDA (Estimation of Distribution Algorithm) paradigm, avoids the use of crossover and mutation operators to evolve the populations, in contrast to Genetic Algorithms. In the absence of these operators, the evolution is guaranteed by the factorization of the probability distribution of the best solutions found in a generation of the search. This factorization is carried out by means of Bayesian networks. Promising results are achieved in a variety of tasks where domain knowledge is not available. The paper explains the main ideas of Feature Subset Selection, Estimation of Distribution Algorithms and Bayesian networks, presenting related work on each concept. A study of the 'overfitting' problem in the Feature Subset Selection process is carried out, providing a basis for defining the stopping criteria of the new algorithm. © 2000 Elsevier Science B.V. All rights reserved.
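The search loop described in the abstract (evaluate a population of feature subsets, keep the best ones, estimate a probability distribution from them, sample the next population) can be illustrated with a minimal sketch. This is not the paper's FSS-EBNA: for brevity it estimates independent per-feature probabilities (a univariate EDA) where FSS-EBNA learns a Bayesian network, and it stops after a fixed number of generations without improvement rather than using the paper's overfitting-based criterion. The names `eda_feature_search` and `toy_score`, and all parameter values, are hypothetical; `toy_score` stands in for the wrapper's accuracy estimate from Naive-Bayes or ID3.

```python
import random


def toy_score(mask):
    """Hypothetical stand-in for a wrapper accuracy estimate.

    Rewards features 0, 3 and 5 and slightly penalizes every other
    selected feature. A real wrapper would train and validate a classifier.
    """
    relevant = {0, 3, 5}
    hits = sum(1 for i, on in enumerate(mask) if on and i in relevant)
    extras = sum(1 for i, on in enumerate(mask) if on and i not in relevant)
    return hits - 0.1 * extras


def eda_feature_search(evaluate, n_features, pop_size=40, n_select=20,
                       max_generations=30, patience=3, seed=0):
    """Univariate EDA over boolean feature masks (simplified assumption:
    features are modeled independently, with no Bayesian network)."""
    rng = random.Random(seed)
    # Initial population: each feature is included with probability 0.5.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    stale = 0
    for _ in range(max_generations):
        # Score every candidate subset and rank from best to worst.
        scored = sorted(((evaluate(m), m) for m in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0][0], list(scored[0][1])
            stale = 0
        else:
            stale += 1
            if stale >= patience:  # hypothetical stopping rule
                break
        selected = [m for _, m in scored[:n_select]]
        # Estimate the inclusion probability of each feature from the
        # selected subsets, with Laplace smoothing to avoid 0 or 1.
        probs = [(sum(m[i] for m in selected) + 1) / (len(selected) + 2)
                 for i in range(n_features)]
        # Sample the next population; carry the best subset forward.
        population = [list(best_mask)] + [
            [rng.random() < p for p in probs] for _ in range(pop_size - 1)
        ]
    return best_mask, best_score


mask, score = eda_feature_search(toy_score, n_features=8)
print("selected features:", [i for i, on in enumerate(mask) if on])
print("score:", score)
```

No crossover or mutation operator appears in the loop: new candidates come only from sampling the estimated distribution, which is the contrast with Genetic Algorithms that the abstract draws.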
Pages: 157-184
Number of pages: 28