Efficient feature selection using shrinkage estimators

Cited by: 24
Authors
Sechidis, Konstantinos [1 ]
Azzimonti, Laura [2 ]
Pocock, Adam [3 ]
Corani, Giorgio [2 ]
Weatherall, James [4 ]
Brown, Gavin [1 ]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[2] Ist Dalle Molle Studi Sull Intelligenza Artificiale (IDSIA), Manno, Switzerland
[3] Oracle Labs, Burlington, MA, USA
[4] AstraZeneca, Global Med Dev, Adv Analyt Ctr, Cambridge, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Feature selection; High order feature selection; Mutual information; Shrinkage estimators; MUTUAL INFORMATION; ENTROPY; DEPENDENCIES; ALGORITHMS; INFERENCE;
DOI
10.1007/s10994-019-05795-1
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Information-theoretic feature selection methods quantify the importance of each feature by estimating mutual information terms that capture relevancy, redundancy, and complementarity. These terms are commonly estimated by maximum likelihood, while how to use shrinkage methods instead remains an under-explored area of research. Our work suggests a novel shrinkage method for data-efficient estimation of information-theoretic terms. Its small-sample behaviour makes it particularly suitable for estimating discrete distributions with a large number of categories (bins). Using our novel estimators, we derive a framework for generating feature selection criteria that capture any high-order feature interaction for redundancy and complementarity. We perform a thorough empirical study across datasets from diverse sources, using various evaluation measures. Our first finding is that our shrinkage-based methods achieve better results while keeping the same computational cost as the simple maximum-likelihood-based methods. Furthermore, under our framework we derive efficient novel high-order criteria that outperform state-of-the-art methods in various tasks.
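The record carries no code, but the abstract's core idea admits a compact illustration. The sketch below is not the authors' estimator: it is a minimal James-Stein-style shrinkage of a discrete joint distribution toward a uniform target, in the spirit of the shrinkage entropy/MI estimators this line of work builds on (e.g. Hausser & Strimmer, 2009), with a plug-in mutual-information score for ranking features. The function names (`js_shrinkage_probs`, `shrinkage_mi`) and the toy data are illustrative assumptions.

```python
import numpy as np

def js_shrinkage_probs(counts):
    """James-Stein-type shrinkage of empirical cell probabilities
    toward the uniform target (cf. Hausser & Strimmer, 2009).
    A sketch, not the estimator proposed in this paper."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_ml = counts / n                             # maximum-likelihood estimate
    target = np.full_like(p_ml, 1.0 / p_ml.size)  # uniform shrinkage target
    denom = (n - 1.0) * np.sum((target - p_ml) ** 2)
    lam = 1.0 if denom == 0.0 else (1.0 - np.sum(p_ml ** 2)) / denom
    lam = min(max(lam, 0.0), 1.0)                 # clip intensity to [0, 1]
    return lam * target + (1.0 - lam) * p_ml

def shrinkage_mi(x, y):
    """Plug-in estimate of I(X; Y) from the shrunk joint distribution."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xs.size, ys.size))
    np.add.at(joint, (xi, yi), 1.0)               # contingency table of counts
    p_xy = js_shrinkage_probs(joint.ravel()).reshape(joint.shape)
    p_x = p_xy.sum(axis=1, keepdims=True)         # marginals of the shrunk joint
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

# Toy usage: rank two discrete features by relevancy I(X_j; Y).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
X = np.column_stack([y ^ rng.integers(0, 2, size=100),   # correlated with y
                     rng.integers(0, 5, size=100)])      # pure noise
scores = [shrinkage_mi(X[:, j], y) for j in range(X.shape[1])]
```

Estimators of this family differ mainly in the shrinkage target and how the intensity is chosen; per the abstract, the paper's contribution is a novel such estimator whose small-sample behaviour suits tables with many bins, used inside high-order selection criteria.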
Pages: 1261-1286
Number of pages: 26