Feature selection with missing data using mutual information estimators

Cited: 59
Authors
Doquire, Gauthier [1 ]
Verleysen, Michel [1 ]
Affiliations
[1] Catholic Univ Louvain, Machine Learning Grp, ICTEAM, B-1348 Louvain, Belgium
Keywords
Feature selection; Missing data; Mutual information; Functional data; Values; Imputation; Regression; Variables
DOI
10.1016/j.neucom.2012.02.031
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses the problem of feature selection in prediction problems where some occurrences of features are missing. To this end, the well-known mutual information criterion is used. More precisely, it is shown how a recently introduced nearest-neighbors-based mutual information estimator can be extended to handle missing data. This estimator has the advantage over traditional ones that it does not directly estimate any probability density function. Consequently, the mutual information may be reliably estimated even when the dimension of the space increases. Results on artificial as well as real-world datasets indicate that the method is able to select important features without the need for any imputation algorithm, under the assumption that data are missing completely at random. Moreover, experiments show that selecting the features before imputing the data generally increases the precision of the prediction models, in particular when the proportion of missing data is high. (C) 2012 Elsevier B.V. All rights reserved.
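The idea the abstract describes (ranking features by a k-nearest-neighbors mutual information estimate, with no imputation step, under the missing-completely-at-random assumption) can be sketched roughly as follows. This is an illustration, not the paper's actual estimator: it uses scikit-learn's kNN-based `mutual_info_regression` (a Kraskov-style estimator that avoids explicit density estimation) and handles missingness by scoring each feature on its complete cases only, which is valid under MCAR; the helper name `rank_features_with_missing` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def rank_features_with_missing(X, y, random_state=0):
    """Rank features by kNN-based mutual information with y.

    Each feature is scored on its own complete cases (rows where the
    feature and the target are observed), a simple per-feature
    complete-case scheme that is unbiased under MCAR. Hypothetical
    helper, not the estimator introduced in the paper.
    """
    scores = []
    for j in range(X.shape[1]):
        mask = ~np.isnan(X[:, j]) & ~np.isnan(y)
        # kNN (Kraskov-style) MI estimate on the observed pairs only
        mi = mutual_info_regression(
            X[mask, j:j + 1], y[mask], random_state=random_state
        )[0]
        scores.append(mi)
    scores = np.array(scores)
    return np.argsort(scores)[::-1], scores

# Toy MCAR demo: the target depends on feature 0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.1 * rng.normal(size=500)
X[rng.random(X.shape) < 0.2] = np.nan  # 20% missing completely at random

order, scores = rank_features_with_missing(X, y)
print(order, scores)
```

On this toy problem the informative feature should rank first despite the missing entries, without any imputation, mirroring the behavior the abstract reports.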
Pages: 3-11
Page count: 9