A sparse linear regression model for incomplete datasets

Cited by: 6
Authors
Veras, Marcelo B. A. [1]
Mesquita, Diego P. P. [2]
Mattos, Cesar L. C. [1]
Gomes, Joao P. P. [1]
Affiliations
[1] Univ Fed Ceara, Dept Comp Sci, Rua Campus Pici Sn, BR-60455900 Fortaleza, Ceara, Brazil
[2] Aalto Univ, Dept Comp Sci, Konemiehentie 2, Espoo 02150, Finland
Keywords
Forward stagewise regression; Missing data; Gaussian mixtures; Distance estimation; Classification; Algorithms; Mixture
DOI
10.1007/s10044-019-00859-3
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Incomplete data are often neglected when designing machine learning methods. A popular strategy adopted by practitioners to circumvent this consists of taking a preprocessing step to fill the missing components. These preprocessing algorithms are designed independently of the machine learning method that will be applied subsequently, which may lead to sub-optimal results. An alternative solution is to redesign classical machine learning methods to handle missing data directly. In this paper, we propose a variant of the forward stagewise regression (FSR) algorithm for incomplete data. The original FSR is an iterative procedure to estimate parameters of sparse linear models. The proposed method, named forward stagewise regression for incomplete datasets with GMM (FSIG), models the missing components as random variables following a Gaussian mixture distribution. In FSIG, the main steps of FSR are adapted to deal with the intrinsic uncertainty of incomplete samples. The performance of FSIG was evaluated in an extensive set of experiments, and our model was able to outperform classical methods in most of the tested cases.
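For orientation, the classical FSR procedure mentioned in the abstract can be sketched as follows. This is a minimal illustration of textbook incremental forward stagewise regression on complete data only; it is not the FSIG method, whose Gaussian-mixture handling of missing components is not specified in this record. The function name `forward_stagewise`, the step size, the stopping rule, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def forward_stagewise(X, y, step=0.05, n_iter=20000):
    """Textbook incremental forward stagewise regression (complete data only).

    Hypothetical helper for illustration; not the FSIG algorithm.
    Returns coefficients on the original (centered) feature scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center the features and scale each column to unit Euclidean norm.
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # guard against constant columns
    Xs = Xc / norms

    residual = y - y.mean()
    beta = np.zeros(X.shape[1])

    for _ in range(n_iter):
        # Inner product of each feature with the current residual.
        corr = Xs.T @ residual
        j = np.argmax(np.abs(corr))
        # Stop once no feature is meaningfully aligned with the residual.
        if abs(corr[j]) < step:
            break
        # Nudge only the best-aligned coefficient by a small fixed amount.
        delta = step * np.sign(corr[j])
        beta[j] += delta
        residual -= delta * Xs[:, j]

    # Undo the column scaling so coefficients refer to the original features.
    return beta / norms


# Synthetic example (assumed data): only features 0 and 3 are active.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_coef + 0.1 * rng.normal(size=200)

beta = forward_stagewise(X, y)
```

Because each iteration moves a single coefficient by a small fixed amount, coefficients of irrelevant features tend to stay at or near zero, which is the sense in which the procedure estimates sparse linear models.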
Pages: 1293-1303
Number of pages: 11