Flexible Variable Selection for Recovering Sparsity in Nonadditive Nonparametric Models

Cited by: 10
Authors
Fang, Zaili [1 ]
Kim, Inyoung [1 ]
Schaumont, Patrick [2 ]
Affiliations
[1] Virginia Tech, Dept Stat, Blacksburg, VA 24061 USA
[2] Virginia Tech, Dept Elect & Comp Engn, Blacksburg, VA USA
Funding
U.S. National Science Foundation;
Keywords
Kernel learning; LASSO; Multivariate smoothing function; Nonnegative garrote; Sparsistency; Variable selection; KERNEL MACHINES; REGRESSION; CONSISTENCY; GARROTE; LASSO;
DOI
10.1111/biom.12518
Chinese Library Classification
Q [Biological Sciences];
Discipline Codes
07; 0710; 09;
Abstract
Variable selection for recovering sparsity in nonadditive, nonparametric models with high-dimensional variables has been challenging. The problem becomes even more difficult because of complications in modeling the unknown interaction terms among high-dimensional variables. There is currently no variable selection method that overcomes these limitations. Hence, in this article we propose a variable selection approach developed by connecting a kernel machine with the nonparametric regression model. The advantages of our approach are that it can: (i) recover the sparsity; (ii) automatically model unknown and complicated interactions; (iii) connect with several existing approaches, including the linear nonnegative garrote and multiple kernel learning; and (iv) provide flexibility for both additive and nonadditive nonparametric models. Our approach can be viewed as a nonlinear version of the nonnegative garrote method. We model the smoothing function by a Least Squares Kernel Machine (LSKM) and construct the nonnegative garrote objective function as a function of the kernel machine's sparse scale parameters, which measure each input variable's relevance to the response, so that the sparsity of the input variables can be recovered. We also provide the asymptotic properties of our approach, showing that sparsistency holds under certain conditions given consistent initial kernel function coefficients. An efficient coordinate descent/backfitting algorithm is developed. A resampling procedure for our variable selection methodology is also proposed to improve the power.
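To make the abstract's idea concrete, the following is a minimal sketch of a nonlinear nonnegative garrote on kernel scale parameters: a kernel machine is first fit with all scales equal to one, and the nonnegative scale parameters are then shrunk under an L1-type penalty so that variables with zero scale drop out. All function names here are illustrative, and projected gradient descent is used as a simple stand-in for the paper's coordinate descent/backfitting algorithm.

```python
import numpy as np

def gaussian_kernel(X1, X2, theta):
    """Gaussian kernel with per-variable scale parameters theta_j >= 0.

    K(x, x') = exp(-sum_j theta_j * (x_j - x'_j)^2), so theta_j = 0
    removes variable j from the model entirely.
    """
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2   # shape (n1, n2, p)
    return np.exp(-d2 @ theta)

def garrote_kernel_select(X, y, lam_ridge=1e-2, lam_garrote=0.05,
                          n_iter=200, step=1e-3):
    """Illustrative nonlinear nonnegative-garrote variable selection."""
    n, p = X.shape
    # Step 1: initial kernel machine fit (kernel ridge regression)
    # with all scale parameters set to one.
    theta = np.ones(p)
    K = gaussian_kernel(X, X, theta)
    alpha = np.linalg.solve(K + lam_ridge * np.eye(n), y)
    # Step 2: with alpha fixed, minimize the garrote-type objective
    #   ||y - K_theta alpha||^2 + lam_garrote * sum_j theta_j,  theta_j >= 0,
    # by projected gradient descent (a stand-in for the paper's
    # coordinate descent/backfitting algorithm).
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    for _ in range(n_iter):
        K = gaussian_kernel(X, X, theta)
        resid = y - K @ alpha
        # dK_ij/dtheta_k = -d2[i, j, k] * K_ij, hence:
        grad = 2.0 * np.einsum('i,ij,ijk,j->k', resid, K, d2, alpha)
        theta = np.maximum(theta - step * (grad + lam_garrote), 0.0)
    return theta, alpha

# Toy example: only the first of three variables drives the response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 3))
y = np.sin(2.0 * X[:, 0]) + 0.05 * rng.standard_normal(30)
theta, alpha = garrote_kernel_select(X, y)
```

Variables whose estimated scale parameter is driven to zero are excluded; the nonnegativity constraint is what makes this a garrote-type rather than a plain LASSO-type procedure.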
Pages: 1155-1163
Number of pages: 9