SPOT: Sparse Optimal Transformations for High Dimensional Variable Selection and Exploratory Regression Analysis

Cited by: 0
Authors
Huang, Qiming [1]
Zhu, Michael [1, 2]
Affiliations
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
[2] Tsinghua Univ, Dept Ind Engn, Ctr Stat Sci, Beijing, Peoples R China
Source
KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2017
Keywords
Monotone transformation; optimal transformation; regression analysis; spline; variable selection; NONCONCAVE PENALIZED LIKELIHOOD; MODEL SELECTION;
DOI
10.1145/3097983.3098091
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We develop a novel method called SParse Optimal Transformations (SPOT) to simultaneously select important variables and explore relationships between the response and predictor variables in high dimensional nonparametric regression analysis. Not only are the optimal transformations identified by SPOT interpretable, they can also be used for response prediction. We further show that SPOT achieves consistency in both variable selection and parameter estimation. Numerical experiments and real data applications demonstrate that SPOT outperforms other existing methods and can serve as an effective tool in practice.
Pages: 857-865
Number of pages: 9