A NEW SEMIPARAMETRIC APPROACH TO FINITE MIXTURE OF REGRESSIONS USING PENALIZED REGRESSION VIA FUSION

Cited by: 5
Authors
Austin, Erin [1]
Pan, Wei [2]
Shen, Xiaotong [3]
Affiliations
[1] Univ Colorado Denver, Dept Math & Stat Sci, Denver, CO 80204 USA
[2] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
[3] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Keywords
FMR; group LASSO; group TLP; grouping pursuit; penalized regression; semiparametric; selection; likelihood
DOI
10.5705/ss.202016.0531
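Chinese Library Classification (CLC) numbers
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract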
For some modeling problems, a population may be better assessed as an aggregate of unknown subpopulations, each with a distinct relationship between a response and its associated variables. The finite mixture of regressions (FMR) model, in which each outcome is generated by one of a finite number of linear regression models, is a natural tool in this setting. In this article, we first propose a new penalized regression approach. We then demonstrate that the proposed approach identifies subpopulations and their corresponding models better than a semiparametric FMR method does. Our method fits a separate model for each person via grouping pursuit, using a new group truncated L1 penalty (group TLP) that shrinks the differences between the estimated parameter vectors. As a result, the individuals' models cluster into a few common models, revealing previously unknown subpopulations. Moreover, by varying the penalty strength, the method can reveal a hierarchical structure among the subpopulations, which can be useful in exploratory analyses. Simulations using FMR models and a real-data analysis show that the method performs promisingly well.
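The abstract describes the estimator only in words, so the Python sketch below illustrates its ingredients under stated assumptions rather than reproducing the authors' algorithm. It assumes that each person i has one observation and one coefficient vector β_i, and that the fused objective takes the form ½Σ_i (y_i − x_iᵀβ_i)² + λ Σ_{i<j} min(‖β_i − β_j‖₂, τ). This is a common way to write a group truncated L1 penalty on pairwise differences, and the paper's exact loss and scaling may differ. Data are simulated from a two-component FMR model with hypothetical coefficients, and subpopulations are read off as connected groups of (numerically) identical fitted vectors. The grouping-pursuit optimization itself is not implemented, and the helper names `simulate_fmr`, `group_tlp`, `objective`, and `fused_groups` are hypothetical.

```python
import numpy as np


def simulate_fmr(n, betas, probs, sigma=0.5, seed=0):
    """Draw n observations from a K-component finite mixture of linear regressions.

    betas: (K, p) array of component coefficients (hypothetical values);
    column 0 multiplies an intercept. probs: mixing proportions summing to 1.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    K, p = betas.shape
    z = rng.choice(K, size=n, p=probs)  # latent subpopulation label of each person
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = np.sum(X * betas[z], axis=1) + rng.normal(scale=sigma, size=n)
    return X, y, z


def group_tlp(B, lam, tau):
    """Assumed group TLP: lam * sum_{i<j} min(||B_i - B_j||_2, tau).

    Pairs whose difference exceeds tau pay a constant penalty, so
    well-separated groups are not pulled toward each other.
    """
    diffs = B[:, None, :] - B[None, :, :]       # all pairwise differences
    norms = np.linalg.norm(diffs, axis=2)       # (n, n) matrix of L2 norms
    iu = np.triu_indices(B.shape[0], k=1)       # each pair i < j once
    return lam * np.minimum(norms[iu], tau).sum()


def objective(B, X, y, lam, tau):
    """Assumed fused objective: one coefficient vector per person (row of B)."""
    resid = y - np.sum(X * B, axis=1)
    return 0.5 * np.sum(resid ** 2) + group_tlp(B, lam, tau)


def fused_groups(B, tol=1e-6):
    """Label persons whose fitted vectors coincide (within tol) via connected components."""
    n = B.shape[0]
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            k = stack.pop()
            dist = np.linalg.norm(B - B[k], axis=1)
            close = np.where((dist <= tol) & (labels < 0))[0]
            labels[close] = next_label
            stack.extend(close.tolist())
        next_label += 1
    return labels
```

For example, if six fitted vectors form two identical triples, `fused_groups` returns the labels `[0, 0, 0, 1, 1, 1]`. With λ = τ = 1 and the two group vectors more than 1 apart, each of the nine between-group pairs contributes the capped value 1 to `group_tlp`, while within-group pairs contribute 0. This illustrates how the truncation leaves distinct subpopulations unshrunk once their difference exceeds τ.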
Pages: 783-807
Number of pages: 25