MCEN: a method of simultaneous variable selection and clustering for high-dimensional multinomial regression

Cited by: 2
Authors
Ren, Sheng [1]
Kang, Emily L. [2]
Lu, Jason L. [3]
Affiliations
[1] UnitedHealth Group R&D, Minnetonka, MN, USA
[2] University of Cincinnati, Department of Mathematical Sciences, Division of Statistics and Data Science, Cincinnati, OH, USA
[3] Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, Cincinnati, OH 45229, USA
Funding
Academy of Finland; National Institutes of Health
Keywords
Classification; Clustering; High dimensional; Multinomial regression; Optimization; Pairwise correlation; Shrinkage; Regularization; Lasso
DOI
10.1007/s11222-019-09880-2
Chinese Library Classification number
TP301 [Theory, methods]
Discipline code
081202
Abstract
Multinomial regression is often used to investigate the association between potential independent variables and multi-class nominal responses, such as multiple disease subtypes. However, it cannot identify groups of variables that have similar effects on predicting the same disease subtypes, which is an important problem in biomedical research. Clustering variables in this setting is not trivial, since correlated variables may have distinct predictive effects on the multi-class nominal response. For example, a group of moderately to highly correlated expressed genes may be associated with different subtypes of a disease. This paper presents a new data-driven method for simultaneous variable selection and clustering in high-dimensional multinomial regression. By using a novel penalty function that incorporates both regression coefficients and pairwise correlations to define clusters of variables, the proposed method provides a one-stop solution that simultaneously selects and groups the important variables associated with different classes of the multinomial response. An alternating minimization algorithm, which combines convex optimization and clustering steps, is developed to solve the resulting optimization problem. Through extensive simulation studies, the proposed method is compared with state-of-the-art methods in terms of prediction and variable clustering performance. In addition, three real data examples are presented to demonstrate how to apply the method and to further verify the findings of the simulation studies. The simulation and real data results also shed light on the strengths and weaknesses of several penalized regression methods with respect to variable clustering and prediction in different scenarios.
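The abstract describes an alternating minimization that switches between a convex coefficient update and a clustering step. The sketch below illustrates only that general pattern; it is not the MCEN penalty or algorithm from the paper. Assumptions, all ours: the penalty is a lasso term `lam1 * ||B||_1` plus a k-means-style grouping term `lam2 * sum_j ||beta_j - mu_c(j)||^2` that pulls each variable's coefficient row toward the mean row of its cluster; the convex step is proximal gradient descent with clusters held fixed; the clustering step is one Lloyd (k-means) reassignment with coefficients held fixed; there is no intercept. Unlike the paper's penalty, this sketch ignores pairwise correlations between variables. The function names (`softmax`, `centroids`, `objective`, `prox_step`, `cluster_step`, `fit`) and parameters (`lam1`, `lam2`, `G`, `step`) are hypothetical.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def centroids(B, labels, G):
    """Mean coefficient row of each cluster; None for an empty cluster."""
    K = len(B[0])
    cents = []
    for g in range(G):
        rows = [B[j] for j in range(len(B)) if labels[j] == g]
        cents.append([sum(r[k] for r in rows) / len(rows) for k in range(K)] if rows else None)
    return cents


def objective(X, y, B, labels, G, lam1, lam2):
    """Mean multinomial NLL + lam1 * L1 norm + lam2 * within-cluster sum of squares."""
    p, K = len(B), len(B[0])
    nll = 0.0
    for xi, yi in zip(X, y):
        z = [sum(xi[j] * B[j][k] for j in range(p)) for k in range(K)]
        m = max(z)
        nll += m + math.log(sum(math.exp(v - m) for v in z)) - z[yi]
    l1 = sum(abs(b) for row in B for b in row)
    cents = centroids(B, labels, G)
    wss = sum((B[j][k] - cents[labels[j]][k]) ** 2 for j in range(p) for k in range(K))
    return nll / len(X) + lam1 * l1 + lam2 * wss


def prox_step(X, y, B, labels, G, lam1, lam2, step):
    """Convex step: one proximal-gradient update of B with the clusters held fixed."""
    n, p, K = len(X), len(B), len(B[0])
    grad = [[0.0] * K for _ in range(p)]
    for xi, yi in zip(X, y):
        probs = softmax([sum(xi[j] * B[j][k] for j in range(p)) for k in range(K)])
        for k in range(K):
            r = probs[k] - (1.0 if k == yi else 0.0)
            for j in range(p):
                grad[j][k] += xi[j] * r / n
    cents = centroids(B, labels, G)
    new_B = []
    for j in range(p):
        row = []
        for k in range(K):
            # gradient step on the smooth part: NLL plus the grouping penalty
            v = B[j][k] - step * (grad[j][k] + 2 * lam2 * (B[j][k] - cents[labels[j]][k]))
            # soft-thresholding is the proximal operator of step * lam1 * |b|
            row.append(math.copysign(max(abs(v) - step * lam1, 0.0), v))
        new_B.append(row)
    return new_B


def cluster_step(B, labels, G):
    """Clustering step: one Lloyd (k-means) reassignment of coefficient rows, B held fixed."""
    cents = centroids(B, labels, G)
    live = [g for g in range(G) if cents[g] is not None]
    return [min(live, key=lambda g: sum((b - c) ** 2 for b, c in zip(row, cents[g])))
            for row in B]


def fit(X, y, K, G, lam1=0.01, lam2=0.1, step=0.1, outer=10, inner=20):
    """Alternate convex coefficient updates and clustering; return B, labels, objective history."""
    p = len(X[0])
    B = [[0.0] * K for _ in range(p)]
    labels = [j % G for j in range(p)]
    history = [objective(X, y, B, labels, G, lam1, lam2)]
    for _ in range(outer):
        for _ in range(inner):
            B = prox_step(X, y, B, labels, G, lam1, lam2, step)
        labels = cluster_step(B, labels, G)
        history.append(objective(X, y, B, labels, G, lam1, lam2))
    return B, labels, history


# Hypothetical synthetic data: features 0-1 favour class 0,
# features 2-3 favour class 1, features 4-5 are noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(120)]


def true_class(x):
    scores = [x[0] + x[1], x[2] + x[3], 0.0]
    return scores.index(max(scores))


y = [true_class(x) for x in X]
B, labels, history = fit(X, y, K=3, G=3)
print("cluster labels:", labels)
print("objective: %.4f -> %.4f" % (history[0], history[-1]))
```

With a step size that is small relative to the scale of the data, each proximal step and each k-means reassignment can only lower the penalized objective, so `history` should be non-increasing. The recovered cluster labels depend on the k-means initialization and are not meant to reproduce any result reported in the paper.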
Pages: 291-304
Number of pages: 14
Published in: Statistics and Computing, 2020, 30: 291-304