Supervised clustering of high-dimensional data using regularized mixture modeling

Cited by: 5
Authors
Chang, Wennan [1]
Wan, Changlin [1]
Zang, Yong [2, 3]
Zhang, Chi [3, 4]
Cao, Sha [2, 3]
Affiliations
[1] Purdue Univ, Dept Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Indiana Univ Sch Med, Dept Biostat, Indianapolis, IN 46202 USA
[3] Indiana Univ Sch Med, Ctr Computat Biol & Bioinformat, Indianapolis, IN 46202 USA
[4] Indiana Univ Sch Med, Dept Med & Mol Genet, Indianapolis, IN 46202 USA
Funding
National Science Foundation (USA)
Keywords
supervised learning; mixture modeling; disease heterogeneity; variable selection; finite mixture; EM algorithm; regression; likelihood; L1-penalization; resource; lasso
DOI
10.1093/bib/bbaa291
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between the high-dimensional genetic manifestations and the clinical presentations, while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm using a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to deal with the challenges in studying the heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm was adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvement in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which subsets of features are explanatory to the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the other methods: CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms to different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical representations of the disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of a disease and could be of special relevance in the growing field of personalized medicine.
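To make the idea in the abstract concrete, the following is a minimal sketch of a classification EM (CEM) loop for a mixture of sparse linear regressions. It is not the authors' CSMR implementation: the lasso penalty, the single shared penalty weight `lam`, the absence of an intercept, the random initialization, and all function names (`soft_threshold`, `lasso_cd`, `cem_sparse_mixture_regression`) are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent:
    minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # residual y - X @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]  # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]  # add the updated contribution back
    return beta


def cem_sparse_mixture_regression(X, y, k=2, lam=0.1, n_iter=20, seed=0):
    """Hypothetical CEM loop: hard-assign samples to components, then refit
    one sparse regression per component (illustrative, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(k, size=n)  # random initial hard assignment
    betas = np.zeros((k, p))
    sigmas = np.ones(k)
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # M-step: component-wise lasso fit on the samples currently assigned
        for c in range(k):
            idx = labels == c
            if idx.sum() < 2:
                continue  # keep previous parameters for a near-empty component
            betas[c] = lasso_cd(X[idx], y[idx], lam)
            r = y[idx] - X[idx] @ betas[c]
            sigmas[c] = max(r.std(), 1e-3)
            pis[c] = idx.mean()
        # E-step and C-step: Gaussian log-density per component, then argmax
        resid = y[:, None] - X @ betas.T  # shape (n, k)
        log_dens = np.log(pis) - np.log(sigmas) - 0.5 * (resid / sigmas) ** 2
        new_labels = log_dens.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, betas
```

Calling `cem_sparse_mixture_regression(X, y, k=2)` on a feature matrix `X` and response `y` returns a hard cluster label per sample and one coefficient vector per component; the nonzero entries of each vector indicate which features that component treats as explanatory. Because CEM converges to a local optimum, results depend on the initialization, so several seeds would typically be tried.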
Pages: 11