Supervised convex clustering

Cited by: 1
Authors
Wang, Minjie [1, 7]
Yao, Tianyi [2]
Allen, Genevera I. [3, 4, 5, 6]
Affiliations
[1] Univ Minnesota, Sch Stat, Minneapolis, MN USA
[2] Rice Univ, Dept Stat, Houston, TX USA
[3] Rice Univ, Dept Elect & Comp Engn, Houston, TX USA
[4] Rice Univ, Dept Stat, Houston, TX USA
[5] Rice Univ, Dept Comp Sci, Houston, TX USA
[6] Baylor Coll Med, Jan & Dan Duncan Neurol Res Inst, Houston, TX USA
[7] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
convex clustering; exponential family; generalized linear model deviance; interpretable clustering; supervised clustering; SELECTION; NUMBER; ADMM
DOI
10.1111/biom.13860
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications. Yet, coming up with meaningful interpretations of the estimated clusters has often been challenging precisely due to their unsupervised nature. Meanwhile, in many real-world scenarios, there are some noisy supervising auxiliary variables, for instance, subjective diagnostic opinions, that are related to the observed heterogeneity of the unlabeled data. By leveraging information from both supervising auxiliary variables and unlabeled data, we seek to uncover more scientifically interpretable group structures that may be hidden by completely unsupervised analyses. In this work, we propose and develop a new statistical pattern discovery method named supervised convex clustering (SCC) that borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty. We develop several extensions of SCC to integrate different types of supervising auxiliary variables, to adjust for additional covariates, and to find biclusters. We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's disease genomics. Specifically, we discover new candidate genes as well as new subtypes of Alzheimer's disease that can potentially lead to better understanding of the underlying genetic mechanisms responsible for the observed heterogeneity of cognitive decline in older adults.
Pages: 3846-3858
Number of pages: 13