Supervised convex clustering

Cited by: 1
Authors
Wang, Minjie [1, 7]
Yao, Tianyi [2]
Allen, Genevera I. [3, 4, 5, 6]
Affiliations
[1] Univ Minnesota, Sch Stat, Minneapolis, MN USA
[2] Rice Univ, Dept Stat, Houston, TX USA
[3] Rice Univ, Dept Elect & Comp Engn, Houston, TX USA
[4] Rice Univ, Dept Stat, Houston, TX USA
[5] Rice Univ, Dept Comp Sci, Houston, TX USA
[6] Baylor Coll Med, Jan & Dan Duncan Neurol Res Inst, Houston, TX USA
[7] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
convex clustering; exponential family; generalized linear model deviance; interpretable clustering; supervised clustering; SELECTION; NUMBER; ADMM;
DOI
10.1111/biom.13860
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications. Yet, coming up with meaningful interpretations of the estimated clusters has often been challenging precisely due to their unsupervised nature. Meanwhile, in many real-world scenarios, there are some noisy supervising auxiliary variables, for instance, subjective diagnostic opinions, that are related to the observed heterogeneity of the unlabeled data. By leveraging information from both supervising auxiliary variables and unlabeled data, we seek to uncover more scientifically interpretable group structures that may be hidden by completely unsupervised analyses. In this work, we propose and develop a new statistical pattern discovery method named supervised convex clustering (SCC) that borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty. We develop several extensions of SCC to integrate different types of supervising auxiliary variables, to adjust for additional covariates, and to find biclusters. We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's disease genomics. Specifically, we discover new candidate genes as well as new subtypes of Alzheimer's disease that can potentially lead to better understanding of the underlying genetic mechanisms responsible for the observed heterogeneity of cognitive decline in older adults.
Pages: 3846-3858
Number of pages: 13