Integrative clustering of high-dimensional data with joint and individual clusters

Cited by: 21
Authors
Hellton, Kristoffer H. [1, 2]
Thoresen, Magne [1]
Affiliations
[1] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, N-0317 Oslo, Norway
[2] Univ Oslo, Inst Clin Med, Div Med & Lab Sci, N-1478 Lorenskog, Norway
Keywords
Clustering; Integrative genomics; Principal component analysis; Singular value decomposition; BREAST; MODEL
DOI
10.1093/biostatistics/kxw005
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
When measuring a range of genomic, epigenomic, and transcriptomic variables for the same tissue sample, an integrative approach to analysis can strengthen inference and lead to new insights. This is also the case when clustering patient samples, and several integrative cluster procedures have been proposed. Common to these methodologies is the restriction to a joint cluster structure, equal in all data layers. We instead present a clustering extension of the Joint and Individual Variance Explained algorithm (JIVE), Joint and Individual Clustering (JIC), enabling the construction of both joint and data type-specific clusters simultaneously. The procedure builds on the connection between k-means clustering and principal component analysis, and hence the number of clusters can be determined by the number of relevant principal components. The proposed procedure is compared with iCluster, a method restricted to joint clusters only, and simulations show that JIC is clearly advantageous when both individual and joint clusters are present. The procedure is illustrated using gene expression and miRNA levels measured in breast cancer tissue from The Cancer Genome Atlas. The analysis suggests a division into three joint clusters common to both data types and two expression-specific clusters.
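The abstract's central step, that k-means clustering and principal component analysis are linked so that the number of clusters follows from the number of relevant principal components, can be illustrated on a single data matrix. The sketch below is not the JIC procedure (which, per the abstract, extends JIVE to separate joint and data type-specific structure); it only shows, under our own assumptions, that K well-separated clusters in column-centred data give K - 1 dominant singular values (the centred cluster means span at most K - 1 directions, because their size-weighted sum is zero), and that k-means on those K - 1 principal component scores recovers the clusters. The simulation settings, the hypothetical helper names `simulate` and `kmeans`, and the deterministic farthest-first initialisation are illustrative choices, not taken from the paper.

```python
import numpy as np


def simulate(n_per=30, k=3, p=200, sep=3.0, seed=1):
    """Hypothetical toy data: k Gaussian clusters in p dimensions (rows = samples)."""
    rng = np.random.default_rng(seed)
    means = rng.normal(0.0, sep, size=(k, p))       # one random mean vector per cluster
    truth = np.repeat(np.arange(k), n_per)          # true cluster label of each row
    X = means[truth] + rng.normal(size=(k * n_per, p))
    return X, truth


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-first initialisation
    (an assumption made here for reproducibility; k-means++ is the usual choice)."""
    idx = [0]
    for _ in range(k - 1):
        # squared distance from every point to its nearest already-chosen center
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    centers = X[idx].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                  # assign each point to nearest center
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):               # stop once centers no longer move
            break
        centers = new
    return labels


X, truth = simulate()
Xc = X - X.mean(axis=0)                             # column-centre before the SVD
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print("leading singular values:", np.round(s[:5], 1))

r = 2                                               # number of relevant PCs, assumed read off the gap in s
scores = U[:, :r] * s[:r]                           # principal component scores
labels = kmeans(scores, k=r + 1)                    # K = r + 1 clusters
print("cluster sizes:", np.bincount(labels))
```

In this toy setting the singular values drop sharply after the second one, so choosing r = 2 relevant components implies K = 3 clusters. JIC, as described in the abstract, applies the same idea to joint and data type-specific structure rather than to one matrix.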
Pages: 537-548
Number of pages: 12