Integrative clustering of high-dimensional data with joint and individual clusters

Cited by: 21
Authors
Hellton, Kristoffer H. [1, 2]
Thoresen, Magne [1]
Affiliations
[1] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, N-0317 Oslo, Norway
[2] Univ Oslo, Inst Clin Med, Div Med & Lab Sci, N-1478 Lorenskog, Norway
Keywords
Clustering; Integrative genomics; Principal component analysis; Singular value decomposition; Breast; Model
DOI
10.1093/biostatistics/kxw005
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
When a range of genomic, epigenomic, and transcriptomic variables is measured on the same tissue sample, an integrative approach to analysis can strengthen inference and lead to new insights. This also holds when clustering patient samples, and several integrative clustering procedures have been proposed. Common to these methodologies is the restriction to a joint cluster structure that is identical in all data layers. We instead present Joint and Individual Clustering (JIC), a clustering extension of the Joint and Individual Variance Explained algorithm (JIVE), which constructs both joint and data type-specific clusters simultaneously. The procedure builds on the connection between k-means clustering and principal component analysis; hence, the number of clusters can be determined by the number of relevant principal components. The proposed procedure is compared with iCluster, a method restricted to joint clusters only, and simulations show that JIC is clearly advantageous when both individual and joint clusters are present. The procedure is illustrated using gene expression and miRNA levels measured in breast cancer tissue from The Cancer Genome Atlas. The analysis suggests a division into three joint clusters common to both data types and two expression-specific clusters.
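The abstract's central computational idea is the link between k-means clustering and principal component analysis. The Python sketch below illustrates only that link, not JIC itself. It assumes, as in the usual continuous relaxation of k-means, that K clusters are captured by the first K - 1 principal components. It computes those component scores by an SVD of the column-centered data and then runs plain Lloyd k-means in that low-dimensional space. The function name `pca_kmeans`, the farthest-point initialization, and the iteration cap are hypothetical choices made for illustration, and the JIVE step that separates joint from individual structure is omitted entirely.

```python
import numpy as np


def pca_kmeans(X, n_clusters, n_iter=100):
    """Cluster the rows (samples) of X by running Lloyd's k-means on the
    scores of the first n_clusters - 1 principal components.

    Illustrative sketch only: it assumes K clusters live in a (K - 1)-dimensional
    principal subspace and omits JIVE's joint/individual decomposition.
    """
    if n_clusters < 2:
        raise ValueError("n_clusters must be at least 2")
    # Column-center the data so the SVD yields principal components.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # Principal component scores for the first K - 1 components.
    r = n_clusters - 1
    scores = U[:, :r] * s[:r]

    # Deterministic farthest-point initialization (a hypothetical choice):
    # start at sample 0, then repeatedly add the sample farthest from all centers.
    centers = [scores[0]]
    for _ in range(1, n_clusters):
        d = np.min([((scores - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(scores[d.argmax()])
    centers = np.array(centers)

    # Lloyd iterations: assign each sample to its nearest center, then
    # recompute each center as the mean of its assigned samples.
    for _ in range(n_iter):
        dist = ((scores[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            scores[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    # Simulated example: two groups of 20 samples, the second shifted in 10 features.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 50))
    X[20:, :10] += 3.0
    # K = 2 clusters, so a single principal component is used.
    print(pca_kmeans(X, 2))
```

With K = 2 the clustering effectively splits samples along the first principal component, and each additional cluster adds one component. In JIC, as the abstract describes, this idea is used to construct both joint and data type-specific clusters; the sketch above works on a single data matrix only.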
Pages: 537-548
Number of pages: 12