Integrative clustering of high-dimensional data with joint and individual clusters

Cited by: 21

Authors:

Hellton, Kristoffer H. [1, 2]

Thoresen, Magne [1]

Affiliations:

[1] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, N-0317 Oslo, Norway

[2] Univ Oslo, Inst Clin Med, Div Med & Lab Sci, N-1478 Lorenskog, Norway

Source:

BIOSTATISTICS | 2016, Vol. 17, No. 3

Keywords:

Clustering; Integrative genomics; Principal component analysis; Singular value decomposition; BREAST; MODEL;

DOI:

10.1093/biostatistics/kxw005

Chinese Library Classification (CLC):

Q [Biological Sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

When a range of genomic, epigenomic, and transcriptomic variables is measured for the same tissue sample, an integrative approach to analysis can strengthen inference and lead to new insights. This is also the case when clustering patient samples, and several integrative clustering procedures have been proposed. Common to these methodologies is the restriction to a joint cluster structure that is equal in all data layers. We instead present Joint and Individual Clustering (JIC), a clustering extension of the Joint and Individual Variance Explained (JIVE) algorithm, which enables the construction of both joint and data type-specific clusters simultaneously. The procedure builds on the connection between k-means clustering and principal component analysis, and hence the number of clusters can be determined by the number of relevant principal components. The proposed procedure is compared with iCluster, a method restricted to joint clusters only, and simulations show that JIC is clearly advantageous when both individual and joint clusters are present. The procedure is illustrated using gene expression and miRNA levels measured in breast cancer tissue from The Cancer Genome Atlas. The analysis suggests a division into three joint clusters common to both data types and two expression-specific clusters.
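The link between k-means clustering and principal components that the abstract relies on can be illustrated with a small simulation. The sketch below is not the authors' JIC or JIVE implementation. It only stacks two simulated data layers that share one two-group structure, takes the first principal component of the stacked matrix, and splits samples by the sign of their score, which is the relaxed two-cluster k-means solution. All function names, dimensions, and the effect size are assumptions chosen for illustration.

```python
import numpy as np


def simulate_two_layers(n_per_cluster=30, p1=200, p2=150, shift=2.0, seed=0):
    """Simulate two data layers (variables x samples) sharing one joint grouping.

    Hypothetical setup: the first 20 variables of each layer are shifted
    by +/- `shift` depending on the joint cluster of the sample.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_cluster)
    signs = np.where(labels == 0, -1.0, 1.0)
    X1 = rng.normal(size=(p1, labels.size))
    X2 = rng.normal(size=(p2, labels.size))
    X1[:20] += shift * signs  # joint signal in layer 1
    X2[:20] += shift * signs  # same joint signal in layer 2
    return X1, X2, labels


def joint_clusters_from_first_pc(X1, X2):
    """Split samples into two groups by the sign of the first PC score.

    The layers are stacked along the variable axis and row-centered; the
    first right singular vector holds the (scaled) sample scores.
    """
    stacked = np.vstack([X1, X2])
    centered = stacked - stacked.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = vt[0]
    return (scores > 0).astype(int)


X1, X2, truth = simulate_two_layers()
pred = joint_clusters_from_first_pc(X1, X2)

# Cluster labels are arbitrary, so score agreement up to label switching.
agreement = max(np.mean(pred == truth), np.mean(pred != truth))
print(f"agreement with simulated joint clusters: {agreement:.2f}")
```

In this simulated setting the sign split should recover the shared grouping almost perfectly. Separating individual from joint structure, as JIC does, would additionally require a JIVE-style decomposition of each layer, which is not shown here.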


Pages: 537-548

Page count: 12

