Integrative clustering of high-dimensional data with joint and individual clusters

Cited by: 21

Authors:

Hellton, Kristoffer H. [1,2]

Thoresen, Magne [1]

Affiliations:

[1] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, N-0317 Oslo, Norway

[2] Univ Oslo, Inst Clin Med, Div Med & Lab Sci, N-1478 Lorenskog, Norway

Source:

BIOSTATISTICS | 2016 / Vol. 17 / Issue 03

Keywords:

Clustering; Integrative genomics; Principal component analysis; Singular value decomposition; BREAST; MODEL;

DOI:

10.1093/biostatistics/kxw005

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

When measuring a range of genomic, epigenomic, and transcriptomic variables for the same tissue sample, an integrative approach to analysis can strengthen inference and lead to new insights. This is also the case when clustering patient samples, and several integrative cluster procedures have been proposed. Common for these methodologies is the restriction to a joint cluster structure, equal in all data layers. We instead present a clustering extension of the Joint and Individual Variance Explained algorithm (JIVE), Joint and Individual Clustering (JIC), enabling the construction of both joint and data type-specific clusters simultaneously. The procedure builds on the connection between k-means clustering and principal component analysis, and hence, the number of clusters can be determined by the number of relevant principal components. The proposed procedure is compared with iCluster, a method restricted to only joint clusters, and simulations show that JIC is clearly advantageous when both individual and joint clusters are present. The procedure is illustrated using gene expression and miRNA levels measured in breast cancer tissue from The Cancer Genome Atlas. The analysis suggests a division into three joint clusters common for both data types and two expression-specific clusters.
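
The connection between k-means clustering and principal component analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard result that the relaxed k-means solution for K clusters lies in the first K - 1 principal components, so r relevant components suggest r + 1 clusters. The simulated data, the helper names `pca_scores` and `kmeans`, and the farthest-point initialisation are illustrative assumptions; this is not the JIC or JIVE implementation, which additionally separates joint from individual components across data types.

```python
import numpy as np


def pca_scores(X, r):
    """Scores of the first r principal components (samples in rows)."""
    Xc = X - X.mean(axis=0)                       # centre each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [Z[0]]
    for _ in range(1, k):
        # distance of every point to its nearest already-chosen centre
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Simulated high-dimensional data (assumption): 3 clusters, 30 samples each,
# 200 variables, unit-variance noise around well-separated cluster means.
rng = np.random.default_rng(0)
n_per_cluster, p, n_clusters = 30, 200, 3
truth = np.repeat(np.arange(n_clusters), n_per_cluster)
means = rng.normal(0.0, 1.0, size=(n_clusters, p))
X = means[truth] + rng.normal(0.0, 1.0, size=(truth.size, p))

# r = 2 relevant components -> r + 1 = 3 clusters in the score space.
r = n_clusters - 1
scores = pca_scores(X, r)
labels = kmeans(scores, r + 1)
```

Clustering the low-dimensional scores rather than the 200 original variables is the design choice being illustrated: the cluster structure is carried by a few components, and their number fixes the number of clusters.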

Pages: 537-548

Page count: 12
