Assisted gene expression-based clustering with AWNCut

Cited by: 7
Authors
Li, Yang [1, 2]
Bie, Ruofan [2]
Hidalgo, Sebastian J. Teran [5]
Qin, Yichen [4]
Wu, Mengyun [3, 5]
Ma, Shuangge [2, 5]
Affiliations
[1] Renmin Univ China, Ctr Appl Stat, Beijing, Peoples R China
[2] Renmin Univ China, Sch Stat, Beijing 100872, Peoples R China
[3] Shanghai Univ Finance & Econ, Sch Stat & Management, Shanghai 200433, Peoples R China
[4] Univ Cincinnati, Dept Operat Business Analyt & Informat Sys, Cincinnati, OH USA
[5] Yale Univ, Dept Biostat, New Haven, CT 06520 USA
Funding
National Institutes of Health (USA); National Natural Science Foundation of China
Keywords
assisted analysis; clustering; gene expression data; NCut; HIGH-DIMENSIONAL DATA; K-MEANS; FEATURE-SELECTION; OMICS DATA; CANCER; SUBTYPE; REGRESSION; ALGORITHM; DISCOVERY; DISEASE;
DOI
10.1002/sim.7928
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
In research on complex diseases, gene expression (GE) data have been extensively used for clustering samples. The resulting clusters can serve as the basis for disease subtype identification, risk stratification, and many other purposes. Because genetic profiling studies have small sample sizes and GE data are noisy, clustering results are often unsatisfactory. A prominent trend in recent studies is multidimensional profiling, which collects data on GEs and their regulators (copy number alterations, microRNAs, methylation, etc.) from the same subjects. Because of these regulation relationships, regulators contain important information on the properties of GEs. We develop a novel assisted clustering method that uses regulator information to improve clustering analysis of GE data. To account for the fact that not all GEs are informative, we propose a weighting strategy in which the weights are determined in a data-dependent manner and can discriminate informative GEs from noise. The proposed method is built on the NCut technique and is realized using a simulated annealing algorithm. Simulations demonstrate that it outperforms multiple direct competitors. In the analysis of TCGA cutaneous melanoma and lung adenocarcinoma data, it yields biologically sensible findings that differ from those of the alternatives.
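The abstract names two building blocks, a normalized cut (NCut) objective over feature-weighted similarities and a simulated annealing search. The sketch below illustrates how these two generic pieces can fit together; it is not the authors' AWNCut implementation. It assumes a Gaussian similarity, takes the feature weights `w` as fixed inputs (the paper determines them from the data), and omits the regulator-assisted term entirely. The function names `weighted_similarity`, `ncut`, and `anneal_labels`, and all default parameters, are hypothetical.

```python
import numpy as np


def weighted_similarity(X, w):
    """Gaussian similarity between samples using feature-weighted squared distances.

    X: (n_samples, n_features) expression matrix; w: nonnegative feature weights.
    Assumption: the kernel choice is illustrative, not taken from the paper.
    """
    diff = X[:, None, :] - X[None, :, :]
    d2 = (w * diff ** 2).sum(axis=2)
    return np.exp(-d2)


def ncut(S, labels, k):
    """Normalized cut: sum over clusters of cut(cluster, rest) / vol(cluster)."""
    total = 0.0
    for c in range(k):
        in_c = labels == c
        vol = S[in_c].sum()  # total similarity from cluster members to all samples
        if vol == 0:  # empty cluster: treat as an invalid partition
            return np.inf
        cut = S[np.ix_(in_c, ~in_c)].sum()  # similarity crossing the cluster boundary
        total += cut / vol
    return total


def anneal_labels(S, k, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over cluster labels with single-sample relabeling moves."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    labels = rng.integers(k, size=n)
    cost = ncut(S, labels, k)
    best_labels, best_cost = labels.copy(), cost
    t = t0
    for _ in range(n_iter):
        proposal = labels.copy()
        proposal[rng.integers(n)] = rng.integers(k)
        new_cost = ncut(S, proposal, k)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if new_cost <= cost or rng.random() < np.exp((cost - new_cost) / t):
            labels, cost = proposal, new_cost
            if cost < best_cost:
                best_labels, best_cost = labels.copy(), cost
        t *= cooling  # geometric cooling schedule
    return best_labels, best_cost
```

In this sketch, setting a feature's weight to zero removes it from the similarity, which is the simplest way to see how weighting can separate informative GEs from noise. A full method would also need to update the weights and incorporate regulator data, which this example does not attempt.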
Pages: 4386-4403
Page count: 18