Integrating Multidimensional Data for Clustering Analysis With Applications to Cancer Patient Data

Cited by: 11
Authors
Park, Seyoung [1]
Xu, Hao [2]
Zhao, Hongyu [2]
Affiliations
[1] Sungkyunkwan Univ, Dept Stat, Seoul, South Korea
[2] Yale Sch Publ Hlth, Dept Biostat, New Haven, CT 06510 USA
Funding
National Research Foundation of Singapore;
Keywords
Gaussian kernel; Multi-omics data; Spectral clustering; PANCREATIC-CANCER; MOLECULAR CLASSIFICATION; GENOMIC ANALYSES; SUBTYPES; BREAST; VISUALIZATION; MUTATIONS; DISCOVERY; ERLOTINIB; EGFR;
DOI
10.1080/01621459.2020.1730853
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification code
020208 ; 070103 ; 0714 ;
Abstract
Advances in high-throughput genomic technologies, coupled with large-scale studies such as The Cancer Genome Atlas (TCGA) project, have generated rich resources of diverse types of omics data to better understand cancer etiology and treatment responses. Clustering patients into subtypes with similar disease etiologies and/or treatment responses using multiple omics data types has the potential to improve the precision of clustering compared with using a single data type. However, in practice, patient clustering is still mostly based on a single type of omics data or on ad hoc integration of clustering results from individual data types, leading to potential loss of information. By treating each omics data type as a different informative representation of the patients, we propose a novel multi-view spectral clustering framework to integrate different omics data types measured from the same subject. We learn the weight of each data type as well as a similarity measure between patients via a nonconvex optimization framework. We solve the proposed nonconvex problem iteratively using the ADMM algorithm and show the convergence of the algorithm. The accuracy and robustness of the proposed clustering method are studied both in theory and through a variety of synthetic data. When our method is applied to the TCGA data, the patient clusters inferred by our method show more significant differences in survival times between clusters than those inferred from existing clustering methods. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
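As a rough illustration of the general idea of combining several data types into one patient similarity before spectral clustering, the sketch below builds a Gaussian kernel per view, averages the kernels, and clusters the spectral embedding. It is not the authors' method: the weights are fixed and equal rather than learned, no ADMM step is involved, and the function names, bandwidths, and toy data are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Pairwise Gaussian similarity exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def spectral_embedding(S, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} S D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    U = vecs[:, -k:]                     # keep the k largest
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def multiview_spectral_clustering(views, k, sigmas, weights=None):
    """Cluster samples from several views using a weighted-average kernel.

    Assumption: weights are fixed (equal by default), not learned.
    """
    if weights is None:
        weights = np.full(len(views), 1.0 / len(views))
    S = sum(w * gaussian_kernel(X, s) for w, X, s in zip(weights, views, sigmas))
    return kmeans(spectral_embedding(S, k), k)

# Toy data (hypothetical): 20 "patients", two groups, two data types.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 10)
view_a = rng.normal(size=(20, 5)) + 5.0 * groups[:, None]
view_b = rng.normal(size=(20, 8)) + 4.0 * groups[:, None]
labels = multiview_spectral_clustering([view_a, view_b], k=2, sigmas=[2.0, 2.0])
```

Each row of a view is one patient, and all views must list patients in the same order. In a learned-weight approach such as the one the abstract describes, the `weights` argument would be estimated from the data instead of being supplied by hand.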
Pages: 14-26
Number of pages: 13