Discovering Subgroups of Patients from DNA Copy Number Data Using NMF on Compacted Matrices

Cited by: 7
Authors
de Campos, Cassio P. [1, 2]
Rancoita, Paola M. V. [1, 2, 3]
Kwee, Ivo [1, 2, 4]
Zucca, Emanuele [5]
Zaffalon, Marco [1]
Bertoni, Francesco [2, 5]
Affiliations
[1] Dalle Molle Inst Artificial Intelligence IDSIA, Manno, Switzerland
[2] Oncol Res Inst, Lymphoma & Genom Res Program, Bellinzona, Switzerland
[3] Univ Vita Salute San Raffaele, Univ Ctr Stat Biomed Sci CUSSB, Milan, Italy
[4] Swiss Inst Bioinformat, Lausanne, Switzerland
[5] Oncol Inst Southern Switzerland IOSI, Lymphoma Unit, Bellinzona, Switzerland
Funding
Swiss National Science Foundation
Keywords
B-CELL LYMPHOMA; MEDULLOBLASTOMA; FACTORIZATION; PROFILES;
DOI
10.1371/journal.pone.0079720
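Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract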
In the study of complex genetic diseases, identifying subgroups of patients who share similar genetic characteristics is a challenging task that matters, for example, for improving treatment decisions. One type of genetic lesion frequently investigated in such disorders is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological and clinical relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
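As a rough illustration of how NMF can cluster samples, the following is a minimal sketch and not the procedure of the paper. It assumes a small non-negative matrix V (rows as genomic regions, columns as patients), uses the well-known multiplicative update rules for the Frobenius-norm objective, and assigns each patient to the component with the largest scaled coefficient. The function names `nmf` and `assign_subgroups`, the toy data, and all parameter values are hypothetical choices made for this example; the compaction step described in the abstract is not shown.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) as W @ H with W (n x k), H (k x m).

    Uses multiplicative updates for the squared Frobenius error.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def assign_subgroups(W, H):
    """Assign each column (patient) to the component with the largest weight.

    H is rescaled by the column sums of W so that the arbitrary scaling
    between W and H does not affect the assignment.
    """
    scaled_H = H * W.sum(axis=0)[:, None]
    return scaled_H.argmax(axis=0)

# Toy data (an assumption for illustration): 6 regions x 6 patients,
# where patients 0-2 and patients 3-5 carry signal in different regions.
V = np.full((6, 6), 0.01)
V[:3, :3] += 2.0
V[3:, 3:] += 2.0

W, H = nmf(V, k=2)
labels = assign_subgroups(W, H)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In this toy setting the two blocks of patients should fall into two different subgroups. On real CN data the number of components k would have to be chosen with quality measures such as those mentioned in the abstract.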
Pages: 12