densityCut: an efficient and versatile topological approach for automatic clustering of biological data

Citations: 28
Authors
Ding, Jiarui [1, 2]
Shah, Sohrab [1, 2]
Condon, Anne [1]
Affiliations
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
[2] BC Canc Res Ctr, Dept Mol Oncol, Vancouver, BC V5Z 1L3, Canada
Keywords
CLONAL EVOLUTION; MEAN SHIFT; REVEALS; POPULATION; CELLS;
DOI
10.1093/bioinformatics/btw227
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. Results: This article introduces densityCut, a novel density-based clustering algorithm that is both time- and space-efficient. It proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlying density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. Experimental results on ten synthetic benchmark datasets and two microarray gene expression datasets demonstrate that densityCut performs better than state-of-the-art algorithms for clustering biological datasets. For applications, we focus on recent cancer mutation clustering and single-cell data analyses, namely clustering variant allele frequencies of somatic mutations to reveal clonal architectures of individual tumours, clustering single-cell gene expression data to uncover cell population compositions, and clustering single-cell mass cytometry data to detect communities of cells of the same functional states or types. densityCut performs better than competing algorithms and is scalable to large datasets.
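The first three steps named in the abstract (a rough K-nearest-neighbour density estimate, a random-walk refinement, and assignment of points to the basin of attraction of a mode) can be illustrated with a minimal sketch. This is not the densityCut implementation: the function names, the damping factor `alpha`, the uniform transition probabilities, and the brute-force neighbour search are all assumptions made for illustration, and the cluster-merging and stability-based selection steps are omitted.

```python
import math


def knn_graph(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs.

    Brute-force O(n^2) search, chosen only to keep the sketch self-contained.
    """
    graph = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append(dists[:k])
    return graph


def knn_density(graph, dim):
    """Rough density: inverse of (distance to the k-th neighbour) ** dim."""
    return [1.0 / (nbrs[-1][0] ** dim + 1e-12) for nbrs in graph]


def refine_by_random_walk(density, graph, alpha=0.9, iterations=50):
    """Smooth densities with a damped random walk on the kNN graph.

    Assumption: each point passes a fraction alpha of its mass to its
    neighbours in equal shares and restarts from the normalised rough
    density with probability 1 - alpha.
    """
    total = sum(density)
    prior = [d / total for d in density]
    current = prior[:]
    for _ in range(iterations):
        nxt = [(1.0 - alpha) * p for p in prior]
        for i, nbrs in enumerate(graph):
            share = alpha * current[i] / len(nbrs)
            for _, j in nbrs:
                nxt[j] += share
        current = nxt
    return current


def assign_to_modes(density, graph):
    """Label each point with the index of the local density maximum it climbs to."""
    parent = []
    for i, nbrs in enumerate(graph):
        candidates = [i] + [j for _, j in nbrs]
        # Ties are broken by the lower index so that every climb terminates.
        parent.append(max(candidates, key=lambda j: (density[j], -j)))
    labels = []
    for i in range(len(density)):
        j = i
        while parent[j] != j:
            j = parent[j]
        labels.append(j)
    return labels


def cluster_by_density(points, k=3):
    """Run the three illustrative steps and return one mode index per point."""
    graph = knn_graph(points, k)
    rough = knn_density(graph, dim=len(points[0]))
    refined = refine_by_random_walk(rough, graph)
    return assign_to_modes(refined, graph)
```

Calling `cluster_by_density(points, k=3)` on a list of coordinate tuples returns, for each point, the index of the point treated as its mode. On real data a single cluster can split into several local maxima, which is the situation the merging step described in the abstract is meant to handle.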
Pages: 2567-2576
Page count: 10