flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding

Cited by: 99
Authors
Ge, Yongchao [1]
Sealfon, Stuart C.
Affiliation
[1] Mt Sinai Sch Med, Dept Neurol, New York, NY 10029 USA
Keywords
CELL-POPULATION IDENTIFICATION; AUTOMATED IDENTIFICATION
DOI
10.1093/bioinformatics/bts300
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: For flow cytometry data, there are two common approaches to the unsupervised clustering problem: one is based on the finite mixture model and the other on spatial exploration of the histograms. The former is computationally slow and has difficulty identifying clusters of irregular shape. The latter cannot be applied directly to high-dimensional data, because the computational time and memory become unmanageable and the estimated histogram is unreliable. An algorithm free of both problems would be very useful.

Results: In this article, we combine ideas from the finite mixture model and histogram spatial exploration. The new algorithm, which we call flowPeaks, can be applied directly to high-dimensional data and can identify irregularly shaped clusters. The algorithm first uses the K-means algorithm with a large K to partition the cell population into many small clusters. These partitioned data allow the generation of a smoothed density function using the finite mixture model. All local peaks are exhaustively searched by exploring the density function, and each cell is clustered according to its associated local peak. flowPeaks is automatic, fast and reliable, and robust to cluster shape and outliers. The algorithm has been applied to flow cytometry data and compared with state-of-the-art algorithms, including Misty Mountain, FLOCK, flowMeans, flowMerge and FLAME.
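The three steps in the abstract (K-means with a large K, a mixture density built from the resulting small clusters, then a search for local density peaks) can be illustrated with a minimal pure-Python sketch. This is not the flowPeaks implementation; it rests on several assumptions: each K-means cluster becomes one Gaussian component with a diagonal covariance, a hypothetical `bandwidth` parameter inflates every variance as a stand-in for the paper's smoothing, K-means uses deterministic farthest-point initialisation, and the peak search is a mean-shift-style fixed-point iteration started from every centroid. All function and parameter names (`flowpeaks_sketch`, `bandwidth`, `merge_tol`, and so on) are hypothetical.

```python
import math


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(pts):
    return [sum(c) / len(pts) for c in zip(*pts)]


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _sqdist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(_mean(members) if members else centroids[j])  # keep empty ones
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda j: _sqdist(p, centroids[j])) for p in points]
    return centroids, labels


def build_mixture(points, centroids, labels, bandwidth):
    """One diagonal Gaussian per non-empty cluster; variances inflated by bandwidth**2."""
    comps = []
    for j, mu in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        var = [sum((p[d] - mu[d]) ** 2 for p in members) / len(members) + bandwidth ** 2
               for d in range(len(mu))]
        comps.append((len(members) / len(points), mu, var))  # (weight, mean, variances)
    return comps


def _log_terms(x, comps):
    """log(w_j * N(x; mu_j, diag(var_j))) for every component j."""
    out = []
    for w, mu, var in comps:
        lp = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            lp -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        out.append(lp)
    return out


def climb(x, comps, iters=500, tol=1e-10):
    """Fixed-point iteration whose fixed points zero the mixture-density gradient."""
    x = list(x)
    for _ in range(iters):
        logs = _log_terms(x, comps)
        m = max(logs)
        resp = [math.exp(lg - m) for lg in logs]  # common rescaling cancels below
        new = []
        for d in range(len(x)):
            num = sum(r * mu[d] / var[d] for r, (_, mu, var) in zip(resp, comps))
            den = sum(r / var[d] for r, (_, mu, var) in zip(resp, comps))
            new.append(num / den)
        if _sqdist(new, x) < tol:
            return new
        x = new
    return x


def flowpeaks_sketch(points, k, bandwidth, merge_tol=1e-3):
    centroids, labels = kmeans(points, k)
    comps = build_mixture(points, centroids, labels, bandwidth)
    used = set(labels)
    peaks, peak_of_cluster = [], {}
    for j, mu in enumerate(centroids):
        if j not in used:
            continue
        top = climb(mu, comps)  # climb from each small cluster's centroid
        for i, pk in enumerate(peaks):
            if _sqdist(pk, top) < merge_tol ** 2:  # same peak reached: merge
                peak_of_cluster[j] = i
                break
        else:
            peaks.append(top)
            peak_of_cluster[j] = len(peaks) - 1
    # Each cell inherits the peak of the small cluster it belongs to.
    return [peak_of_cluster[lab] for lab in labels], peaks


# Two 5x5 grids of "cells", one around (0, 0) and one around (20, 20).
grid = [(x / 2, y / 2) for x in range(-2, 3) for y in range(-2, 3)]
points = grid + [(20 + x, 20 + y) for x, y in grid]
labels, peaks = flowpeaks_sketch(points, k=6, bandwidth=2.0)
print(len(peaks), "clusters")
```

Here K = 6 deliberately over-partitions the two groups; the inflated variances make the mixture unimodal within each group, so the several small clusters in a group climb to the same peak and are merged. With a much smaller `bandwidth`, the same data could yield spurious extra peaks, which is why some form of smoothing is needed when K is large.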
Pages: 2052-2058
Number of pages: 7