SWIFT: SCALABLE WEIGHTED ITERATIVE SAMPLING FOR FLOW CYTOMETRY CLUSTERING

Cited by: 16

Authors:

Naim, Iftekhar [1]

Datta, Suprakash [4]

Sharma, Gaurav [1,2]

Cavenaugh, James S. [3]

Mosmann, Tim R. [3]

Affiliations:

[1] Univ Rochester, Dept Elect & Comp Engn, Rochester, NY 14627 USA

[2] Univ Rochester, Dept Biostatist & Computat Biol, Rochester, NY 14627 USA

[3] Univ Rochester, Ctr Vaccine Biol & Immunol, Rochester, NY 14627 USA

[4] York Univ, Dept Comp Sci & Engn, Toronto, ON M3J 2R7, Canada

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Flow cytometry; clustering; Gaussian mixture model; sampling; expectation-maximization; DATASETS;

DOI:

10.1109/ICASSP.2010.5495653

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Flow cytometry (FC) is a powerful technology for rapid multivariate analysis and functional discrimination of cells. Current FC platforms generate large, high-dimensional datasets, which pose a significant challenge for traditional manual bivariate analysis. Automated multivariate clustering, though highly desirable, is also stymied by the critical requirement of identifying rare populations that form rather small clusters, in addition to the computational challenges posed by the large size and dimensionality of the datasets. In this paper, we address these twin challenges by developing a two-stage scalable multivariate parametric clustering algorithm. In the first stage, we model the data as a mixture of Gaussians and use an iterative weighted sampling technique to estimate the mixture components successively in order of decreasing size. In the second stage, we apply a graph-based hierarchical merging technique to combine Gaussian components with significant overlaps into the final number of desired clusters. The resulting algorithm offers a reduction in complexity over conventional mixture modeling while simultaneously allowing for better detection of small populations. We demonstrate the effectiveness of our method both on simulated data and actual flow cytometry datasets.
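
As a rough illustration of the weighted sampling idea in the first stage, the sketch below down-weights points that are already well explained by a large, previously estimated Gaussian component, so that a subsample concentrates on the remaining small population. It is not the authors' implementation: the one-dimensional data, the two-component mixture parameters, the `fixed` index list, and the helper names `normal_pdf` and `sampling_weights` are all assumptions made for this example, and the EM fitting and second-stage merging steps are omitted.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sampling_weights(xs, components, fixed):
    """Return one sampling weight per point.

    components: list of (mixing weight, mean, std) tuples (assumed known here).
    fixed: indices of components treated as already estimated.
    Each weight is 1 minus the posterior mass on the fixed components,
    floored at a tiny positive value so every point stays drawable.
    """
    weights = []
    for x in xs:
        dens = [w * normal_pdf(x, m, s) for (w, m, s) in components]
        total = sum(dens)
        explained = sum(dens[k] for k in fixed) / total if total > 0 else 0.0
        weights.append(max(1.0 - explained, 1e-12))
    return weights


# Synthetic data: one large population and one rare population (assumed values).
rng = random.Random(0)
big = [rng.gauss(0.0, 1.0) for _ in range(5000)]
rare = [rng.gauss(8.0, 0.5) for _ in range(50)]
xs = big + rare

# Hypothetical mixture parameters; component 0 is the large, already-fixed one.
components = [(0.99, 0.0, 1.0), (0.01, 8.0, 0.5)]
weights = sampling_weights(xs, components, fixed=[0])

# Weighted draw (with replacement) that favors points not yet explained.
sample = rng.choices(xs, weights=weights, k=40)
rare_fraction = sum(1 for x in sample if abs(x - 8.0) < 3.0) / len(sample)
```

In this toy setting the rare population makes up about 1% of the data, yet it dominates the weighted subsample. That is the intuition behind estimating components successively in order of decreasing size.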

Citation

Pages: 509-512

Page count: 4

12 entries in total

[1] Ashlock D., 2005, SMART ENG SYSTEM DES, V15, P453

[2] Baudry J.P., 2008, Combining mixture components for clustering

[3] Bradley PS., 1998, SCALING EXPECTATION

[4] Brown M, 2000, CLIN CHEM, V46, P1221

[5] Chan C., 2008, CYTOMETRY A

[6] DEMPSTER, AP; LAIRD, NM; RUBIN, DB. MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01): 1-38

[7] Figueiredo M. A. T., 1999, Energy Minimization Methods in Computer Vision and Pattern Recognition. Second International Workshop, EMMCVPR'99. Proceedings (Lecture Notes in Computer Science Vol. 1654), P54

[8] Fraley, C; Raftery, A; Wehrens, R. Incremental model-based clustering for large datasets with small clusters [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2005, 14 (03): 529-546

[9] Lo, Kenneth; Brinkman, Ryan Remy; Gottardo, Raphael. Automated Gating of flow cytometry data via robust model-based clustering [J]. CYTOMETRY PART A, 2008, 73A (04): 321-332

[10] Maitra, R. Clustering massive datasets with applications in software metrics and tomography [J]. TECHNOMETRICS, 2001, 43 (03): 336-346
