Automated gating of flow cytometry data via robust model-based clustering

Times cited: 177
Authors
Lo, Kenneth [1]
Brinkman, Ryan Remy [2]
Gottardo, Raphael [1]
Affiliations
[1] Univ British Columbia, Dept Stat, Vancouver, BC V6T 1Z2, Canada
[2] British Columbia Canc Res Ctr, Terry Fox Lab, Vancouver, BC V5Z 1L3, Canada
Keywords
Box-Cox transformation; EM algorithm; mixture model; outliers; statistics; t distribution; flow cytometry; gating; clustering
DOI
10.1002/cyto.a.20531
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The capability of flow cytometry to offer rapid quantification of multidimensional characteristics for millions of cells has made this technology indispensable for health research, medical diagnosis, and treatment. However, the lack of statistical and bioinformatics tools to parallel recent high-throughput technological advancements has hindered this technology from reaching its full potential. We propose a flexible statistical model-based clustering approach for identifying cell populations in flow cytometry data based on t-mixture models with a Box-Cox transformation. This approach generalizes the popular Gaussian mixture models to account for outliers and allow for nonelliptical clusters. We describe an Expectation-Maximization (EM) algorithm to simultaneously handle parameter estimation and transformation selection. Using two publicly available datasets, we demonstrate that our proposed methodology provides enough flexibility and robustness to mimic manual gating results performed by an expert researcher. In addition, we present results from a simulation study, which show that this new clustering framework gives better results than the popular mixture models in terms of robustness to model misspecification and estimation of the number of clusters. The proposed clustering methodology is well adapted to automated analysis of flow cytometry data. It tends to give more reproducible results and helps reduce the substantial subjectivity and human time cost of manual gating analysis. © 2008 International Society for Analytical Cytology.
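The abstract describes the method without formulas. The following is a minimal Python sketch of its two ingredients under simplifying assumptions: a standard Box-Cox transform with a user-fixed λ, and a univariate t-mixture fitted by EM with the degrees of freedom ν held fixed (ν = 4 here). It is not the authors' implementation, which is multivariate and selects the transformation inside the EM algorithm itself; the Box-Cox variant used in the paper may also differ from the standard form shown. The names `box_cox`, `t_density`, and `t_mixture_em`, the starting values, and the synthetic data are hypothetical, for illustration only.

```python
import math
import random


def box_cox(y, lam):
    """Standard Box-Cox transform (Box & Cox, 1964); defined only for y > 0."""
    if y <= 0:
        raise ValueError("the standard Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def t_density(x, mu, sigma2, nu):
    """Univariate Student-t density with location mu, squared scale sigma2, df nu."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi * sigma2))
    delta = (x - mu) ** 2 / sigma2  # squared standardized distance
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(delta / nu))


def t_mixture_em(x, mus, sigma2s, weights, nu=4.0, n_iter=100):
    """EM for a univariate t-mixture with fixed nu (a simplifying assumption)."""
    mus, sigma2s, weights = list(mus), list(sigma2s), list(weights)
    k_count, n = len(mus), len(x)
    for _ in range(n_iter):
        # E-step: posterior memberships tau[i][k] and latent weights u[i][k].
        tau = [[0.0] * k_count for _ in range(n)]
        u = [[0.0] * k_count for _ in range(n)]
        for i, xi in enumerate(x):
            dens = [weights[k] * t_density(xi, mus[k], sigma2s[k], nu)
                    for k in range(k_count)]
            total = sum(dens)
            for k in range(k_count):
                tau[i][k] = dens[k] / total
                delta = (xi - mus[k]) ** 2 / sigma2s[k]
                # u is small for points far from component k, so outliers are downweighted.
                u[i][k] = (nu + 1) / (nu + delta)
        # M-step: update mixing weights, locations, then scales (using the new locations).
        for k in range(k_count):
            tk = sum(tau[i][k] for i in range(n))
            tu = sum(tau[i][k] * u[i][k] for i in range(n))
            weights[k] = tk / n
            mus[k] = sum(tau[i][k] * u[i][k] * x[i] for i in range(n)) / tu
            sigma2s[k] = sum(tau[i][k] * u[i][k] * (x[i] - mus[k]) ** 2
                             for i in range(n)) / tk
    return mus, sigma2s, weights


if __name__ == "__main__":
    random.seed(1)
    # Synthetic data: two log-normal "populations" plus five gross outliers.
    raw = [math.exp(random.gauss(1.0, 0.3)) for _ in range(300)]
    raw += [math.exp(random.gauss(3.0, 0.3)) for _ in range(300)]
    raw += [math.exp(8.0)] * 5
    x = [box_cox(v, 0.0) for v in raw]  # lam = 0 is the log transform
    mus, sigma2s, weights = t_mixture_em(x, [0.5, 3.5], [1.0, 1.0], [0.5, 0.5])
    print("locations:", [round(m, 2) for m in mus])
    print("weights:", [round(w, 2) for w in weights])
```

The latent weights u shrink toward zero for points far from a component's location, so the five outliers at log value 8 barely move the estimated locations. With Gaussian components (the limit ν → ∞, where every u equals 1), every point would count fully in the location and scale updates.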
Pages: 321-332
Page count: 12
References: 66