Predicting Cell Populations in Single Cell Mass Cytometry Data

Cited by: 41
Authors
Abdelaal, Tamim [1 ,2 ]
van Unen, Vincent [3 ]
Hollt, Thomas [2 ,4 ]
Koning, Frits [3 ]
Reinders, Marcel J. T. [1 ,2 ]
Mahfouz, Ahmed [1 ,2 ]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, NL-2628 XE Delft, Netherlands
[2] Leiden Univ, Med Ctr, Leiden Computat Biol Ctr, Einthovenweg 20, NL-2333 ZC Leiden, Netherlands
[3] Leiden Univ, Med Ctr, Dept Immunohematol & Blood Transfus, NL-2333 ZA Leiden, Netherlands
[4] Delft Univ Technol, Comp Graph & Visualizat, NL-2628 XE Delft, Netherlands
Funding
EU Horizon 2020;
Keywords
single cell; mass cytometry; cell population prediction; machine learning; IMMUNE; FLOW; VISUALIZATION; SPACE;
DOI
10.1002/cyto.a.23738
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Mass cytometry by time-of-flight (CyTOF) is a valuable technology for high-dimensional analysis at the single cell level. Identification of different cell populations is an important task during the data analysis. Many clustering tools can perform this task, which is essential to identify "new" cell populations in explorative experiments. However, relying on clustering is laborious since it often involves manual annotation, which significantly limits the reproducibility of identifying cell populations across different samples. The latter is particularly important in studies comparing different conditions, for example in cohort studies. Learning cell populations from an annotated set of cells solves these problems. However, currently available methods for automatic cell population identification are either complex, dependent on prior biological knowledge about the populations during the learning process, or can only identify canonical cell populations. We propose to use a linear discriminant analysis (LDA) classifier to automatically identify cell populations in CyTOF data. LDA outperforms two state-of-the-art algorithms on four benchmark datasets. Compared to more complex classifiers, LDA has substantial advantages with respect to the interpretable performance, reproducibility, and scalability to larger datasets with deeper annotations. We apply LDA to a dataset of approximately 3.5 million cells representing 57 cell populations in the Human Mucosal Immune System. LDA has high performance on abundant cell populations as well as the majority of rare cell populations, and provides accurate estimates of cell population frequencies. Further incorporating a rejection option, based on the estimated posterior probabilities, allows LDA to identify previously unknown (new) cell populations that were not encountered during training.
Altogether, reproducible prediction of cell population compositions using LDA opens up possibilities to analyze large cohort studies based on CyTOF data. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc.
Pages: 769-781
Page count: 13
Related papers
31 in total
  • [1] viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia
    Amir, El-ad David
    Davis, Kara L.
    Tadmor, Michelle D.
    Simonds, Erin F.
    Levine, Jacob H.
    Bendall, Sean C.
    Shenfeld, Daniel K.
    Krishnaswamy, Smita
    Nolan, Garry P.
    Pe'er, Dana
    [J]. NATURE BIOTECHNOLOGY, 2013, 31 (06) : 545 - +
  • [2] Mass Cytometry: Technique for Real Time Single Cell Multitarget Immunoassay Based on Inductively Coupled Plasma Time-of-Flight Mass Spectrometry
    Bandura, Dmitry R.
    Baranov, Vladimir I.
    Ornatsky, Olga I.
    Antonov, Alexei
    Kinach, Robert
    Lou, Xudong
    Pavlov, Serguei
    Vorobiev, Sergey
    Dick, John E.
    Tanner, Scott D.
    [J]. ANALYTICAL CHEMISTRY, 2009, 81 (16) : 6813 - 6822
  • [3] Single-Cell Mass Cytometry of Differential Immune and Drug Responses Across a Human Hematopoietic Continuum
    Bendall, Sean C.
    Simonds, Erin F.
    Qiu, Peng
    Amir, El-ad D.
    Krutzik, Peter O.
    Finck, Rachel
    Bruggner, Robert V.
    Melamed, Rachel
    Trejo, Angelica
    Ornatsky, Olga I.
    Balderas, Robert S.
    Plevritis, Sylvia K.
    Sachs, Karen
    Pe'er, Dana
    Tanner, Scott D.
    Nolan, Garry P.
    [J]. SCIENCE, 2011, 332 (6030) : 687 - 696
  • [4] Integrating single-cell transcriptomic data across different conditions, technologies, and species
    Butler, Andrew
    Hoffman, Paul
    Smibert, Peter
    Papalexi, Efthymia
    Satija, Rahul
    [J]. NATURE BIOTECHNOLOGY, 2018, 36 (05) : 411 - +
  • [5] An Immune Atlas of Clear Cell Renal Cell Carcinoma
    Chevrier, Stephane
    Levine, Jacob Harrison
    Zanotelli, Vito Riccardo Tomaso
    Silina, Karina
    Schulz, Daniel
    Bacac, Marina
    Ries, Carola Hermine
    Ailles, Laurie
    Jewett, Michael Alexander Spencer
    Moch, Holger
    van den Broek, Maries
    Beisel, Christian
    Stadler, Michael Beda
    Gedye, Craig
    Reis, Bernhard
    Pe'er, Dana
    Bodenmiller, Bernd
    [J]. CELL, 2017, 169 (04) : 736 - 749
  • [6] Mean shift: A robust approach toward feature space analysis
    Comaniciu, D
    Meer, P
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (05) : 603 - 619
  • [7] Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors
    Haghverdi, Laleh
    Lun, Aaron T. L.
    Morgan, Michael D.
    Marioni, John C.
    [J]. NATURE BIOTECHNOLOGY, 2018, 36 (05) : 421 - +
  • [8] Cytosplore: Interactive Immune Cell Phenotyping for Large Single-Cell Datasets
    Hoellt, T.
    Pezzotti, N.
    van Unen, V.
    Koning, F.
    Eisemann, E.
    Lelieveldt, B.
    Vilanova, A.
    [J]. COMPUTER GRAPHICS FORUM, 2016, 35 (03) : 171 - 180
  • [9] Analysis of a complex of statistical variables into principal components
    Hotelling, H
    [J]. JOURNAL OF EDUCATIONAL PSYCHOLOGY, 1933, 24 : 417 - 441
  • [10] Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure
    Hsiao, Chiaowen
    Liu, Mengya
    Stanton, Rick
    McGee, Monnie
    Qian, Yu
    Scheuermann, Richard H.
    [J]. CYTOMETRY PART A, 2016, 89A (01) : 71 - 88