Leukemia Prediction Using Sparse Logistic Regression

Cited by: 18
Authors
Manninen, Tapio [1]
Huttunen, Heikki [1]
Ruusuvuori, Pekka [1]
Nykter, Matti [2]
Affiliations
[1] Tampere Univ Technol, Dept Signal Proc, FIN-33101 Tampere, Finland
[2] Univ Tampere, Inst Biomed Technol, FIN-33101 Tampere, Finland
Source
PLOS ONE | 2013 / Volume 8 / Issue 08
Funding
Academy of Finland;
Keywords
FLOW-CYTOMETRY; CLASSIFICATION;
DOI
10.1371/journal.pone.0072932
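Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code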
07 ; 0710 ; 09 ;
Abstract
We describe a supervised prediction method for diagnosing acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model that takes flow cytometry measurements from a single patient and returns a confidence score for the patient being AML-positive. Our solution is based on an L1-regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data-driven, and no prior biological knowledge is used. The described solution achieved 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model's performance and further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model uses only a small fraction of the flow cytometry measurements, making our solution highly economical.
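The core idea in the abstract, an L1-regularized logistic regression whose penalty drives most feature weights to exactly zero, can be illustrated with a minimal sketch. This is not the authors' implementation: the solver (plain proximal gradient descent), the synthetic data, and all names such as `fit_l1_logreg`, `lam`, and `step` are assumptions made for illustration. The six features stand in for hypothetical average marker intensities, of which only the first carries signal.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero and
    # returns exactly zero when |v| <= t (this is what creates sparsity).
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logreg(X, y, lam=0.05, step=0.5, n_iter=500):
    # Minimizes mean logistic loss + lam * ||w||_1 by proximal gradient descent.
    # The intercept b is not penalized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict_score(w, b, x):
    # Confidence score in [0, 1] for the positive class.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Synthetic stand-in data (NOT the challenge data): 20 positive and
# 160 negative samples; only feature 0 differs between the classes.
random.seed(0)
X, y = [], []
for i in range(180):
    label = 1 if i < 20 else 0
    row = [random.gauss(2.0 * label, 1.0)]
    row += [random.gauss(0.0, 1.0) for _ in range(5)]
    X.append(row)
    y.append(label)

w, b = fit_l1_logreg(X, y)
scores = [predict_score(w, b, x) for x in X]
print("weights:", [round(wj, 3) for wj in w])
```

Features whose weight ends at exactly zero need not be measured at prediction time, which is the sense in which a sparse model can rely on only a fraction of the available measurements.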
Pages: 10