Estimating classification probabilities in high-dimensional diagnostic studies

Cited by: 6
Authors
Appel, Inka J. [1]
Gronwald, Wolfram [1]
Spang, Rainer [1]
Affiliation
[1] Univ Regensburg, Inst Funct Genom, D-93053 Regensburg, Germany
Keywords
GENE; CANCER; DISEASE
DOI
10.1093/bioinformatics/btr434
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Motivation: Classification algorithms for high-dimensional biological data, such as gene expression profiles or metabolomic fingerprints, are typically evaluated by the number of misclassifications across a test dataset. However, to judge the classification of a single case in clinical diagnosis, we need to assess the uncertainty associated with that individual case rather than the average accuracy across many cases. The reliability of an individual classification can be expressed in terms of class probabilities. While classification algorithms are a well-developed area of research, the estimation of class probabilities is considerably less advanced in biology, and only a few classification algorithms provide estimated class probabilities.

Results: We compared several probability estimators in the context of classifying metabolomic profiles. Evaluation criteria included sparseness biases, calibration of the estimator, the variance of the estimator and its performance in identifying highly reliable classifications. We observed that several of the estimators display artifacts that compromise their use in practice. Classification probabilities based on a combination of local cross-validation error rates and monotone regression prove superior for metabolomic profiling.
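The abstract names monotone regression as one ingredient of the best-performing estimator. The sketch below illustrates only that ingredient, under the assumption that each case already has a real-valued classifier score (for example from cross-validation) and a 0/1 class label. It fits a non-decreasing map from score to estimated probability of class 1 using the standard pool-adjacent-violators algorithm. The function names `pava`, `fit_isotonic_calibration` and `predict_probability` are hypothetical, and the local cross-validation error-rate component of the authors' estimator is not reproduced here; this is an illustration, not the published method.

```python
import bisect
from typing import Dict, List, Optional, Sequence, Tuple


def pava(values: Sequence[float], weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares non-decreasing fit (pool-adjacent-violators)."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block holds [weighted mean, total weight, number of input points pooled].
    blocks: List[List[float]] = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per input point.
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * int(count))
    return fitted


def fit_isotonic_calibration(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing map from classifier score to P(class 1).

    Labels are assumed to be 0/1; tied scores are pooled before fitting so
    that every distinct score receives a single fitted probability.
    """
    grouped: Dict[float, Tuple[float, int]] = {}
    for s, y in zip(scores, labels):
        total, n = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, n + 1)
    xs = sorted(grouped)
    means = [grouped[x][0] / grouped[x][1] for x in xs]
    counts = [float(grouped[x][1]) for x in xs]
    return xs, pava(means, counts)


def predict_probability(xs: List[float], fitted: List[float], score: float) -> float:
    """Step-function lookup: fitted value at the largest training score <= score."""
    idx = bisect.bisect_right(xs, score) - 1
    # Scores below the training range fall back to the first step.
    return fitted[max(idx, 0)]


if __name__ == "__main__":
    xs, fitted = fit_isotonic_calibration([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
    print(predict_probability(xs, fitted, 0.35))  # prints 0.3333333333333333
```

In the toy example the out-of-order labels at scores 0.2 to 0.4 are pooled into a single step of 1/3. Isotonic fits on small samples can return probabilities of exactly 0 or 1 at the ends of the score range, so the fitted values still need to be checked for calibration and variance.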
Pages: 2563-2570
Number of pages: 8