A modified area under the ROC curve and its application to marker selection and classification

Cited by: 17
Authors
Yu, WenBao [1]
Chang, Yuan-chin Ivan [2]
Park, Eunsik [1]
Affiliations
[1] Chonnam Natl Univ, Dept Stat, Kwangju 500757, South Korea
[2] Acad Sinica, Inst Stat Sci, Taipei 11529, Taiwan
Funding
National Research Foundation of Singapore
Keywords
ROC curve; AUC; mAUC; pAUC; Marker selection; Classification; OPERATING CHARACTERISTIC CURVES; LASSO;
DOI
10.1016/j.jkss.2013.05.003
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
The area under the ROC curve (AUC) can be interpreted as the probability that the classification score of a diseased subject is larger than that of a non-diseased subject for a randomly sampled pair of subjects. From the perspective of classification, we want to find a way to separate two groups as distinctly as possible via AUC. When the difference between the scores of a marker is small, its impact on classification is less important. Thus, a new diagnostic/classification measure based on a modified area under the ROC curve (mAUC) is proposed, which is defined as a weighted sum of two AUCs, where the AUC with the smaller difference is assigned a lower weight, and vice versa. Using mAUC is robust in the sense that mAUC gets larger as AUC gets larger as long as they are not equal. Moreover, in many diagnostic situations, only a specific range of specificity is of interest. Under normal distributions, we show that if the AUCs of two markers are within similar ranges, the larger mAUC implies the larger partial AUC for a given specificity. This property of mAUC helps to identify the marker with the higher partial AUC, even when the AUCs are similar. Two nonparametric estimates of an mAUC and their variances are given. We also suggest the use of mAUC as the objective function for classification, and the use of the gradient Lasso algorithm for classifier construction and marker selection. Applications to simulated datasets and real microarray gene expression datasets show that our method finds a linear classifier with a higher ROC curve than some other existing linear classifiers, especially in the range of low false positive rates. © 2013 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
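As an illustration of the probabilistic reading of AUC stated in the abstract, the following minimal sketch computes the standard nonparametric (Mann-Whitney) estimate over all diseased/non-diseased pairs. It is not the authors' mAUC estimator, whose weights are not specified in this record; the function name `empirical_auc` and the toy scores are hypothetical, and counting ties as one half is a common convention assumed here.

```python
from itertools import product


def empirical_auc(diseased_scores, healthy_scores):
    """Estimate P(diseased score > non-diseased score) over all pairs.

    Ties contribute 1/2 (an assumed, commonly used convention).
    """
    if not diseased_scores or not healthy_scores:
        raise ValueError("both groups need at least one score")
    total = 0.0
    for d, h in product(diseased_scores, healthy_scores):
        if d > h:
            total += 1.0  # pair ordered correctly
        elif d == h:
            total += 0.5  # tie counts as half
    return total / (len(diseased_scores) * len(healthy_scores))


# Hypothetical toy scores, for illustration only.
diseased = [0.9, 0.8, 0.4]
healthy = [0.7, 0.3, 0.2]
auc = empirical_auc(diseased, healthy)
print(auc)  # 8 of the 9 pairs are ordered correctly
```

In this toy example only the pair (0.4, 0.7) is misordered, so the estimate is 8/9. A pairwise loop like this costs O(mn) comparisons, which is fine for small samples; rank-based formulas give the same value more efficiently.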
Pages: 161-175
Page count: 15
Related papers
21 in total
  • [1] AREA ABOVE ORDINAL DOMINANCE GRAPH AND AREA BELOW RECEIVER OPERATING CHARACTERISTIC GRAPH
    BAMBER, D
    [J]. JOURNAL OF MATHEMATICAL PSYCHOLOGY, 1975, 12 (04) : 387 - 415
  • [2] Exact bootstrap variances of the area under ROC curve
    Bandos, Andriy I.
    Rockette, Howard E.
    Gur, David
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2007, 36 (13-16) : 2443 - 2461
  • [3] COMPARING THE AREAS UNDER 2 OR MORE CORRELATED RECEIVER OPERATING CHARACTERISTIC CURVES - A NONPARAMETRIC APPROACH
    DELONG, ER
    DELONG, DM
    CLARKEPEARSON, DI
    [J]. BIOMETRICS, 1988, 44 (03) : 837 - 845
  • [4] Partial AUC estimation and regression
    Dodd, LE
    Pepe, MS
    [J]. BIOMETRICS, 2003, 59 (03) : 614 - 623
  • [5] An introduction to ROC analysis
    Fawcett, Tom
    [J]. PATTERN RECOGNITION LETTERS, 2006, 27 (08) : 861 - 874
  • [6] Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring
    Golub, TR
    Slonim, DK
    Tamayo, P
    Huard, C
    Gaasenbeek, M
    Mesirov, JP
    Coller, H
    Loh, ML
    Downing, JR
    Caligiuri, MA
    Bloomfield, CD
    Lander, ES
    [J]. SCIENCE, 1999, 286 (5439) : 531 - 537
  • [7] Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: An update
    Hanley, JA
    HajianTilaki, KO
    [J]. ACADEMIC RADIOLOGY, 1997, 4 (01) : 49 - 58
  • [8] A Gradient-Based Optimization Algorithm for LASSO
    Kim, Jinseog
    Kim, Yuwon
    Kim, Yongdai
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2008, 17 (04) : 994 - 1009
  • [9] Kim Y, 2004, P 21 INT C MACH LEAR, P60, DOI 10.1145/1015330.1015364
  • [10] A boosting method for maximizing the partial area under the ROC curve
    Komori, Osamu
    Eguchi, Shinto
    [J]. BMC BIOINFORMATICS, 2010, 11