Confidence bands and hypothesis tests for hit enrichment curves

Cited by: 1
Authors
Ash, Jeremy R. [1, 2]
Hughes-Oliver, Jacqueline M. [1]
Affiliations
[1] North Carolina State Univ, Dept Stat, Bioinformat Res Ctr, Raleigh, NC 27695 USA
[2] SAS Inst, JMP Div, Cary, NC 27513 USA
Keywords
Virtual screening; Enrichment factor; Lift curve; Early enrichment; Ranking algorithm; Empirical process; OPTIMIZATION; RECOMMENDATIONS; DIFFERENCE; DISCOVERY; INTERVALS; DOCKING; MODELS; SETS; TOOL;
DOI
10.1186/s13321-022-00629-0
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In virtual screening for drug discovery, hit enrichment curves are widely used to assess how well ranking algorithms achieve early enrichment. Unfortunately, researchers almost never consider the uncertainty associated with estimating such curves before declaring differences between the performance of competing algorithms. Uncertainty is often large because the testing fractions of interest to researchers are small. Appropriate inference is complicated by two sources of correlation that are often overlooked: correlation across different testing fractions within a single algorithm, and correlation between competing algorithms. Additionally, researchers are often interested in making comparisons along the entire curve, not only at a few testing fractions. We develop inferential procedures that address the needs of both those interested in a few testing fractions and those interested in the entire curve. For the former, four approaches to hypothesis testing and (pointwise) confidence intervals are investigated, and a newly developed EmProc approach is found to be most effective. For inference along entire curves, EmProc-based confidence bands are recommended for simultaneous coverage and minimal width. While we focus on the hit enrichment curve, this work is also appropriate for lift curves, which are used throughout the machine learning community. Our inferential procedures also extend trivially to enrichment factors.
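As a rough illustration of the quantities named in the abstract, the sketch below computes point estimates of a hit enrichment curve and the corresponding enrichment factors from a ranked list. It assumes the usual definitions: the hit enrichment value at testing fraction r is the share of all actives found among the top-ranked fraction r of compounds, and the enrichment factor is that share divided by the fraction actually tested. The function name `hit_enrichment`, its parameters, and the toy data are hypothetical and not taken from the paper. The sketch produces point estimates only; it does not implement the EmProc intervals or bands, and it breaks tied scores arbitrarily.

```python
def hit_enrichment(scores, is_active, fractions):
    """Return (fraction, recall, enrichment_factor) for each testing fraction.

    scores    -- ranking scores, higher means more likely active (assumption)
    is_active -- 0/1 activity labels, same length as scores
    fractions -- testing fractions in (0, 1]
    """
    n = len(scores)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Rank compounds from highest to lowest score (ties broken arbitrarily).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    results = []
    for r in fractions:
        k = max(1, round(r * n))                 # number of compounds tested
        hits = sum(is_active[i] for i in order[:k])
        recall = hits / total_actives            # hit enrichment curve value
        ef = recall / (k / n)                    # enrichment factor
        results.append((r, recall, ef))
    return results


# Toy data: 10 compounds, 3 of them active.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
for r, recall, ef in hit_enrichment(scores, labels, [0.2, 0.5]):
    print(f"fraction={r:.2f}  recall={recall:.3f}  EF={ef:.3f}")
```

With only three actives, each recall value here rests on very few hits, which is the small-testing-fraction uncertainty the abstract describes.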
Pages: 17
Related papers
50 in total
  • [1] Confidence bands and hypothesis tests for hit enrichment curves
    Jeremy R Ash
    Jacqueline M Hughes-Oliver
    Journal of Cheminformatics, 14
  • [2] Confidence bands for isotonic median curves using sign tests
    Dümbgen, L
    Johns, RB
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2004, 13 (02) : 519 - 533
  • [3] CONFIDENCE BANDS FOR POLYNOMIAL CURVES
    HOEL, PG
    ANNALS OF MATHEMATICAL STATISTICS, 1954, 25 (03): : 534 - 542
  • [4] Confidence bands for ROC curves
    Horvath, Lajos
    Horvath, Zsuzsanna
    Zhou, Wang
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2008, 138 (06) : 1894 - 1904
  • [5] Confidence bands for growth and response curves
    Sun, JY
    Raz, J
    Faraway, JJ
    STATISTICA SINICA, 1999, 9 (03) : 679 - 698
  • [6] Regional confidence bands for ROC curves
    Jensen, K
    Müller, HH
    Schäfer, H
    STATISTICS IN MEDICINE, 2000, 19 (04) : 493 - 509
  • [7] Optimal confidence bands for shape-restricted curves
    Dümbgen, L
    BERNOULLI, 2003, 9 (03) : 423 - 449
  • [8] WASSERSTEIN F-TESTS AND CONFIDENCE BANDS FOR THE FRECHET REGRESSION OF DENSITY RESPONSE CURVES
    Petersen, Alexander
    Liu, Xi
    Divani, Afshin A.
    ANNALS OF STATISTICS, 2021, 49 (01): : 590 - 611
  • [9] Bootstrap confidence bands for regression curves and their derivatives
    Claeskens, G
    Van Keilegom, I
    ANNALS OF STATISTICS, 2003, 31 (06): : 1852 - 1884
  • [10] CONFIDENCE BANDS FOR RECEIVER OPERATING CHARACTERISTIC CURVES
    MA, GQ
    HALL, WJ
    MEDICAL DECISION MAKING, 1993, 13 (03) : 191 - 197