Estimation of predictive performance in high-dimensional data settings using learning curves

Cited by: 0
Authors
Goedhart, Jeroen M. [1]
Klausch, Thomas [1]
van de Wiel, Mark A. [1]
Affiliations
[1] Amsterdam Univ Med Ctr, Amsterdam Publ Hlth Res Inst, Dept Epidemiol & Data Sci, De Boelelaan 1117, NL-1081 HV Amsterdam, Netherlands
Keywords
High-dimensional data; Omics; Predictive performance; Area under the receiver operating curve; Bootstrap; Cross-validation; Error rate; Area; Classification; Signatures; Cancer; Size
DOI
10.1016/j.csda.2022.107622
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In high-dimensional prediction settings, it remains challenging to reliably estimate test performance. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on learning curves: it fits a smooth monotone curve depicting test performance as a function of the sample size. Learn2Evaluate has several advantages over commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview helps in assessing the potential benefit of adding training samples, and it provides a more complete comparison between learners than performance estimates at a fixed subsample size. Secondly, a learning curve facilitates estimating the performance at the total sample size rather than at a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound. Furthermore, this bound may be tightened by performing a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data. © 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
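The learning-curve idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the inverse power law form score(n) = a - b * n^(-c), the grid search over the exponent, the function names `fit_inverse_power_law` and `predict`, and the example numbers. The sketch fits a smooth, non-decreasing curve to performance estimates obtained at several subsample sizes and then reads off the fitted value at the total sample size. It does not cover the confidence bound or the bias correction.

```python
# Illustrative sketch only: the curve form, the grid search, and the data
# below are assumptions, not the Learn2Evaluate implementation.

def fit_inverse_power_law(sizes, scores, exponents=None):
    """Fit score(n) = a - b * n**(-c) with b >= 0 and c > 0 by least squares.

    For each candidate exponent c, (a, b) follow from ordinary least squares
    on the transformed predictor x = n**(-c). The exponent with the smallest
    residual sum of squares wins. Needs at least two distinct sizes.
    """
    if exponents is None:
        exponents = [k / 100 for k in range(5, 301, 5)]  # c in 0.05 .. 3.00
    m = len(sizes)
    mean_y = sum(scores) / m
    best = None
    for c in exponents:
        x = [n ** (-c) for n in sizes]
        mean_x = sum(x) / m
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, scores))
        slope = sxy / sxx          # the OLS slope equals -b
        b = max(-slope, 0.0)       # clip so the curve never decreases in n
        a = mean_y + b * mean_x    # least-squares intercept for this b
        rss = sum((yi - (a - b * xi)) ** 2 for xi, yi in zip(x, scores))
        if best is None or rss < best[0]:
            best = (rss, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict(params, n):
    """Evaluate the fitted curve at sample size n."""
    a, b, c = params
    return a - b * n ** (-c)


# Hypothetical mean AUC values from repeated subsampling at each training size.
subsample_sizes = [20, 40, 60, 80, 100]
mean_auc = [0.62, 0.70, 0.73, 0.76, 0.77]
total_n = 120  # hypothetical total sample size

params = fit_inverse_power_law(subsample_sizes, mean_auc)
print("fitted (a, b, c):", params)
print("estimated AUC at total sample size:", predict(params, total_n))
```

Because b is constrained to be non-negative and c is positive, the fitted curve cannot decrease with the sample size, which mirrors the monotonicity requirement mentioned in the abstract. A different smooth monotone family could be substituted without changing the overall workflow.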
Pages: 13