Are the statistical tests the best way to deal with the biomarker selection problem?

Cited by: 0
Authors
Urkullu, Ari [1 ]
Perez, Aritz [2 ]
Calvo, Borja [1 ]
Affiliations
[1] Univ Basque Country UPV EHU, Dept Comp Sci & Artificial Intelligence, Paseo Manuel de Lardizabal 1, Donostia San Sebastian 20018, Gipuzkoa, Spain
[2] Basque Ctr Appl Math BCAM, Dept Data Sci, Alameda Mazarredo 14, Bilbao 48009, Bizkaia, Spain
Keywords
Biomarker selection; Statistical tests; Reproducibility; Differential methylation detection; Differentially methylated loci
DOI
10.1007/s10115-022-01677-6
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Statistical tests are powerful tools when applied correctly, but their widespread misuse has caused great concern. Among many other applications, they are used in biomarker detection, where the resulting p-values serve as a reference for ranking candidate biomarkers. Although statistical tests can be used to rank, they were not designed for that purpose; moreover, no p-value needs to be computed at all in order to build a ranking of candidate biomarkers. These two facts raise the question of whether alternative methods, not based on statistical tests, can be proposed that match or improve on their performance. In this paper, we propose two such alternative methods. In addition, we propose an evaluation framework to assess both statistical tests and the alternative methods in terms of performance and reproducibility. The results indicate that alternative methods can match or surpass methods based on statistical tests in terms of reproducibility on real data, while maintaining similar performance on synthetic data. The main conclusion is that there is room for proposing such alternative methods.
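As a rough, self-contained illustration of the p-value-based ranking the abstract refers to (a generic sketch, not the authors' actual pipeline or data; the choice of a Mann-Whitney U test, the synthetic beta-distributed "methylation" values, and all variable names are assumptions):

```python
# Hypothetical sketch: ranking candidate biomarkers by per-feature
# two-sample test p-values. Not the paper's method; the test choice and
# the synthetic data are assumptions made only for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic methylation-like data: 200 candidate biomarkers (columns),
# 30 case and 30 control samples (rows), values in [0, 1].
cases = rng.beta(2.0, 5.0, size=(30, 200))
controls = rng.beta(2.0, 5.0, size=(30, 200))
# Make the first 10 features truly differential between the groups.
controls[:, :10] = np.clip(controls[:, :10] + 0.15, 0.0, 1.0)

# One p-value per candidate biomarker.
pvals = np.array([
    stats.mannwhitneyu(cases[:, j], controls[:, j]).pvalue
    for j in range(cases.shape[1])
])

# The ranking only uses the ordering induced by the p-values:
# smaller p-value = higher priority in the candidate list.
ranking = np.argsort(pvals)
print("Top 10 candidates by p-value:", ranking[:10].tolist())
```

The point the abstract makes is that only the ordering (the final argsort) is consumed downstream, so any score that induces a useful ordering, not necessarily a p-value, could take its place.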
Pages: 1549 - 1570
Page count: 22
References
30 in total
  • [1] Alzubaidi AHA, 2019, EVOLUTIONARY DEEP MI
  • [2] Amrhein V, Korner-Nievergelt F, Roth T. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ, 2017, 5.
  • [3] Baek S, Tsai C-A, Chen JJ. Development of biomarker classifiers from high-dimensional data. Briefings in Bioinformatics, 2009, 10(5): 537-546.
  • [4] Baker M. 1,500 scientists lift the lid on reproducibility. Nature, 2016, 533: 452. DOI 10.1038/533452a.
  • [5] Bell CG, Teschendorff AE, Rakyan VK, Maxwell AP, Beck S, Savage DA. Genome-wide DNA methylation analysis for diabetic nephropathy in type 1 diabetes mellitus. BMC Medical Genomics, 2010, 3.
  • [6] Chen Y, Ning Y, Hong C, Wang S. Semiparametric tests for identifying differentially methylated loci with case-control designs using Illumina arrays. Genetic Epidemiology, 2014, 38(1): 42-50.
  • [7] Cohen J. The Earth Is Round (Rejoinder). 1995, p. 5.
  • [8] Colquhoun D. The reproducibility of research and the misinterpretation of p-values. Royal Society Open Science, 2017, 4(12).
  • [9] Du P, Zhang X, Huang C-C, Jafari N, Kibbe WA, Hou L, Lin SM. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 2010, 11.
  • [10] Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters, 2006, 27(8): 861-874.