Are the statistical tests the best way to deal with the biomarker selection problem?

Cited by: 0
Authors
Urkullu, Ari [1 ]
Perez, Aritz [2 ]
Calvo, Borja [1 ]
Affiliations
[1] Univ Basque Country UPV EHU, Dept Comp Sci & Artificial Intelligence, Paseo Manuel de Lardizabal 1, Donostia San Sebastian 20018, Gipuzkoa, Spain
[2] Basque Ctr Appl Math BCAM, Dept Data Sci, Alameda Mazarredo 14, Bilbao 48009, Bizkaia, Spain
Keywords
Biomarker selection; Statistical tests; Reproducibility; Differential methylation detection; Differentially methylated loci
DOI
10.1007/s10115-022-01677-6
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Statistical tests are powerful tools when applied correctly, but their widespread misuse has caused great concern. Among many other applications, they are used in biomarker detection, where the resulting p-values serve as a reference for ranking candidate biomarkers. Although statistical tests can be used to rank, they were not designed for that purpose; moreover, no p-value needs to be computed at all in order to build a ranking of candidate biomarkers. These two facts raise the question of whether alternative methods, not based on statistical tests, can be proposed that match or improve on their performance. In this paper, we propose two such alternative methods. In addition, we propose an evaluation framework to assess both statistical tests and the alternative methods in terms of performance and reproducibility. The results indicate that alternative methods can match or surpass methods based on statistical tests in terms of reproducibility on real data, while maintaining similar performance on synthetic data. The main conclusion is that there is room for proposing such alternative methods.
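As a rough, self-contained illustration of the p-value-based ranking the abstract refers to (a generic sketch, not the authors' actual pipeline or data; the choice of a Mann-Whitney U test, the synthetic beta-distributed "methylation" values, and all variable names are assumptions):

```python
# Hypothetical sketch: ranking candidate biomarkers by per-feature
# two-sample test p-values. Not the paper's method; the test choice and
# the synthetic data are assumptions made only for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic methylation-like data: 200 candidate biomarkers (columns),
# 30 case and 30 control samples (rows), values in [0, 1].
cases = rng.beta(2.0, 5.0, size=(30, 200))
controls = rng.beta(2.0, 5.0, size=(30, 200))
# Make the first 10 features truly differential between the groups.
controls[:, :10] = np.clip(controls[:, :10] + 0.15, 0.0, 1.0)

# One p-value per candidate biomarker.
pvals = np.array([
    stats.mannwhitneyu(cases[:, j], controls[:, j]).pvalue
    for j in range(cases.shape[1])
])

# The ranking only uses the ordering induced by the p-values:
# smaller p-value = higher priority in the candidate list.
ranking = np.argsort(pvals)
print("Top 10 candidates by p-value:", ranking[:10].tolist())
```

The point the abstract makes is that only the ordering (the final argsort) is consumed downstream, so any score that induces a useful ordering, not necessarily a p-value, could take its place.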
Pages: 1549 - 1570
Page count: 22
References
30 in total
  • [1] Alzubaidi AHA, 2019, EVOLUTIONARY DEEP MI
  • [2] Amrhein V, Korner-Nievergelt F, Roth T. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ, 2017, 5.
  • [3] Baek S, Tsai C-A, Chen JJ. Development of biomarker classifiers from high-dimensional data. Briefings in Bioinformatics, 2009, 10(5): 537-546.
  • [4] Baker M. 1,500 scientists lift the lid on reproducibility. Nature, 2016, 533: 452. DOI 10.1038/533452a.
  • [5] Bell CG, Teschendorff AE, Rakyan VK, Maxwell AP, Beck S, Savage DA. Genome-wide DNA methylation analysis for diabetic nephropathy in type 1 diabetes mellitus. BMC Medical Genomics, 2010, 3.
  • [6] Chen Y, Ning Y, Hong C, Wang S. Semiparametric tests for identifying differentially methylated loci with case-control designs using Illumina arrays. Genetic Epidemiology, 2014, 38(1): 42-50.
  • [7] Cohen J. The Earth Is Round (Rejoinder). 1995, p. 5.
  • [8] Colquhoun D. The reproducibility of research and the misinterpretation of p-values. Royal Society Open Science, 2017, 4(12).
  • [9] Du P, Zhang X, Huang C-C, Jafari N, Kibbe WA, Hou L, Lin SM. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 2010, 11.
  • [10] Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters, 2006, 27(8): 861-874.