Testing for normality with neural networks

Cited by: 4
Author
Simic, Milos [1]
Affiliation
[1] Univ Belgrade, Studentski Trg 1, Belgrade 11000, Serbia
Keywords
Neural Networks; Binary Classification; Normal Distribution; Goodness-of-Fit; MATHEMATICAL CONTRIBUTIONS; VARIANCE TEST; BOOTSTRAP; STATISTICS; SUPPLEMENT; EVOLUTION; MEMOIR;
DOI
10.1007/s00521-021-06229-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we treat the problem of testing for normality as a binary classification problem and construct a feedforward neural network that can act as a powerful normality test. We show that by changing its decision threshold, we can control the frequency of false non-normal predictions and thus make the network more similar to standard statistical tests. We also find the optimal decision thresholds that minimize the total error probability for each sample size. The experiments conducted on samples with no more than 100 elements suggest that our method is more accurate and more powerful than the selected standard tests of normality for almost all types of alternative distributions and sample sizes. In particular, the neural network was the most powerful method for testing the normality of samples with fewer than 30 elements, regardless of the alternative distribution type. Its total accuracy increased with the sample size. Additionally, when the optimal decision thresholds were used, the network was very accurate for larger samples with 250-1000 elements. With AUROC equal to almost 1, the network was the most accurate method overall. Since the normality of data is an assumption of numerous statistical techniques, the network constructed in this study has a very high potential for use in the everyday practice of statistics, data analysis and machine learning.
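The threshold idea in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's network: `nonnormality_score` is a hypothetical stand-in (absolute sample skewness plus absolute excess kurtosis) for whatever score a trained classifier would output, and the function names, `alpha`, and the simulation sizes are all assumptions made for illustration. The sketch picks the decision threshold as an empirical quantile of scores computed on simulated normal samples, so that roughly a fraction `alpha` of truly normal samples is flagged as non-normal.

```python
import numpy as np


def nonnormality_score(x):
    """Hypothetical stand-in for a classifier's output: larger means 'less normal'.

    This is NOT the network from the paper, only a simple moment-based score.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    return abs(skew) + abs(excess_kurt)


def calibrate_threshold(n, alpha=0.05, n_sim=5000, seed=0):
    """Choose a threshold so that about `alpha` of normal samples of size n exceed it."""
    rng = np.random.default_rng(seed)
    scores = np.array(
        [nonnormality_score(rng.standard_normal(n)) for _ in range(n_sim)]
    )
    # The (1 - alpha) empirical quantile of the scores under normality.
    return float(np.quantile(scores, 1.0 - alpha))


def predict_non_normal(x, threshold):
    """Flag a sample as non-normal when its score exceeds the threshold."""
    return bool(nonnormality_score(x) > threshold)


def false_alarm_rate(n, threshold, n_sim=2000, seed=1):
    """Estimate how often fresh normal samples are wrongly flagged as non-normal."""
    rng = np.random.default_rng(seed)
    flags = [predict_non_normal(rng.standard_normal(n), threshold) for _ in range(n_sim)]
    return float(np.mean(flags))


if __name__ == "__main__":
    threshold = calibrate_threshold(n=30, alpha=0.05)
    print("threshold:", threshold)
    print("estimated false non-normal rate:", false_alarm_rate(30, threshold))
```

Calibrating per sample size, as the sketch does through the `n` argument, mirrors the abstract's remark that thresholds are chosen for each sample size; the paper's own procedure for finding optimal thresholds may differ.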
Pages: 16279-16313
Number of pages: 35