Online evaluation of the Kolmogorov-Smirnov test on arbitrarily large samples

Cited by: 2
Authors
Cardoso, Douglas O. [1 ]
Galeno, Thalis D. [2 ]
Affiliations
[1] Polytech Inst Tomar, Smart Cities Res Ctr, Tomar, Portugal
[2] Celso Suckow Fonseca Fed Ctr Technol Educ, Dept Comp Engn, Petropolis, RJ, Brazil
Funding
Academy of Finland;
Keywords
Data streams; Online learning; Concept drift; Change detection;
DOI
10.1016/j.jocs.2023.101959
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
This paper presents an approximate online algorithm to perform the Kolmogorov-Smirnov test. There is a ubiquitous need for evaluating the fit between statistical distributions and data samples, which this test conveniently meets. Taking some inspiration from the challenges of detecting concept drifts in data streams, our methodology shows how this goodness-of-fit statistical test can be used to detect such events, taking advantage of the fact that it is non-parametric and can be adapted to handle streams while keeping its original, relatively small algorithmic complexity. The presented work focuses on the one-sample test, which evaluates the hypothesis that a given univariate sample follows some reference distribution, for assessing an input stream with high precision in a time- and space-efficient fashion. The performance of our algorithm and that of some state-of-the-art methods were compared using synthetic and real data. We evaluated the accuracy, effectiveness and efficiency of these methods through extensive experiments in multiple scenarios: varying the reference distribution and its parameters, sample size, available memory, drift point and query interval. The results showed that our algorithm is advantageous in most cases, even under substantial restrictions on computational resources.
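For orientation, the one-sample test mentioned in the abstract compares the empirical distribution of a sample with a reference distribution. The sketch below computes the exact, offline statistic on a fully stored sample; it is not the paper's online approximation, whose details the abstract does not give. The function names (`ks_statistic`, `ks_critical_value`, `normal_cdf`) are hypothetical and chosen only for illustration, and the critical value uses the standard large-sample approximation, which is an assumption about how a decision threshold might be set.

```python
import math


def ks_statistic(sample, cdf):
    """Exact one-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    Offline baseline: sorts and stores the whole sample, which is exactly
    what a streaming method would need to avoid.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x,
        # so the gap is checked on both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n, alpha=0.05):
    """Large-sample approximation of the rejection threshold for D_n."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as a reference distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    import random

    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(1000)]
    d = ks_statistic(data, normal_cdf)
    # Reject the reference distribution when d exceeds the threshold.
    print(d, ks_critical_value(len(data)))
```

In a drift-detection setting, a statistic of this kind would be re-evaluated as new stream elements arrive, and a value above the threshold would signal a change; the cost of sorting and storing the full sample is what motivates an approximate online variant.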
Pages: 11