A Novel Online and Non-Parametric Approach for Drift Detection in Big Data

Cited by: 25
Authors
Bhaduri, Moinak [1]
Zhan, Justin [1]
Chiu, Carter [1]
Zhan, Felix [1]
Affiliation
[1] Univ Nevada, Big Data Hub, Las Vegas, NV 89154 USA
Funding
National Science Foundation (USA)
Keywords
Change point detection; non-parametric methods; Hoeffding's inequality; Bernstein's inequality; big data; anomaly detection
DOI
10.1109/ACCESS.2017.2735378
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A sizable amount of current literature on online drift detection tools thrives on unrealistic parametric strictures such as normality or on non-parametric methods whose power performance is questionable. Using minimal realistic assumptions such as unimodality, we have strived to proffer an alternative, through a novel application of Bernstein's inequality. Simulations from such parametric densities as Beta and Logit-normal as well as real-data analyses demonstrate this new method's superiority over similar techniques relying on bounds, such as Hoeffding's. Improvements are apparent in terms of higher power, efficient sample sizes, and sensitivity to parameter values.
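The abstract contrasts bounds based on Bernstein's inequality with bounds based on Hoeffding's. The sketch below is not the paper's detection procedure, which this record does not describe. It only evaluates the two textbook two-sided tail bounds for the mean of n independent observations in [0, 1], to illustrate why a variance-aware bound can be much tighter. The function names, the Beta(2, 20) parameters, and the values of n and eps are all illustrative assumptions.

```python
import math


def hoeffding_bound(n, eps, value_range=1.0):
    """Hoeffding: P(|mean - mu| >= eps) <= 2 exp(-2 n eps^2 / range^2)."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps**2 / value_range**2))


def bernstein_bound(n, eps, variance, max_dev=1.0):
    """Bernstein: P(|mean - mu| >= eps) <= 2 exp(-n eps^2 / (2 var + 2 M eps / 3)),
    where M bounds |X - mu| (M = 1 is always valid for data in [0, 1])."""
    denom = 2.0 * variance + 2.0 * max_dev * eps / 3.0
    return min(1.0, 2.0 * math.exp(-n * eps**2 / denom))


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


# Illustrative (assumed) settings: a low-variance Beta(2, 20) stream.
n, eps = 200, 0.1
var = beta_variance(2.0, 20.0)
h = hoeffding_bound(n, eps)
b = bernstein_bound(n, eps, var)
print(f"Hoeffding bound: {h:.3e}")
print(f"Bernstein bound: {b:.3e}")
```

With a small variance the Bernstein bound falls far below the Hoeffding bound, which ignores the variance entirely. At the largest variance possible on [0, 1], namely 0.25, the ordering reverses for these settings, so the gain depends on the data being concentrated.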
Pages: 15883-15892
Number of pages: 10