Concept drift detection based on Fisher's Exact test

Cited by: 67
Authors
de Lima Cabral, Danilo Rafael [1]
Maior de Barros, Roberto Souto [1]
Affiliation
[1] Univ Fed Pernambuco, Ctr Informat, BR-50740560 Recife, PE, Brazil
Keywords
Concept drift; Data streams; Drift detection; Online learning; Statistical tests; CLASSIFIERS;
DOI
10.1016/j.ins.2018.02.054
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Concept drift detectors are software that usually attempt to estimate the positions of concept drifts in large data streams in order to replace the base learner after changes in the data distribution and thus improve accuracy. Statistical Test of Equal Proportions (STEPD) is a simple, efficient, and well-known method which detects concept drifts based on a hypothesis test between two proportions. However, statistically, this test is not recommended when sample sizes are small or data are sparse and/or imbalanced. This article proposes an ingeniously efficient implementation of the statistically preferred but computationally expensive Fisher's Exact test and examines three slightly different applications of this test for concept drift detection, proposing FPDD, FSDD, and FTDD. Experiments run using four artificial dataset generators, with both abrupt and gradual drift versions, as well as three real-world datasets, suggest that the new methods improve the accuracy results and the detections of STEPD and other well-known and/or recent concept drift detectors in many scenarios, with little impact on memory and run-time usage. (C) 2018 Elsevier Inc. All rights reserved.
Pages: 220-234
Page count: 15
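
The contrast the abstract draws between a test of equal proportions and Fisher's Exact test can be illustrated with a minimal sketch. It is not the authors' implementation of FPDD, FSDD, or FTDD, whose details are not given in this record. The function names, the split into a "recent" and an "older" window of classifier errors, and the example counts are all assumptions made for illustration. The z-test shown is a plain pooled two-proportion test without continuity correction, so it may differ from the exact statistic STEPD uses.

```python
from math import comb, erf, sqrt


def fisher_exact_one_sided(errors_recent, n_recent, errors_older, n_older):
    """One-sided Fisher's Exact p-value (hypothetical helper).

    Probability, under the hypergeometric null of equal error rates, of
    seeing at least `errors_recent` errors in the recent window, given
    the window sizes and the total number of errors.
    """
    total = n_recent + n_older
    total_errors = errors_recent + errors_older
    denom = comb(total, n_recent)
    upper = min(total_errors, n_recent)
    numer = sum(
        comb(total_errors, k) * comb(total - total_errors, n_recent - k)
        for k in range(errors_recent, upper + 1)
    )
    return numer / denom


def two_proportion_z_one_sided(errors_recent, n_recent, errors_older, n_older):
    """One-sided pooled two-proportion z-test p-value (hypothetical helper).

    Relies on a normal approximation, which is the part that becomes
    unreliable when windows are small or errors are rare.
    """
    p_recent = errors_recent / n_recent
    p_older = errors_older / n_older
    pooled = (errors_recent + errors_older) / (n_recent + n_older)
    se = sqrt(pooled * (1 - pooled) * (1 / n_recent + 1 / n_older))
    if se == 0:
        return 1.0  # no variation at all: no evidence of a change
    z = (p_recent - p_older) / se
    return 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail normal probability


# Assumed example: 3 errors in the last 5 predictions, 1 in the 10 before.
p_exact = fisher_exact_one_sided(3, 5, 1, 10)   # 231/3003 = 1/13
p_approx = two_proportion_z_one_sided(3, 5, 1, 10)
print(p_exact, p_approx)
```

With windows this small the two p-values can differ noticeably, which is the situation in which the abstract says the test of proportions is not recommended.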