On evaluating stream learning algorithms

Cited by: 360
Authors
Gama, Joao [1 ,2 ]
Sebastiao, Raquel [1 ,3 ]
Rodrigues, Pedro Pereira [1 ,4 ]
Affiliations
[1] Univ Porto, LIAAD INESC TEC, P-4050190 Oporto, Portugal
[2] Univ Porto, Fac Econ, P-4050190 Oporto, Portugal
[3] Univ Porto, Fac Sci, P-4050190 Oporto, Portugal
[4] Univ Porto, Fac Med, P-4050190 Oporto, Portugal
Keywords
Data streams; Evaluation design; Prequential analysis; Concept drift; Decision tree; Classifiers; Tracking; Drift
DOI
10.1007/s10994-012-5320-9
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Most streaming decision models evolve continuously over time, run in resource-aware environments, and detect and react to changes in the environment generating data. One important issue, not yet convincingly addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of prequential error with forgetting mechanisms to provide reliable error estimators. We prove that, for stationary data and consistent learning algorithms, the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors all converge to the Bayes error. The use of prequential error with forgetting mechanisms proves advantageous both for assessing performance and for comparing stream learning algorithms, and the proposed methods are also useful for hypothesis testing and for change detection. In a set of experiments in drift scenarios, we evaluate the ability of a standard change detection algorithm to detect change using three prequential error estimators. These experiments show that forgetting mechanisms (sliding windows or fading factors) are required for fast and efficient change detection. Compared to sliding windows, fading factors are faster and memoryless, both important requirements for streaming applications. Overall, this paper contributes to the discussion of best practices for performance assessment when learning is a continuous process and the decision models are dynamic and evolve over time.
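
As an illustration of the three error estimators named in the abstract, the following Python sketch implements the cumulative prequential error and its sliding-window and fading-factor variants. The class names, the default parameters, and the use of a generic per-example loss are illustrative assumptions, not the paper's code; the fading-factor estimator follows the standard recurrence S_i = l_i + alpha * S_{i-1}, N_i = 1 + alpha * N_{i-1}, with estimate E_i = S_i / N_i.

    from collections import deque

    class PrequentialError:
        """Cumulative prequential (test-then-train) error:
        average loss over all examples seen so far."""
        def __init__(self):
            self.loss_sum = 0.0
            self.n = 0

        def update(self, loss):
            self.loss_sum += loss
            self.n += 1
            return self.loss_sum / self.n

    class SlidingWindowError:
        """Prequential error over the most recent `window` examples.
        Bounded memory, but the whole window must be stored."""
        def __init__(self, window=1000):
            self.losses = deque(maxlen=window)

        def update(self, loss):
            self.losses.append(loss)
            return sum(self.losses) / len(self.losses)

    class FadingFactorError:
        """Prequential error with fading factor alpha.
        Memoryless: only two scalars of state are kept."""
        def __init__(self, alpha=0.999):
            self.alpha = alpha
            self.s = 0.0   # decayed sum of losses
            self.n = 0.0   # decayed example count

        def update(self, loss):
            self.s = loss + self.alpha * self.s
            self.n = 1.0 + self.alpha * self.n
            return self.s / self.n

On a stationary stream all three estimates converge to the same value; under concept drift, the windowed and fading-factor estimates track the new error level much faster than the cumulative estimate, and the fading-factor variant requires no stored window, which is the memoryless property the abstract highlights.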
Pages: 317-346
Page count: 30