Adaptive Supervised Learning Model for Training Set Selection under Concept Drift Data Streams

Cited by: 2
Authors
Patil, Pramod D. [1]
Kulkarni, Parag [2]
Affiliations
[1] Padmashree Dr DY Patil Inst Engn & Technol, Dept Comp Engn, Pune, Maharashtra, India
[2] Coll Engn, Dept Comp Engn, Pune, Maharashtra, India
Source
2013 INTERNATIONAL CONFERENCE ON CLOUD & UBIQUITOUS COMPUTING & EMERGING TECHNOLOGIES (CUBE 2013) | 2013
Keywords
Concept Drift; Data streams; Adaptive Training set; Supervised learning
DOI
10.1109/CUBE.2013.17
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Dynamic changes are part of everyday life, and when the data change, classification models must adapt to them. In this paper we propose an adaptive supervised learning model for training set selection on data streams subject to concept drift. The work focuses on adaptive supervised learning techniques in which adaptivity to changes in the data over time is achieved through a selective training set methodology; such selective training set methods can typically be combined with various plug-in base classifiers. We take accuracy (generalization error) as the primary performance measure for concept drift learners. The research covers the three main drift types: sudden drift, gradual drift and reoccurring concepts. We make methodological contributions to handling concept drift in a real-time application, namely electricity pricing, and to the change types expected there. The proposed methodology consists of four algorithms. The first, the Optimal Window Resizing Algorithm, addresses sudden drift: it determines the optimal window length at a given time, identifies to what extent a change point differs from the start of the training window, and determines how this difference can be used to improve the accuracy of an adaptive learner. The second, the Gradual Drift algorithm, unifies two selection criteria, similarity in time and similarity in feature space, to improve the accuracy of an adaptive learner. The third, the Reoccurring Concept Drift algorithm, handles settings where previously seen patterns reoccur but it is not certain exactly when or in what form they will repeat. The last is a Dynamic Drift Detection algorithm. Compared with other methods, the proposed algorithms are faster and memory-less, as streaming applications require. Tested on the Elec2 dataset, the proposed methodology achieves a lower error rate.
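The abstract does not give the Optimal Window Resizing Algorithm itself. As a rough illustration of the underlying idea of choosing a training-window length under sudden drift, the sketch below assumes a simple validation scheme: each candidate length is scored by training a plug-in base classifier on that many examples taken just before the newest `val_size` examples, then measuring its error on those newest examples. `select_window_length`, `fit_nearest_centroid` (a stand-in base learner) and the synthetic data are hypothetical, not the authors' code.

```python
from collections import defaultdict
from typing import Callable, Sequence, Tuple

Example = Tuple[Tuple[float, ...], int]      # (feature vector, class label)
Model = Callable[[Tuple[float, ...]], int]


def fit_nearest_centroid(train: Sequence[Example]) -> Model:
    """Hypothetical plug-in base learner: predict the class whose mean vector is closest."""
    sums = {}
    counts = defaultdict(int)
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def error_rate(model: Model, data: Sequence[Example]) -> float:
    return sum(model(x) != y for x, y in data) / len(data)


def select_window_length(history, candidates, val_size, fit=fit_nearest_centroid):
    """Pick the window length whose model errs least on the newest `val_size` examples.

    Training windows end just before the validation block; the strict '<'
    keeps the shortest window when errors tie.
    """
    train_pool, val = history[:-val_size], history[-val_size:]
    best_len, best_err = None, float("inf")
    for length in sorted(candidates):
        if length > len(train_pool):
            break
        err = error_rate(fit(train_pool[-length:]), val)
        if err < best_err:
            best_len, best_err = length, err
    return best_len, best_err


# Synthetic sudden drift (illustrative data only): labels flip after 200 examples.
xs = [(i % 10) / 10 + 0.05 for i in range(240)]
history = [((x,), int(x >= 0.5)) if i < 200 else ((x,), int(x < 0.5)) for i, x in enumerate(xs)]
print(select_window_length(history, [10, 20, 50, 100, 200], val_size=20))  # (10, 0.0)
```

After a length is chosen, the learner would be retrained on that many of the most recent examples. Preferring the shortest window on ties lets the model drop the old concept as soon as the validation error allows; longer windows that straddle the change point mix both concepts and err heavily.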
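For gradual drift the abstract only states that similarity in time and similarity in feature space are unified. One plausible way to do this, sketched below under assumptions, is to score every stored example by a weighted sum of its normalized age and its normalized Euclidean distance to the current instance, then train on the `k` lowest-scoring examples. The weights `a_time` and `a_space`, the normalization, and `select_training_set` are illustrative choices, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], int]  # (feature vector, class label), oldest first


def select_training_set(history: Sequence[Example], target_x: Sequence[float], k: int,
                        a_time: float = 0.5, a_space: float = 0.5) -> List[Example]:
    """Return the k examples with the smallest combined time/feature-space distance.

    Both distances are scaled to [0, 1]: time 0 = newest example, 1 = oldest;
    space 0 = identical to target_x, 1 = farthest example in the history.
    """
    n = len(history)
    if n == 0 or k <= 0:
        return []
    space = [math.dist(x, target_x) for x, _ in history]
    max_space = max(space) or 1.0   # avoid division by zero when all points coincide
    max_time = (n - 1) or 1
    scores = []
    for idx, s in enumerate(space):
        time_dist = (n - 1 - idx) / max_time
        scores.append((a_time * time_dist + a_space * s / max_space, idx))
    scores.sort()
    chosen = sorted(idx for _, idx in scores[:k])  # keep chronological order
    return [history[i] for i in chosen]


history = [((0.0,), 0), ((5.0,), 1), ((1.0,), 0), ((6.0,), 1), ((0.5,), 0)]
print(select_training_set(history, (0.2,), k=2))  # [((1.0,), 0), ((0.5,), 0)]
```

With equal weights, the example at 1.0 is selected over the example at 0.0 (closer in feature space but older) and the example at 6.0 (recent but far away). Setting `a_space=0` reduces the selection to a plain sliding window of the `k` newest examples.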
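For reoccurring concepts, a common approach, assumed here because the abstract does not describe the authors' algorithm, is to keep a pool of models learned on earlier concepts and, after a drift, reuse the stored model that best fits the recent data, training a new one only when none is accurate enough. `ConceptPool`, `fit_lookup` (a toy base learner over discrete feature values) and the `max_error` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Tuple[Hashable, int]      # (discrete feature value, class label)
Model = Callable[[Hashable], int]


def fit_lookup(data: Sequence[Example]) -> Model:
    """Hypothetical toy base learner: majority label for each feature value."""
    votes = defaultdict(Counter)
    for x, y in data:
        votes[x][y] += 1
    table = {x: c.most_common(1)[0][0] for x, c in votes.items()}
    default = Counter(y for _, y in data).most_common(1)[0][0]  # fallback for unseen values
    return lambda x: table.get(x, default)


def error_rate(model: Model, data: Sequence[Example]) -> float:
    return sum(model(x) != y for x, y in data) / len(data)


class ConceptPool:
    """Keeps models of past concepts and reuses one when its concept reoccurs."""

    def __init__(self, fit: Callable[[Sequence[Example]], Model] = fit_lookup,
                 max_error: float = 0.2):
        self.fit = fit
        self.max_error = max_error          # hypothetical reuse threshold
        self.models: List[Model] = []

    def on_drift(self, recent: Sequence[Example]) -> Tuple[int, bool]:
        """Return (index of the active model, True if a stored model was reused)."""
        if self.models:
            errors = [error_rate(m, recent) for m in self.models]
            best = min(range(len(errors)), key=errors.__getitem__)
            if errors[best] <= self.max_error:
                return best, True
        self.models.append(self.fit(recent))
        return len(self.models) - 1, False


pool = ConceptPool()
concept_a = [("sun", 1), ("rain", 0)] * 5
concept_b = [("sun", 0), ("rain", 1)] * 5
print(pool.on_drift(concept_a))  # (0, False): first concept, new model
print(pool.on_drift(concept_b))  # (1, False): unseen concept, new model
print(pool.on_drift(concept_a))  # (0, True): concept A reoccurs, stored model reused
```

Because stored models are reused rather than relearned, a reoccurring concept can be recognized from a small batch of recent labeled examples.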
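The abstract does not specify how its Dynamic Drift Detection algorithm works. For illustration only, the sketch below implements a different, well-known detector, the Drift Detection Method (DDM) of Gama et al. (2004). It monitors the online error rate p_i of the base classifier and its standard deviation s_i = sqrt(p_i(1 - p_i)/i), signalling a warning when p_i + s_i exceeds p_min + 2 s_min and a drift when it exceeds p_min + 3 s_min. The class name, the 30-instance warm-up and the synthetic error stream are assumptions.

```python
import math


class DDM:
    """Drift Detection Method (Gama et al., 2004) fed with 0/1 prediction errors."""

    def __init__(self, min_instances: int = 30, warning_level: float = 2.0,
                 drift_level: float = 3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.p = 0.0               # running error rate p_i
        self.s = 0.0               # its standard deviation s_i
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: int) -> str:
        """Feed 1 for a misclassified example, 0 otherwise; return the detector state."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(max(0.0, self.p * (1 - self.p)) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        # Strict '>' avoids flagging a drift while s_min is still 0 (error-free start).
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()            # start learning the new concept from scratch
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic error stream (illustrative): 10% errors, then 50% errors from index 200.
errors = [1 if i % 10 == 9 else 0 for i in range(200)] + [i % 2 for i in range(200, 300)]
detector = DDM()
first_drift = next(i for i, e in enumerate(errors) if detector.update(e) == "drift")
print(first_drift)
```

A learner would typically start collecting a new training set at the warning level and replace its model when a drift is signalled.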
Pages: 36+
Number of pages: 2