A grid density based framework for classifying streaming data in the presence of concept drift

Cited by: 24
Authors
Sethi, Tegjyot Singh [1]
Kantardzic, Mehmed [2]
Hu, Hanquing [1]
Affiliations
[1] Univ Louisville, Data Min Lab, JB Speed Sch Engn, Louisville, KY 40292 USA
[2] Univ Louisville, Dept Comp Sci & Comp Engn, Louisville, KY 40292 USA
Keywords
Streaming data; Ensemble; Classification; Grid density clustering; Limited labeling; Concept drift; Evolving data
DOI
10.1007/s10844-015-0358-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mining data streams is the process of extracting information from non-stopping, rapidly flowing data records to provide knowledge that is reliable and timely. Streaming data algorithms need to be one-pass and operate under strict limitations of memory and response time. In addition, the classification of streaming data requires learning in an environment where the data characteristics might change constantly. Many of the classification algorithms presented in the literature assume a 100% labeling rate, which is impractical and expensive when data records are rapidly flowing in. In this paper, a new incremental grid density based learning framework, the GC3 framework, is proposed to perform classification of streaming data with concept drift and limited labeling. The proposed framework uses grid density clustering to detect changes in the input data space. It maintains an evolving ensemble of classifiers to learn and adapt to the model changes over time. The framework also uses a uniform grid density sampling mechanism to obtain a uniform subset of samples for better classification performance with a lower labeling rate. The entire framework is designed to be one-pass and incremental, and to work with limited memory to perform any-time classification on demand. Experimental comparison with state-of-the-art concept drift handling systems demonstrates the GC3 framework's ability to provide high classification performance, using fewer models in the ensemble and with only 4-6% of the samples labeled. The results show that the GC3 framework is effective and attractive for use in real-world data stream classification applications.
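The abstract mentions a uniform grid density sampling mechanism but gives no details. The general idea of sampling evenly over a grid can be illustrated with a minimal sketch. This is not the GC3 algorithm: the function names `grid_cell` and `sample_per_cell`, the fixed `cell_width`, the `per_cell` limit, and the batch (rather than one-pass) processing are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def grid_cell(point, cell_width):
    """Map a numeric point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_width) for x in point)


def sample_per_cell(points, cell_width, per_cell=1, seed=0):
    """Pick at most `per_cell` points from each occupied grid cell.

    Hypothetical helper: it spreads a labeling budget evenly over the
    occupied regions of the input space instead of over raw records.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    # Group points by the grid cell they fall into.
    for p in points:
        cells[grid_cell(p, cell_width)].append(p)
    chosen = []
    # Draw a small random subset from every occupied cell.
    for members in cells.values():
        k = min(per_cell, len(members))
        chosen.extend(rng.sample(members, k))
    return chosen


points = [(0.1, 0.1), (0.2, 0.3), (0.15, 0.4), (5.2, 5.1)]
to_label = sample_per_cell(points, cell_width=1.0, per_cell=1)
```

In this toy input three points share one cell and one point sits alone, so only two records would be sent for labeling, one from each occupied cell.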
Pages: 179-211
Number of pages: 33