A Framework for Exploiting Local Information to Enhance Density Estimation of Data Streams

被引:1
作者
Boedihardjo, Arnold P. [1 ]
Lu, Chang-Tien [2 ]
Wang, Bingsheng [2 ]
机构
[1] US Army Corps Engineers, Geospatial Res Lab, Engineer Res & Dev Ctr, Alexandria, VA 22315 USA
[2] Virginia Tech, Dept Comp Sci, Falls Church, VA 22043 USA
关键词
Local region information; General Local rEgion AlgorithM (GLEAM); BANDWIDTH SELECTION; ALGORITHMS; CHOICE;
D O I
10.1145/2629618
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The Probability Density Function (PDF) is the fundamental data model for a variety of stream mining algorithms. Existing works apply the standard nonparametric Kernel Density Estimator (KDE) to approximate the PDF of data streams. As a result, the stream-based KDEs cannot accurately capture complex local density features. In this article, we propose the use of Local Region (LRs) to model local density information in univariate data streams. In-depth theoretical analyses are presented to justify the effectiveness of the LR-based KDE. Based on the analyses, we develop the General Local rEgion AlgorithM (GLEAM) to enhance the estimation quality of structurally complex univariate distributions for existing stream-based KDEs. A set of algorithmic optimizations is designed to improve the query throughput of GLEAM and to achieve its linear order computation. Additionally, a comprehensive suite of experiments was conducted to test the effectiveness and efficiency of GLEAM.
引用
收藏
页数:38
相关论文
共 48 条
[1]  
Aggarwal C. C., 2003, P 2003 ACM SIGMOD IN, P575, DOI DOI 10.1145/872757.872826
[2]  
Aggarwal C.C., 2007, Data streams: models and algorithms, P169, DOI DOI 10.1007/978-0-387-47534-9_9
[3]  
[Anonymous], P 2003 INT C VER LAR
[4]  
[Anonymous], P 9 INT WORKSH ART I
[5]  
[Anonymous], 2007, Uci machine learning repository
[6]  
[Anonymous], UCR TIME SERIES CLAS
[7]  
Babcock B., 2002, PODS, P1, DOI [DOI 10.1145/543613.543615, 10.1145/543613.543615]
[8]  
Boedihardjo A.P., 2008, P 17 ACM C INFORM KN, P619
[9]  
BOWMAN AW, 1984, BIOMETRIKA, V71, P353
[10]  
Chatfield C., 1990, INTRO MULTIVARIATE A