A Framework for Exploiting Local Information to Enhance Density Estimation of Data Streams

Times cited: 1
Authors
Boedihardjo, Arnold P. [1]
Lu, Chang-Tien [2]
Wang, Bingsheng [2]
Affiliations
[1] US Army Corps of Engineers, Geospatial Research Laboratory, Engineer Research and Development Center, Alexandria, VA 22315, USA
[2] Virginia Tech, Department of Computer Science, Falls Church, VA 22043, USA
Keywords
Local region information; General Local rEgion AlgorithM (GLEAM); bandwidth selection; algorithms; choice
DOI
10.1145/2629618
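Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812 (Computer Science and Technology)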
Abstract
The Probability Density Function (PDF) is the fundamental data model for a variety of stream mining algorithms. Existing works apply the standard nonparametric Kernel Density Estimator (KDE) to approximate the PDF of data streams. As a result, these stream-based KDEs cannot accurately capture complex local density features. In this article, we propose the use of Local Regions (LRs) to model local density information in univariate data streams. In-depth theoretical analyses are presented to justify the effectiveness of the LR-based KDE. Based on these analyses, we develop the General Local rEgion AlgorithM (GLEAM) to enhance the estimation quality of existing stream-based KDEs on structurally complex univariate distributions. A set of algorithmic optimizations is designed to improve the query throughput of GLEAM and to achieve linear-order computation. Additionally, a comprehensive suite of experiments was conducted to test the effectiveness and efficiency of GLEAM.
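The abstract contrasts the standard KDE, which estimates the density as f(x) = (1/(n h)) * sum_i K((x - x_i) / h) with a single global bandwidth h, with estimators that exploit local density information. The Python sketch below illustrates that contrast under stated assumptions; it is not GLEAM, and the paper's Local Region construction is not reproduced here. It pairs a fixed-bandwidth Gaussian KDE (bandwidth from Silverman's rule of thumb) with a swapped-in, generic alternative: Abramson-style sample-point adaptive KDE, in which each sample x_i receives its own bandwidth h_i = h * (f_pilot(x_i) / g)^(-alpha), where f_pilot is the fixed-bandwidth estimate and g is the geometric mean of the pilot values. The function names (`silverman_bandwidth`, `fixed_kde`, `adaptive_bandwidths`, `adaptive_kde`) and the default alpha = 0.5 are hypothetical choices for illustration.

```python
import math
import random
import statistics
from typing import List, Sequence

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel K."""
    return math.exp(-0.5 * u * u) / SQRT_2PI


def silverman_bandwidth(data: Sequence[float]) -> float:
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def fixed_kde(x: float, data: Sequence[float], h: float) -> float:
    """Standard KDE with one global bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def adaptive_bandwidths(data: Sequence[float], h: float, alpha: float = 0.5) -> List[float]:
    """Per-sample bandwidths (Abramson-style; an assumption, not GLEAM).

    Pilot values come from the fixed-bandwidth KDE. They are always positive,
    because each sample's own kernel contributes K(0) > 0, so the log is safe.
    """
    pilot = [fixed_kde(xi, data, h) for xi in data]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))  # geometric mean
    # Dense regions (pilot > g) get h_i < h; sparse regions get h_i > h.
    return [h * (p / g) ** (-alpha) for p in pilot]


def adaptive_kde(x: float, data: Sequence[float], bandwidths: Sequence[float]) -> float:
    """Sample-point adaptive KDE: f(x) = (1/n) * sum K((x - x_i) / h_i) / h_i."""
    total = sum(gaussian_kernel((x - xi) / hi) / hi for xi, hi in zip(data, bandwidths))
    return total / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bimodal sample: a sharp peak at 0 plus a broad bump at 5.
    data = [rng.gauss(0.0, 0.2) for _ in range(300)] + [rng.gauss(5.0, 2.0) for _ in range(300)]
    h = silverman_bandwidth(data)
    bw = adaptive_bandwidths(data, h)
    for x in (0.0, 5.0):
        print(f"x={x}: fixed={fixed_kde(x, data, h):.3f}  adaptive={adaptive_kde(x, data, bw):.3f}")
```

On a bimodal sample like the one in the demo, the single global bandwidth is driven by the broad component and oversmooths the sharp peak, while the adaptive estimate rises higher there because samples in dense regions receive narrower kernels. This is a batch estimator for exposition only: it stores every sample and costs O(n) per query after an O(n^2) pilot pass, whereas the stream-based KDEs discussed in the abstract typically operate under bounded memory.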
Pages: 38