Locally adaptive dimensionality reduction for indexing large time series databases

被引:279
作者
Keogh, E [1 ]
Chakrabarti, K [1 ]
Mehrotra, S [1 ]
Pazzani, M [1 ]
机构
[1] Univ Calif Irvine, Dept Informat & Comp Sci, Irvine, CA 92697 USA
关键词
indexing; dimensionality reduction; content-based retrieval;
D O I
10.1145/376284.375680
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Similarity search in large time series databases has attracted much research interest recently. ft is a difficult problem because of the typically high dimensionality of the data.. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight Euclidean distance approximation and show how they can support fast exact searching, and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
引用
收藏
页码:151 / 162
页数:12
相关论文
共 49 条
[1]  
Agrawal R., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P490
[2]  
AGRAWAL R, 1995, P 21 INT C VER LARG
[3]  
Agrawal R., 1993, EFFICIENT SIMILARITY
[4]  
[Anonymous], 1995, SIGMOD
[5]  
[Anonymous], 1996, P 12 IEEE INT C DAT
[6]  
[Anonymous], 1999, KDD '99
[7]  
BAY SD, 2000, UCI KDD ARCH
[8]  
BENNETT KP, 1999, P 5 ACM SIGKDD INT C, P233
[9]  
CHAKRABARTI K, 2000, P IEEE INT C MULT EX
[10]  
Chakrabarti K., 2000, P VLDB C CAIR