Efficient moving average transform-based subsequence matching algorithms in time-series databases

被引:27
作者
Moon, Yang-Sae
Kim, Jinho
机构
[1] Kangwon Natl Univ, Dept Comp Sci, Chunchon 200701, Kangwon, South Korea
[2] Korea Adv Inst Sci & Technol, AITrc, Taejon 305701, South Korea
关键词
data mining; time-series databases; subsequence matching; moving average transform;
D O I
10.1016/j.ins.2007.05.038
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Moving average transform is very useful in finding the trend of time-series data by reducing the effect of noise, and has been used in many areas such as econometrics. Previous subsequence matching methods with moving average transform, however, are problematic in that, since they must build multiple indexes in supporting transform of arbitrary order, they incur index overhead both in storage space and in update maintenance. To solve this problem, we propose a single-index approach to subsequence matching that supports moving average transform of arbitrary order in time-series databases. Using the single-index approach, we can reduce both the storage space and the index maintenance overhead. In explaining the single-index approach, we first introduce the notion of poly-order moving average transform by generalizing the original definition of moving average transform. We then formally prove the correctness of poly-order transform-based subsequence matching. We also propose two subsequence matching methods based on poly-order transform that efficiently support moving average transform of arbitrary order. Experimental results for real stock data show that, compared with the sequential scan, our methods improve average performance significantly, by a factor of 22.6-33.6. Also, compared with cases in which an index is built for every moving average order, our methods reduce storage space and maintenance effort significantly while incurring only marginal performance degradation. Our approach entails the additional advantage of being generalized to support many other transforms in addition to moving average transform. Therefore, we believe that our approach will be widely used in many transform-based subsequence matching methods. (C) 2007 Elsevier Inc. All rights reserved.
引用
收藏
页码:5415 / 5431
页数:17
相关论文
共 26 条
[1]  
Agrawal R., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P490
[2]  
Agrawal R., 1993, P 4 INT C FDN DAT OR, V730, P69
[3]   DDR: an index method for large time-series datasets [J].
An, JY ;
Chen, YPP ;
Chen, HX .
INFORMATION SYSTEMS, 2005, 30 (05) :333-348
[4]  
BECKMANN N, 1990, SIGMOD REC, V19, P322, DOI 10.1145/93605.98741
[5]   Haar wavelets for efficient similarity search of time-series: With and without time warping [J].
Chan, FKP ;
Fu, AWC ;
Yu, C .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (03) :686-705
[6]  
Chatfield C., 1984, ANAL TIME SERIES INT
[7]  
Chu K. K. W., 1999, Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, P237, DOI 10.1145/303976.304000
[8]   An indexing scheme for energy-efficient processing of content-based retrieval queries on a wireless data stream [J].
Chung, Yon Dohn .
INFORMATION SCIENCES, 2007, 177 (02) :525-542
[9]  
GAO L, 2002, P 2002 ACM SIGMOD IN, P370
[10]  
Kendall M., 1976, TIME SERIES