Similarity search in data stream with adaptive segmental approximations

被引:1
|
作者
Wu, Feng [1 ]
Zhong, Yan [1 ]
Wu, Quan-Yuan [1 ]
Jia, Yan [1 ]
Yang, Shu-Qiang [1 ]
机构
[1] School of Computers, National University of Defense Technology
来源
Ruan Jian Xue Bao/Journal of Software | 2009年 / 20卷 / 10期
关键词
Data stream; Dynamic time warping; Similarity search; Time series analysis;
D O I
10.3724/SP.J.1001.2009.03548
中图分类号
学科分类号
摘要
Similarity search has attracted many researchers from various communities (real-time stock quotes, network security, sensor networks). Due to the infinite, continuous, fast and real-time properties of the data from these communities, a method is needed for online similarity search in data stream. This paper first proposes the lower bound function LB_seg_WFglobal for DTW (dynamic time warping) in the presence of global warping constraints and LB_seg_WF for DTW without global warping constraints, which are not applied to any index structures. They are segmented DTW techniques, and can be applied to sequences and queries of varying lengths in data stream. Next, several tighter lower bounds are proposed to improve the approximate degree of the LB_seg_WFglobal and LB_seg_WF. Finally, to deal with the possible continuously non-effective problem of LB_seg_WFglobal or LB_seg_WF in data stream, it is believed that lower-bound LB_WFglobal (in the presence of global warping constraints) and lower-bound LB_WF, upper-bound UB_WF (without global warping constraints) can fast estimate DTW and hence reduce a lot of redundant computations by incrementally computing. The theoretical analysis and statistical experiments confirm the validity of the proposed methods. © by Institute of Software, the Chinese Academy of Sciences. All rights reserved.
引用
收藏
页码:2867 / 2884
页数:17
相关论文
共 12 条
  • [1] Faloutsos C., Ranganathan M., Manolopoulos Y., Fast subsequence matching in time-series databases, Proc. of the 1994 ACM SIGMOD Int'l Conf. on Management of Data, pp. 419-429, (1994)
  • [2] Pan D., Shen J.Y., Similarity discovery techniques in temporal data mining, Journal of Software, 18, 2, pp. 246-258, (2007)
  • [3] Zhou M., Wong M.H., Efficient online subsequence searching in data streams under dynamic time warping distance, Proc. of the 24th Int'l Conf. on Data Engineering (ICDE 2008), pp. 686-694, (2008)
  • [4] Shou Y., Mamoulis N., Cheung D.W., Fast and exact warping of time series using adaptive segmental approximations, Journal of Machine Learning, 58, 2, pp. 231-267, (2005)
  • [5] Yi B., Jagadish H., Faloutsos C., Efficient retrieval of similar time sequences under time warping, Proc. of the 14th Int'l Conf. on Data Engineering (ICDE'98), pp. 201-208, (1998)
  • [6] Keogh E., Exact indexing of dynamic time warping, Proc. of the 28th ACM VLDB Int'l Conf. on Very Large Data Bases, pp. 406-417, (2002)
  • [7] Keogh E.J., Chakrabarti K., Mehrotra S., Pazzani M.J., Locally adaptive dimensionality reduction for indexing large time series databases, Proc. of the 2001 SIGMOD Int'l Conf. on Management of Data, pp. 151-162, (2001)
  • [8] Vlachos M., Hadjieleftheriou M., Gunopulos D., Keogh E.J., Indexing multi-dimensional time-series with support for multiple distance measures, Proc. of the 9th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, pp. 216-225, (2003)
  • [9] Zhu Y., Shasha D., Warping indexes with envelope transforms for query by humming, Proc. of the 2003 ACM SIGMOD Int'l Conf. on Management of Data, pp. 181-192, (2003)
  • [10] Han W.S., Lee J., Ranked subsequence matching in time-series databases, Proc. of the 33rd ACM VLDB Int'l Conf. on Very Large Data Bases, pp. 423-432, (2007)