An efficient similarity searching algorithm based on clustering for time series

被引:0
作者
Feng, Yucai [1 ]
Jiang, Tao [1 ]
Zhou, Yingbiao [1 ]
Li, Junkui [1 ]
机构
[1] Huazhong Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430074, Peoples R China
来源
ADVANCES IN DATA MINING, PROCEEDINGS: MEDICAL APPLICATIONS, E-COMMERCE, MARKETING, AND THEORETICAL ASPECTS | 2008年 / 5077卷
关键词
time series; clustering; similarity search; indexing;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Indexing large time series databases is crucial for efficient searching of time series queries. In the paper, we propose a novel indexing scheme RQI (Range Query based on Index) which includes three filtering methods: first-k filtering, indexing lower bounding and upper bounding as well as triangle inequality pruning. The basic idea is calculating wavelet coefficient whose first k coefficients are used to form a MBR. (minimal bounding rectangle) based on haar wavelet transform for each time series and then using point filtering method; At the same time, lower bounding and upper bounding feature of each time series is calculated, in advance, and stored into index structure. At last, triangle inequality pruning method is used by calculating the distance between time series beforehand. Then we introduce a novel lower bounding distance function SLBS (Symmetrical Lower Bounding based on Segment) and a novel clustering algorithm CSA (Clustering based on Segment Approximation) in order to further improve the search efficiency of point filtering method by keeping a good clustering trait of index structure. Extensive experiments over both synthetic and real datasets show that, our technologies provide perfect pruning power and could obtain an order of magnitude performance improvement for time series queries over traditional naive evaluation techniques.
引用
收藏
页码:360 / 373
页数:14
相关论文
共 12 条
[1]  
Agrawal R., 1993, Foundations of Data Organization and Algorithms. 4th International Conference. FODO '93 Proceedings, P69
[2]  
[Anonymous], 2001, P ACM SIGMOD C MAN D
[3]  
[Anonymous], P ACM SIG MOD INT C
[4]  
Keogh E., 2002, Proceedings of the Twenty-eighth International Conference on Very Large Data Bases, P406
[5]   An index-based approach for similarity search supporting time warping in large sequence databases [J].
Kim, SW ;
Park, S ;
Chu, WW .
17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, :607-614
[6]  
KORN F, 1997, P ACM SIGMOD INT C M, P289, DOI DOI 10.1145/253260.253332
[7]  
LI JK, 2006, 9 INT C INF TECHN, P203
[8]  
Liu B, 2006, LECT NOTES COMPUT SC, V4016, P460
[9]   Duality-based subsequence matching in time-series databases [J].
Moon, YS ;
Whang, KY ;
Loh, WK .
17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, :263-272
[10]   Discovering similar multidimensional trajectories [J].
Vlachos, M ;
Kollios, G ;
Gunopulos, D .
18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, :673-684