Scalable data series subsequence matching with ULISSE

Cited by: 0
Authors
Michele Linardi
Themis Palpanas
Affiliations
[1] LIPADE
[2] Université de Paris
Source
The VLDB Journal | 2020 / Volume 29
Keywords
Data series; Similarity search; Variable length; Lower bounding; Subsequence matching; Euclidean distance; Dynamic time warping;
DOI
Not available
Abstract
Data series similarity search is an important operation, and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range). Our contribution is twofold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different length. Based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports non-Z-normalized and Z-normalized sequences and can be used with no changes with both Euclidean distance and dynamic time warping, for answering both k-NN and ϵ-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude more efficient in terms of both space and time cost, when compared to competing approaches.
Pages: 1449-1474
Page count: 25